xmlImportOptions

Import options object for XML files

Since R2021a

Description

An XMLImportOptions object enables you to specify how MATLAB® imports structured, tabular data from XML files. The object contains properties that control the data import process, including the handling of errors and missing data.

Creation

You can create an XMLImportOptions object using either the xmlImportOptions function (described here) or the detectImportOptions function:

  • Use xmlImportOptions to define the import properties based on your import requirements.

  • Use detectImportOptions to detect and populate the import properties based on the contents of the XML file specified in filename.

    opts = detectImportOptions(filename)

Description

opts = xmlImportOptions creates an XMLImportOptions object with one variable.

opts = xmlImportOptions('NumVariables',numVars) creates the object with the number of variables specified in numVars.

opts = xmlImportOptions(___,Name,Value) specifies additional properties for an XMLImportOptions object using one or more name-value arguments.
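
As a minimal sketch of the creation syntaxes above, the following code creates an options object with three variables and then customizes it. The variable names and XPath selectors are hypothetical and assume an XML layout with Student nodes similar to the one used in the examples on this page.

```matlab
% Create an options object with three variables.
opts = xmlImportOptions('NumVariables',3);
disp(opts.VariableNames)   % default names are generated for each variable

% Hypothetical names and selectors for an assumed <Student> layout.
opts.VariableNames = {'FirstName','Age','Year'};
opts.VariableSelectors = {'//Student/Name/@FirstName', ...
                          '//Student/Age', ...
                          '//Student/Year'};

% Import Age as a number instead of text.
opts = setvartype(opts,'Age','double');
disp(opts.VariableTypes)
```

The resulting object can then be passed to readtable together with a file that matches the assumed layout.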

Input Arguments

numVars: Number of variables, specified as a positive scalar integer.

Properties

Variable Properties

Variable names, specified as a cell array of character vectors or string array. The VariableNames property contains the names to use when importing variables.

If the data contains N variables, but no variable names are specified, then the VariableNames property contains {'Var1','Var2',...,'VarN'}.

To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of VariableNamingRule to 'preserve'.

Example: opts.VariableNames returns the current (detected) variable names.

Example: opts.VariableNames(3) = {'Height'} changes the name of the third variable to Height.

Data Types: char | string | cell

Flag to preserve variable names, specified as either "modify" or "preserve".

  • "modify" — Convert invalid variable names (as determined by the isvarname function) to valid MATLAB identifiers.

  • "preserve" — Preserve variable names that are not valid MATLAB identifiers such as variable names that include spaces and non-ASCII characters.

Starting in R2019b, variable names and row names can include any characters, including spaces and non-ASCII characters. Also, they can start with any characters, not just letters. Variable and row names do not have to be valid MATLAB identifiers (as determined by the isvarname function). To preserve these variable names and row names, set the value of VariableNamingRule to "preserve". Variable names are not refreshed when the value of VariableNamingRule is changed from "modify" to "preserve".

Data Types: char | string

Data types of the variables, specified as a cell array of character vectors or a string array containing a set of valid data type names. The VariableTypes property designates the data types to use when importing variables.

To update the VariableTypes property, use the setvartype function.

Example: opts.VariableTypes returns the current variable data types.

Example: opts = setvartype(opts,'Height',{'double'}) changes the data type of the variable Height to double.

Subset of variables to import, specified as a character vector, string scalar, cell array of character vectors, string array or an array of numeric indices.

SelectedVariableNames must be a subset of names contained in the VariableNames property. By default, SelectedVariableNames contains all the variable names from the VariableNames property, which means that all variables are imported.

Use the SelectedVariableNames property to import only the variables of interest. Specify a subset of variables using the SelectedVariableNames property and use readtable to import only that subset.

To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of VariableNamingRule to 'preserve'.

Example: opts.SelectedVariableNames = {'Height','LastName'} selects only two variables, Height and LastName, for the import operation.

Example: opts.SelectedVariableNames = [1 5] selects only two variables, the first variable and the fifth variable, for the import operation.

Example: T = readtable(filename,opts) returns a table containing only the variables specified in the SelectedVariableNames property of the opts object.

Data Types: uint16 | uint32 | uint64 | char | string | cell
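
The selection workflow described above might look like the following sketch. The file people_demo.xml and its element names are hypothetical and are written to a temporary folder only to keep the code self-contained. The sketch also assumes that the detected variable names match the element names Name, Age, and City.

```matlab
% Write a small hypothetical XML file so the sketch is self-contained.
xmlFile = fullfile(tempdir,'people_demo.xml');
fid = fopen(xmlFile,'w');
fprintf(fid,'<People>\n');
fprintf(fid,'  <Person><Name>Ana</Name><Age>31</Age><City>Natick</City></Person>\n');
fprintf(fid,'  <Person><Name>Ben</Name><Age>27</Age><City>Boston</City></Person>\n');
fprintf(fid,'</People>\n');
fclose(fid);

% Detect the import options, then keep only two of the detected variables.
opts = detectImportOptions(xmlFile);
disp(opts.VariableNames)
opts.SelectedVariableNames = {'Name','Age'};

% readtable imports only the selected variables.
T = readtable(xmlFile,opts);
disp(T)
```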

Type-specific variable import options, returned as an array of variable import options objects. The array contains an object corresponding to each variable specified in the VariableNames property. Each object in the array contains properties that support the importing of data with a specific data type.

Variable options support these data types: numeric, text, logical, datetime, or categorical.

To query the current (or detected) options for a variable, use the getvaropts function.

To set and customize options for a variable, use the setvaropts function.

Example: opts.VariableOptions returns a collection of VariableImportOptions objects, one corresponding to each variable in the data.

Example: getvaropts(opts,'Height') returns the VariableImportOptions object for the Height variable.

Example: opts = setvaropts(opts,'Height','FillValue',0) sets the FillValue property for the variable Height to 0.
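
A short sketch of querying and customizing the options of one variable with getvaropts and setvaropts follows. The variable names Name and Height are placeholders chosen for illustration, and no XML file is needed to run it.

```matlab
% Options object with two placeholder variables.
opts = xmlImportOptions('NumVariables',2);
opts.VariableNames = {'Name','Height'};
opts = setvartype(opts,'Height','double');

% Query the type-specific options object for Height.
heightOpts = getvaropts(opts,'Height');
disp(class(heightOpts))

% Customize the fill value for Height, then confirm the change.
opts = setvaropts(opts,'Height','FillValue',0);
heightOpts = getvaropts(opts,'Height');
disp(heightOpts.FillValue)
```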

Variable descriptions XPath expression, specified as a character vector or string scalar that the reading function uses to select the table variable descriptions. You must specify VariableDescriptionsSelector as a valid XPath version 1.0 expression.

Example: 'VariableDescriptionsSelector','/RootNode/ChildNode'

Table variable XPath expressions, specified as a cell array of character vectors or string array that the reading function uses to select table variables. You must specify VariableSelectors as valid XPath version 1.0 expressions.

Example: 'VariableSelectors',{'/RootNode/ChildNode'}

Example: 'VariableSelectors',"/RootNode/ChildNode"

Example: 'VariableSelectors',["/RootNode/ChildNode1","/RootNode/ChildNode2"]

Variable units XPath expression, specified as a character vector or string scalar that the reading function uses to select the table variable units. You must specify VariableUnitsSelector as a valid XPath version 1.0 expression.

Example: 'VariableUnitsSelector','/RootNode/ChildNode'

Table Properties

Table row names XPath expression, specified as a character vector or string scalar that the reading function uses to select the names of the table rows. You must specify RowNamesSelector as a valid XPath version 1.0 expression.

Example: 'RowNamesSelector','/RootNode/ChildNode'

Table row XPath expression, specified as a character vector or string scalar that the reading function uses to select individual rows of the output table. You must specify RowSelector as a valid XPath version 1.0 expression.

Example: 'RowSelector','/RootNode/ChildNode'

Table data XPath expression, specified as a character vector or string scalar that the reading function uses to select the output table data. You must specify TableSelector as a valid XPath version 1.0 expression.

Example: 'TableSelector','/RootNode/ChildNode'

Set of registered XML namespace prefixes, specified as the comma-separated pair consisting of RegisteredNamespaces and an array of prefixes. The reading function uses these prefixes when evaluating XPath expressions on an XML file. Specify the namespace prefixes and their associated URLs as an Nx2 string array. RegisteredNamespaces can be used when you also evaluate an XPath expression specified by a selector name-value argument, such as StructSelector for readstruct, or VariableSelectors for readtable and readtimetable.

By default, the reading function automatically detects namespace prefixes to register for use in XPath evaluation, but you can also register new namespace prefixes using the RegisteredNamespaces name-value argument. You might register a new namespace prefix when an XML node has a namespace URL, but no declared namespace prefix in the XML file.

For example, evaluate an XPath expression on an XML file called example.xml that does not contain a namespace prefix. Specify 'RegisteredNamespaces' as ["myprefix", "https://www.mathworks.com"] to assign the prefix myprefix to the URL https://www.mathworks.com.

T = readtable("example.xml", "VariableSelectors", "/myprefix:Data", ...
 "RegisteredNamespaces", ["myprefix", "https://www.mathworks.com"])

Example: 'RegisteredNamespaces',["myprefix", "https://www.mathworks.com"]

Replacement Rules

Procedure to manage missing data, specified as one of the values in this table.

Missing Rule    Behavior
'fill'          Replace missing data with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. For more information on accessing the FillValue property, see getvaropts.
'error'         Stop importing and display an error message showing the missing record and field.
'omitrow'       Omit rows that contain missing data.
'omitvar'       Omit variables that contain missing data.

Example: opts.MissingRule = 'omitrow';

Data Types: char | string

Procedure to handle import errors, specified as one of the values in this table.

Import Error Rule    Behavior
'fill'               Replace the data where the error occurred with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. For more information on accessing the FillValue property, see getvaropts.
'error'              Stop importing and display an error message showing the error-causing record and field.
'omitrow'            Omit rows where errors occur.
'omitvar'            Omit variables where errors occur.

Example: opts.ImportErrorRule = 'omitvar';

Data Types: char | string
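
The following sketch illustrates how ImportErrorRule and FillValue might be combined. The file ages_demo.xml is hypothetical and is created only for this illustration. It assumes that the text "unknown" cannot be converted to double and is therefore treated as an import error. The exact rows returned depend on how the reader classifies each value.

```matlab
% Write a small hypothetical XML file with one non-numeric Age value.
xmlFile = fullfile(tempdir,'ages_demo.xml');
fid = fopen(xmlFile,'w');
fprintf(fid,'<People>\n');
fprintf(fid,'  <Person><Name>Ana</Name><Age>31</Age></Person>\n');
fprintf(fid,'  <Person><Name>Ben</Name><Age>unknown</Age></Person>\n');
fprintf(fid,'  <Person><Name>Cy</Name><Age>27</Age></Person>\n');
fprintf(fid,'</People>\n');
fclose(fid);

opts = detectImportOptions(xmlFile);
opts = setvartype(opts,'Age','double');   % force numeric conversion of Age

% Replace values that fail to convert with the FillValue of the variable.
opts.ImportErrorRule = 'fill';
opts = setvaropts(opts,'Age','FillValue',-1);
Tfill = readtable(xmlFile,opts);
disp(Tfill)

% Alternatively, omit rows in which a conversion error occurs.
opts.ImportErrorRule = 'omitrow';
Tomit = readtable(xmlFile,opts);
disp(Tomit)
```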

Procedure to handle repeated XML nodes in a given row of a table, specified as 'addcol', 'ignore', or 'error'.

Repeated Node Rule    Behavior
'addcol'              Add columns for the repeated nodes under the variable header in the table. Specifying the value of 'RepeatedNodeRule' as 'addcol' does not create a separate variable in the table for the repeated node.
'ignore'              Skip importing the repeated nodes.
'error'               Display an error message and abort the import operation.

Example: 'RepeatedNodeRule','ignore'

Examples

Create XML import options for an XML file, specify the variables to import, and then read the data.

The XML file students.xml has seven sibling nodes named Student, which each contain the same child nodes and attributes.

type students.xml
<?xml version="1.0" encoding="utf-8"?>
<Students>
    <Student ID="S11305">
        <Name FirstName="Priya" LastName="Thompson" />
        <Age>18</Age>
        <Year>Freshman</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">591 Spring Lane</Street>
            <City>Natick</City>
            <State>MA</State>
      </Address>
      <Major>Computer Science</Major>
      <Minor>English Literature</Minor>
   </Student>
   <Student ID="S23451">
        <Name FirstName="Conor" LastName="Cole" />
        <Age>18</Age>
        <Year>Freshman</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">4641 Pearl Street</Street>
            <City>San Francisco</City>
            <State>CA</State>
        </Address>
        <Major>Microbiology</Major>
        <Minor>Public Health</Minor>
    </Student>
    <Student ID="S119323">
        <Name FirstName="Morgan" LastName="Yang" />
        <Age>21</Age>
        <Year>Senior</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">30 Highland Road</Street>
            <City>Detriot</City>
            <State>MI</State>
        </Address>
        <Major>Political Science</Major>
   </Student>
   <Student ID="S201351">
        <Name FirstName="Salim" LastName="Copeland" />
        <Age>19</Age>
        <Year>Sophomore</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">3388 Moore Avenue</Street>
            <City>Fort Worth</City>
            <State>TX</State>
        </Address>
        <Major>Business</Major>
        <Minor>Japanese Language</Minor>
   </Student>
   <Student ID="S201351">
        <Name FirstName="Salim" LastName="Copeland" />
        <Age>20</Age>
        <Year>Sophomore</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">3388 Moore Avenue</Street>
            <City>Fort Worth</City>
            <State>TX</State>
        </Address>
        <Major>Business</Major>
        <Minor>Japanese Language</Minor>
    </Student>
    <Student ID="54600">
        <Name FirstName="Dania" LastName="Burt" />
        <Age>22</Age>
        <Year>Senior</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">22 Angie Drive</Street>
            <City>Los Angeles</City>
            <State>CA</State>
        </Address>
        <Major>Mechanical Engineering</Major>
        <Minor>Architecture</Minor>
   </Student>
    <Student ID="453197">
        <Name FirstName="Rikki" LastName="Gunn" />
        <Age>21</Age>
        <Year>Junior</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">65 Decatur Lane</Street>
            <City>Trenton</City>
            <State>ME</State>
        </Address>
        <Major>Economics</Major>
        <Minor>Art History</Minor>
   </Student>
</Students>

Create an XMLImportOptions object. Specify the value of VariableSelectors as //@FirstName to select the FirstName attribute of each Name element node to import as a table variable.

opts = xmlImportOptions("VariableSelectors","//@FirstName");

Use readtable along with the options object to import the specified variable.

T = readtable("students.xml",opts)
T=7×1 table
       Var1   
    __________

    {'Priya' }
    {'Conor' }
    {'Morgan'}
    {'Salim' }
    {'Salim' }
    {'Dania' }
    {'Rikki' }

Import the contents of an XML file into a table.

The students.xml file has seven sibling nodes named Student, which each contain the same child nodes and attributes.

type students.xml
<?xml version="1.0" encoding="utf-8"?>
<Students>
    <Student ID="S11305">
        <Name FirstName="Priya" LastName="Thompson" />
        <Age>18</Age>
        <Year>Freshman</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">591 Spring Lane</Street>
            <City>Natick</City>
            <State>MA</State>
      </Address>
      <Major>Computer Science</Major>
      <Minor>English Literature</Minor>
   </Student>
   <Student ID="S23451">
        <Name FirstName="Conor" LastName="Cole" />
        <Age>18</Age>
        <Year>Freshman</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">4641 Pearl Street</Street>
            <City>San Francisco</City>
            <State>CA</State>
        </Address>
        <Major>Microbiology</Major>
        <Minor>Public Health</Minor>
    </Student>
    <Student ID="S119323">
        <Name FirstName="Morgan" LastName="Yang" />
        <Age>21</Age>
        <Year>Senior</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">30 Highland Road</Street>
            <City>Detriot</City>
            <State>MI</State>
        </Address>
        <Major>Political Science</Major>
   </Student>
   <Student ID="S201351">
        <Name FirstName="Salim" LastName="Copeland" />
        <Age>19</Age>
        <Year>Sophomore</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">3388 Moore Avenue</Street>
            <City>Fort Worth</City>
            <State>TX</State>
        </Address>
        <Major>Business</Major>
        <Minor>Japanese Language</Minor>
   </Student>
   <Student ID="S201351">
        <Name FirstName="Salim" LastName="Copeland" />
        <Age>20</Age>
        <Year>Sophomore</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">3388 Moore Avenue</Street>
            <City>Fort Worth</City>
            <State>TX</State>
        </Address>
        <Major>Business</Major>
        <Minor>Japanese Language</Minor>
    </Student>
    <Student ID="54600">
        <Name FirstName="Dania" LastName="Burt" />
        <Age>22</Age>
        <Year>Senior</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">22 Angie Drive</Street>
            <City>Los Angeles</City>
            <State>CA</State>
        </Address>
        <Major>Mechanical Engineering</Major>
        <Minor>Architecture</Minor>
   </Student>
    <Student ID="453197">
        <Name FirstName="Rikki" LastName="Gunn" />
        <Age>21</Age>
        <Year>Junior</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">65 Decatur Lane</Street>
            <City>Trenton</City>
            <State>ME</State>
        </Address>
        <Major>Economics</Major>
        <Minor>Art History</Minor>
   </Student>
</Students>

First, create an XMLImportOptions object by using detectImportOptions to detect aspects of your XML file. Read just the street names into a table by specifying the VariableSelectors name-value argument as the XPath expression of the Street element node. Register a custom namespace prefix to the existing namespace URL by setting the RegisteredNamespaces name-value argument.

opts = detectImportOptions("students.xml",RegisteredNamespaces=["myPrefix","https://www.mathworks.com"], ...
    VariableSelectors="//myPrefix:Street");

Then, import the specified variable using readtable with the import options object.

T = readtable("students.xml",opts)
T=7×1 table
          Street       
    ___________________

    "591 Spring Lane"  
    "4641 Pearl Street"
    "30 Highland Road" 
    "3388 Moore Avenue"
    "3388 Moore Avenue"
    "22 Angie Drive"   
    "65 Decatur Lane"  

Tips

  • Use XPath selectors to specify which elements of the XML input document to import. For example, suppose you want to import the XML file myFile.xml, which has the following structure:

    <data>
        <table category="ones">
            <var>1</var>
            <var>2</var>
        </table>
        <table category="tens">
            <var>10</var>
            <var>20</var>
        </table>
    </data>
    
    This table provides the XPath syntaxes that are supported for XPath selector name-value arguments, such as VariableSelectors or TableSelector.

    Selection operation: Select every node whose name matches the node you want to select, regardless of its location in the document.
    Syntax: Prefix the name with two forward slashes (//).
    Example and result:

    data = readtable('myFile.xml', 'VariableSelectors', '//var')
    data =
    
      4×1 table
    
        var
        ___
    
         1 
         2 
        10 
        20 

    Selection operation: Read the value of an attribute belonging to an element node.
    Syntax: Prefix the attribute with an at sign (@).
    Example and result:

    data = readtable('myFile.xml', 'VariableSelectors', '//table/@category')
    data =
    
      2×1 table
    
        categoryAttribute
        _________________
    
             "ones"      
             "tens"   

    Selection operation: Select a specific node in a set of nodes.
    Syntax: Provide the index of the node you want to select in square brackets ([]).
    Example and result:

    data = readtable('myFile.xml', 'TableSelector', '//table[1]')
    data =
    
      2×1 table
    
        var
        ___
    
         1 
         2 

    Selection operation: Specify precedence of operations.
    Syntax: Add parentheses around the expression you want to evaluate first.
    Examples and results:

    data = readtable('myFile.xml', 'VariableSelectors', '//table/var[1]')
    data =
    
      2×1 table
    
        var
        ___
    
         1 
        10 

    data = readtable('myFile.xml', 'VariableSelectors', '(//table/var)[1]')
    data =
    
      table
    
        var
        ___
    
         1 
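
To try the selectors from this tip, the following sketch writes the myFile.xml structure shown above to a temporary folder and evaluates three of the XPath expressions. The temporary location and the output variable names are choices made for illustration only.

```matlab
% Recreate the myFile.xml structure from the tip in a temporary folder.
xmlFile = fullfile(tempdir,'myFile.xml');
fid = fopen(xmlFile,'w');
fprintf(fid,'<data>\n');
fprintf(fid,'    <table category="ones">\n');
fprintf(fid,'        <var>1</var>\n');
fprintf(fid,'        <var>2</var>\n');
fprintf(fid,'    </table>\n');
fprintf(fid,'    <table category="tens">\n');
fprintf(fid,'        <var>10</var>\n');
fprintf(fid,'        <var>20</var>\n');
fprintf(fid,'    </table>\n');
fprintf(fid,'</data>\n');
fclose(fid);

% Every var node, regardless of location.
allVars = readtable(xmlFile,'VariableSelectors','//var');

% The first var node within each table node.
firstEach = readtable(xmlFile,'VariableSelectors','//table/var[1]');

% The first node of the complete set of var nodes.
firstOnly = readtable(xmlFile,'VariableSelectors','(//table/var)[1]');

disp(height(allVars)); disp(height(firstEach)); disp(height(firstOnly));
```

The parentheses change which node set the index applies to, which is why the last two calls return tables of different heights.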

Version History

Introduced in R2021a