Control How MATLAB Imports Your Data
Import your data into MATLAB® using an importing function, such as readtable
. You can let MATLAB determine how to import your data, or you can specify parameters to control how MATLAB imports your data.
Importing functions detect how best to import the contents of your input file by looking at the format of your data based on delimiters, number of header lines, number of variables, types of variables, and metadata. The accuracy of format detection increases with the size of your data set as patterns are more easily recognizable with more data. Text files benefit the most from this analysis because they often have the least structure.
There are several ways to control how MATLAB imports your data:
The simplest option is to specify name-value arguments with the importing function. For example, you can specify the
MissingRule
name-value argument to specify how to address gaps in your data. The Import Tool also provides this degree of control, for more information see Import Tool.Alternatively, you can specify an import options object with the importing function. This option is useful if you want to import multiple files with similar formats, customize at the variable level, or avoid the automatic format detection.
The examples in this topic use readtable
to import data as tables. For more information about importing data into other formats, such as matrices or cell arrays, see Import Text Files and Import Spreadsheets.
Import Data from File into MATLAB
You can import data from a file using an importing function, such as readtable
. For example, the sample file outages_flawed.csv
contains column-oriented data:
In this sample file, the second and fourth rows of data contain gaps in data, while the third row of data contains a date value that cannot be converted during import. Due to the small size of this data set, only six rows total, any anomaly significantly impacts format detection.
Create a table from outages_flawed.csv
using the importing function readtable
. The importing function recognizes the gaps in data as well as the date value that cannot be converted and performs the default action of filling with appropriate missing values.
T = readtable("outages_flawed.csv")
T=5×6 table
Region OutageTime Loss Customers RestorationTime Cause
_____________ ________________ _______________ _______________ ________________ ___________________
{'SouthWest'} 2002-02-01 12:18 {'458,9772218'} {'1820159,482'} 2002-02-07 16:50 {'winter storm' }
{'SouthEast'} 2003-01-23 00:49 {'530,1399497'} {'212035,3001'} NaT {'winter storm' }
{'SouthEast'} NaT {'289,4035493'} {'142938,6282'} 2003-02-17 08:14 {'winter storm' }
{'West' } 2004-04-06 05:44 {'434,8053524'} {'340371,0338'} NaT {'equipment fault'}
{'MidWest' } 2002-03-16 06:18 {'186,4367788'} {'212754,055' } 2002-03-18 23:23 {'severe storm' }
Control How MATLAB Detects and Imports Your Data
To control how the importing function analyzes your file, you can use name-value arguments. These arguments give you increased control over your import with table-level customization. See the documentation for importing functions for a list of their name-value arguments. This degree of control is appropriate for the majority of workflows.
For example, customize the import of the sample file at the table level by using name-value arguments:
Set the delimiter to
"|"
.Set the decimal separator to
","
.Omit rows with data that causes import errors.
Omit rows with missing data.
T = readtable("outages_flawed.csv",Delimiter="|",DecimalSeparator=",", ... ImportErrorRule="omitrow",MissingRule="omitvar")
T=4×5 table
Region OutageTime Loss Customers Cause
_____________ ________________ ______ __________ ___________________
{'SouthWest'} 2002-02-01 12:18 458.98 1.8202e+06 {'winter storm' }
{'SouthEast'} 2003-01-23 00:49 530.14 2.1204e+05 {'winter storm' }
{'West' } 2004-04-06 05:44 434.81 3.4037e+05 {'equipment fault'}
{'MidWest' } 2002-03-16 06:18 186.44 2.1275e+05 {'severe storm' }
Import Data from Multiple Files and Control Individual Variables
If you have multiple files to import and your files have a similar format, you can reuse the same import configuration for all your files by creating an import options object. You can set up the import options object based on the format of one of the files by using the detectImportOptions
function and then use that object to specify how to import all your files.
Passing name-value arguments to detectImportOptions
customizes the import options object at the table level by helping the function detect the format of your data. You can further customize the import options object at the variable level by using the getvaropts
, setvartype
, and setvaropts
functions.
For example, create an import options object with table-level customizations:
Set the delimiter to
"|"
.Set the decimal separator to
","
.Omit rows with data that cause import errors.
Omit rows with missing data.
opts = detectImportOptions("outages_flawed.csv",Delimiter="|", ... DecimalSeparator=",",ImportErrorRule="omitrow",MissingRule="omitrow");
Now fine-tune the import options object with variable-level customizations by using the getvaropts
, setvartype
, and setvaropts
functions.
View the current options for the variables named Region
, OutageTime
, and Loss
.
getvaropts(opts,["Region","OutageTime","Loss"])
ans = 1x3 heterogeneous VariableImportOptions (TextVariableImportOptions, DatetimeVariableImportOptions, NumericVariableImportOptions) array with properties: Name Type FillValue TreatAsMissing QuoteRule Prefixes Suffixes EmptyFieldRule
Specify variable-level customizations:
For the
Region
,OutageTime
, andLoss
variables, change their data types tostring
,datetime
, andsingle
, respectively.For the
Cause
variable, trim leading white space from the text.For the
Customers
variable, treat fields containing0
orNA
as missing.
opts = setvartype(opts,["Region","OutageTime","Loss"], ... ["string","datetime","single"]); opts = setvaropts(opts,"Cause",WhitespaceRule="trimleading"); opts = setvaropts(opts,"Customers",TreatAsMissing=["0","NA"]);
Import outages_flawed.csv
and the similarly formatted file outages_flawed2.csv
with your custom options.
T = readtable("outages_flawed.csv",opts); T2 = readtable("outages_flawed2.csv",opts);
Import Data from Multiple Files with No Format Detection
Instead of using an input file to create an import options object, you can create the import options object without an input file. Based on your input file type, choose a function to create a specific type of import options object: delimitedTextImportOptions
, fixedWidthImportOptions
, xmlImportOptions
, htmlImportOptions
, or wordDocumentImportOptions
. You can customize the import options object at the table level using name-value arguments during creation or by adjusting its properties after creation. For variable-level customizations, use the getvaropts
, setvartype
, and setvaropts
functions.
For example, import the two sample files which have a similar format and structure. Create a delimitedTextImportOptions
object for data with six variables, set the delimiter to "|"
, and omit rows that error during import.
opts = delimitedTextImportOptions(NumVariables=6, ... Delimiter="|",ImportErrorRule="omitrow");
Continue to customize your import options object at the table level by adjusting its properties:
Omit rows with missing data.
Specify that the data lines start on line 2.
Specify that the variable names are on line 1.
Preserve any variable names that include spaces and non-ASCII characters.
opts.MissingRule = "omitrow"; opts.DataLines = 2; opts.VariableNamesLine = 1; opts.VariableNamingRule = "preserve";
Further customize your import options at the variable level using the getvaropts
, setvartype
, and setvaropts
functions:
Set the data types of the six variables.
Set the decimal separator to
","
for the third and fourth variables.
opts = setvartype(opts,1:6, ... ["string" "datetime" "single" "single" "datetime" "categorical"]); opts = setvaropts(opts,3:4,DecimalSeparator=",");
Import outages_flawed.csv
and the similarly formatted file outages_flawed2.csv
with your custom options.
T = readtable("outages_flawed.csv",opts); T2 = readtable("outages_flawed2.csv",opts);
See Also
detectImportOptions
| getvaropts
| setvaropts
| setvartype
| readtable
| SpreadsheetImportOptions
| DelimitedTextImportOptions
| readmatrix
| readcell
| readvars