How to Import/parse sparse data from a text file into MATLAB?

I've been having issues parsing data from a text file into MATLAB. The text file has gaps between its strings (runs of spaces), and every time I try to import the data, MATLAB combines everything and messes up the data. I would like to read the text file (attached) and import the strings with their corresponding values into an array (probably).
I also tried importing the file into Excel to delimit the data in a nicer format for MATLAB, but Excel does not like the format either: it breaks every word into its own column, which messes everything up as well.
Any help would be greatly appreciated.
Here is what I did so far:
%% Setup the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 3);
% Specify range and delimiter
opts.DataLines = [2, Inf];
opts.Delimiter = ",";
% Specify column names and types
opts.VariableNames = ["TITLE", "BEGININPUTDATAECHO", "VarName3"];
opts.VariableTypes = ["string", "string", "string"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Specify variable properties
opts = setvaropts(opts, ["TITLE", "BEGININPUTDATAECHO", "VarName3"], "WhitespaceRule", "preserve");
opts = setvaropts(opts, ["TITLE", "BEGININPUTDATAECHO", "VarName3"], "EmptyFieldRule", "auto");
% Import the data
ATR42500zjf2 = readtable("test.txt", opts);
%% Clear temporary variables
clear opts

Accepted Answer

Mohammad Sami on 20 Sep 2020
After some testing, it seems your file can be read by setting the delimiter to three spaces.
You can combine this with the rest of your options here as needed.
opts = detectImportOptions('test.txt','Delimiter','   ','ConsecutiveDelimitersRule','join','ExpectedNumVariables',3,'LeadingDelimitersRule','ignore'); % delimiter is three spaces
opts = setvartype(opts,'VALUEDIMENSIONS','char');
a = readtable('test.txt',opts);

8 Comments

Hi Mohammad,
Thank you for your help. I tried what you suggested, but all I get is a table instead of a matrix where I can access the numeric values. The table shows everything as strings, including the numeric values. Do you know why? Also, the table does not show the first 8 rows of data.
Thanks.
Bonie10 on 21 Sep 2020 (edited 21 Sep 2020)
I tried to use table2array, but the last array column combines the numeric values (VALUE) with the strings (DIMENSIONS). Do you know how to separate those two columns?
Here is the output, which is missing the first 8 rows of data and also combines the last two columns.
It may have detected the options incorrectly. We can just overwrite DataLines.
However, since you wish to separate the dimensions from the numeric values, I changed the delimiter from three spaces to two. This means any data separated by more than one space will be split into different columns.
You can replace readtable with the readcell function if you want a cell array.
opts = detectImportOptions('test.txt','Delimiter','  ','ConsecutiveDelimitersRule','join','ExpectedNumVariables',4,'LeadingDelimitersRule','ignore'); % delimiter is two spaces
opts.DataLines = [2 Inf]; % force reading from 2nd line.
a = readtable('test.txt',opts);
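To see what the two-space rule does to a single row, the same split can be reproduced by hand with regexp. This is only a sketch and not what readtable does internally: the sample row and its spacing are assumed (the attached file is not shown here), and the variable names are made up for illustration.

```matlab
% Hypothetical row: fields are separated by two or more spaces, while the
% words inside DESCRIPTION are separated by a single space.
rowText = 'ASDASD ASDS   SFAS   0.1231   LBS';

% Split wherever two or more whitespace characters occur in a row.
parts = regexp(strtrim(rowText), '\s{2,}', 'split');

description = parts{1};              % text of the first field
name        = parts{2};              % text of the second field
value       = str2double(parts{3});  % third field converted to a double
dimension   = parts{4};              % text of the fourth field
```

A single space inside the description is left alone, which is why the rule breaks as soon as two real columns are separated by only one space.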
Nice, thanks for that!
Why does the first row, "DESCRIPTION NAME VALUE DIMENSIONS", not appear anymore? Instead I see Var1, Var2, Var3 and Var4 as the titles.
Because your file is not delimited in a standard way, such as with commas or tabs, auto-detection can only go so far.
I suggest you specify the options manually so that you don't have to rely on the detection.
opts.VariableNames = {'DESCRIPTION' 'NAME' 'VALUE' 'DIMENSIONS'};
Bonie10 on 21 Sep 2020 (edited 21 Sep 2020)
Thank you.
What if the DESCRIPTION column has some leading spaces? For example:
# NAMELIST $KDSD
ASDCSD, ASCDDF, HFSDCA AND CASDEC SASDW
DESCRIPTION NAME VALUE DIMENSIONS
ASDASD ASDS SFAS 0.1231 LBS
DSAS DASD ERAD 0.6432 M
QWEDS SDFS SFAS 0.1231 LBS
As you can see, the second row has a leading space in the DESCRIPTION column. I tried that with the code and it messes up that row: it shifts everything one column to the right. Also, there is only one space between DESCRIPTION and NAME, which breaks the two-space rule in the code. Do you know how that can be fixed in your code? I have attached the new file as a reference (named test2.txt). Notice that this file starts with two rows that I don't want to include in a "structure". Do you know how to skip those and put only the 4 columns of data into a structure?
I appreciate your help.
I think the only other option left is to use the fixed-width option. This assumes that each column has a fixed character width.
opts = fixedWidthImportOptions('NumVariables',4,'VariableWidths',[30 12 6 12]);
opts.VariableNames = {'DESCRIPTION' 'NAME' 'VALUE' 'DIMENSIONS'};
a = readtable('test.txt',opts);
b = readtable('test2.txt',opts);
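As a rough illustration of what "fixed width" means here, a single row can be cut at fixed character positions by hand. The widths are the ones from the call above; the row itself is a made-up example padded to those widths, so treat it as an assumption about the file layout rather than the real data.

```matlab
% Column widths taken from the fixedWidthImportOptions call above.
widths = [30 12 6 12];

% Hypothetical row with a leading space in the first field, padded so that
% every field fills its column exactly.
rowText = sprintf('%-30s%-12s%-6s%-12s', ' DSAS DASD', 'ERAD', '0.6432', 'M');

% First and last character index of each column.
stops  = cumsum(widths);
starts = stops - widths + 1;

% Cut the row at the fixed positions and trim the padding from each field.
fields = cell(1, numel(widths));
for k = 1:numel(widths)
    fields{k} = strtrim(rowText(starts(k):stops(k)));
end
```

Because the cut positions do not depend on the spaces in the text, a leading space or a single space between two columns no longer shifts the fields.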
Bonie10 on 22 Sep 2020 (edited 22 Sep 2020)
That helps, thank you so much!


More Answers (1)

S = fileread('test.txt');
data = str2double(regexp(S, '-?\d+(\.\d*)?', 'match'));
This code allows for negative data even though the sample file does not have any.
This code allows for integers with no decimal point. The file has one of those.
This code does not allow for numbers less than 1 that lack a leading 0: 0.1 is fine, but .1 without the 0 is not.
This code does not allow exponential notation.
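If the two limitations above matter, the pattern could be widened along these lines. This is a sketch under assumptions: the pattern is my own variant, not the one posted above, and the sample text is invented. Like the original, it will also pick up digits that happen to sit inside names.

```matlab
% Hypothetical text mixing names, units and numbers in several notations.
S = 'A 0.1231 LBS B .5 M C -3 D 1.5e-3 E 42';

% Optional sign, then either digits with an optional fraction or a bare
% fraction such as .5, followed by an optional exponent such as e-3.
pattern = '[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?';

tokens = regexp(S, pattern, 'match');  % cell array of matched number strings
data   = str2double(tokens);           % numeric row vector
```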

4 Comments

Bonie10 on 21 Sep 2020 (edited 21 Sep 2020)
Thank you for the help.
Is there a way to associate the "NAME" column with the corresponding "VALUE"? Right now I get one big matrix of numbers without knowing which variable name each one belongs to.
I would like an array/cell with the data description in one column and the corresponding values in another column. See the "output" picture above as an example.
Thanks.
filename = 'test.txt';
opt = detectImportOptions(filename, 'FileType', 'fixedwidth');
opt.ExtraColumnsRule = 'ignore';
opt.SelectedVariableNames = {'Var2', 'Var3'};
opt.DataLines = [3 inf];
data = readtable(filename, opt);
data = rmmissing(data);
Now data.Var2 will contain the Name field and data.Var3 will contain the corresponding numeric value.
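Once the names and values sit in two columns, one possible way to look a value up by its name is a containers.Map. The sketch below uses made-up names and values in place of the table columns, so the variable names are assumptions and not part of the code above.

```matlab
% Hypothetical NAME and VALUE columns, as they might come out of the table.
names  = {'SFAS', 'ERAD', 'QWSD'};
values = [0.1231 0.6432 2.5];

% Map each name to its value so it can be looked up by name.
lookup = containers.Map(names, values);

erad = lookup('ERAD');   % value stored under the key 'ERAD'
```

A map keeps one value per key, so if the same name appears on several rows, keeping the data as a table and filtering with strcmp may be the safer choice.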
Bonie10 on 21 Sep 2020 (edited 21 Sep 2020)
When running this code, the table is missing data, and the last two columns, "VALUE" and "DIMENSIONS", are combined (they should be separate). Do you know why?
Bonie10 on 21 Sep 2020 (edited 21 Sep 2020)
I have attached another file (test2.txt) with some format changes that make it closer to the text file I am actually dealing with.
Notice that this file starts with two rows that I don't want to include in my structure. Do you know how to skip those and parse only the NAME/VALUE column data?
Thanks!


Release: R2019b

Asked: 18 Sep 2020

Edited: 22 Sep 2020
