readmatrix returns NaNs instead of numeric values for nearly indistinguishable .txt file

Input Files
The attached file1.txt and file2.txt have identical structure (9 lines of header followed by data arranged in 15 columns). The header looks like this:
ITEM: TIMESTEP
881000
ITEM: NUMBER OF ATOMS
37
ITEM: BOX BOUNDS pp pp pp
-9.6850194863609573e-01 1.0509150710611115e+02
-8.0199580669506787e-01 8.7024035559953262e+01
-2.0781435615505643e+02 2.0781435615505643e+02
ITEM: ATOMS mass id type x y z c_1 c_2 f_eco[1] f_eco[2] c_sv[1] c_sv[2] c_sv[3] c_sv[4] backforth
Expected Behavior
I want to extract the numerical data in the header on lines 6-8 (3 rows and 2 columns). I use readmatrix to do this as follows (for file1.txt):
simcell = readmatrix('file1.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×2
   -0.9646  105.0876
   -0.7987   87.0208
 -207.7989  207.7989
It also works to extract the data that appears after the header as follows:
data_raw = readmatrix('file1.txt','FileType','text','NumHeaderLines',9); % output hidden for brevity
I have hundreds of thousands of files like this, and this approach works for almost all of them, but occasionally it fails...
Unexpected Behavior
When I do the same thing for file2.txt, it returns NaNs and I can't figure out why:
simcell = readmatrix('file2.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×1
   NaN
   NaN
   NaN
In an effort to debug the issue I looked at hidden characters, delimiters, character encoding and all appear identical between the two input files. However, I did find that if I manually delete all of the data after the header (attached as file2short.txt) I get the correct result:
simcell = readmatrix('file2short.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×2
   -0.9685  105.0915
   -0.8020   87.0240
 -207.8144  207.8144
Question
I know there are many other ways one could accomplish the desired result, but that is not my question. My question is: why does this unexpected behavior occur in this example?
3 Comments
Kangming Xu 10/181 on 10 Aug 2022
Edited: Kangming Xu 10/181 on 10 Aug 2022
Hi Oliver,
Thank you for reaching out.
I successfully reproduced the issue in MATLAB R2022a and reported the issue to our development team. I will let you know once I have an update. Let me know if you have any questions in the meantime!
Oliver Johnson on 18 Aug 2022
@Kangming, thank you for the explanation. If you submit it as an answer, I am happy to accept it.

Accepted Answer

Kangming Xu 10/181 on 11 Aug 2022
Moved: Walter Roberson on 19 Aug 2022
Here is the update.
The difference between the two files is that the delimiter is auto-detected as {','} for file2.txt and as {'\t' ' '} for file1.txt. This happens because the specified "Range" covers only a few rows, so rows outside the Range are also used to determine the file format. file1.txt contains more rows of numeric, space-delimited data, so for that file the delimiter is detected as {'\t' ' '}.
As for why readmatrix works on file2short.txt without the "Delimiter" option: when a file has only a few rows, the detection heuristics depend strongly on the selected data.
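One way to see the detection result for yourself is to ask detectImportOptions what it infers for each file. The following is a minimal sketch, not a verified reproduction: it assumes the three attachments from the question are in the current folder (missing files are skipped), and it assumes that detectImportOptions, given the same 'FileType' and 'Range' arguments, reports the same delimiter that readmatrix detects internally.

```matlab
% Attachments from the question; assumed to be in the current folder.
files = {'file1.txt', 'file2.txt', 'file2short.txt'};

for k = 1:numel(files)
    if ~isfile(files{k})
        fprintf('%s: not found, skipping\n', files{k});
        continue
    end
    % Run format detection with the same arguments used in the readmatrix calls.
    opts = detectImportOptions(files{k}, 'FileType', 'text', 'Range', [6 1 8 2]);
    fprintf('%s: detected delimiter(s):\n', files{k});
    disp(opts.Delimiter)   % cell array of delimiter strings
end
```

If the delimiters differ between file1.txt and file2.txt, that points to the detection step rather than the file contents as the source of the NaNs.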
If the files share the same format and the selected range of data is the same, you can capture the import settings in a "DelimitedTextImportOptions" object instead of relying on detection. Please refer to the link below for more information.
For example:
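% Fix the format explicitly: space delimiter, data on lines 6-8, two numeric columns
opts = delimitedTextImportOptions('Delimiter', ' ', 'DataLines', [6 8], 'NumVariables', 2, 'VariableTypes', {'double', 'double'})
% Reuse the same options for every file so that no per-file detection takes place
data1 = readmatrix('file1.txt', opts)
data2 = readmatrix('file2.txt', opts)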
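A lighter-weight variant, sketched here as an assumption rather than as part of the original answer, is to keep the original readmatrix call and add the 'Delimiter' name-value argument, so that delimiter detection is bypassed. To keep the sketch self-contained, it writes a small stand-in file (the hypothetical temporary file fname) containing only the 9-line header shown in the question; with the real data, fname would be replaced by 'file2.txt'.

```matlab
% Write a stand-in file holding the 9-line header from the question.
fname = [tempname '.txt'];   % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'ITEM: TIMESTEP', ...
    '881000', ...
    'ITEM: NUMBER OF ATOMS', ...
    '37', ...
    'ITEM: BOX BOUNDS pp pp pp', ...
    '-9.6850194863609573e-01 1.0509150710611115e+02', ...
    '-8.0199580669506787e-01 8.7024035559953262e+01', ...
    '-2.0781435615505643e+02 2.0781435615505643e+02', ...
    'ITEM: ATOMS mass id type x y z');
fclose(fid);

% Same call as in the question, but with the delimiter stated explicitly
% instead of leaving it to automatic detection.
simcell = readmatrix(fname, 'FileType', 'text', 'Delimiter', ' ', ...
    'Range', [6 1 8 2]);

delete(fname);   % remove the stand-in file
disp(simcell)
```

For hundreds of thousands of files, building the options object once and reusing it is likely the cheaper choice, since each file then skips detection entirely; the explicit 'Delimiter' argument is mainly convenient for a quick check.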

Version

R2022a
