readmatrix returns NaNs instead of numeric values for nearly indistinguishable .txt file

Question

Oliver Johnson on 4 Aug 2022

Moved: Walter Roberson on 19 Aug 2022

Input Files

The attached file1.txt and file2.txt have identical structure (9 lines of header followed by data arranged in 15 columns). The header looks like this:

ITEM: TIMESTEP
881000
ITEM: NUMBER OF ATOMS
37
ITEM: BOX BOUNDS pp pp pp
-9.6850194863609573e-01 1.0509150710611115e+02
-8.0199580669506787e-01 8.7024035559953262e+01
-2.0781435615505643e+02 2.0781435615505643e+02
ITEM: ATOMS mass id type x y z c_1 c_2 f_eco[1] f_eco[2] c_sv[1] c_sv[2] c_sv[3] c_sv[4] backforth 

Expected Behavior

I want to extract the numerical data contained in lines 6-8 of the header (3 rows and 2 columns). I use readmatrix to do this as follows (shown for file1.txt):

simcell = readmatrix('file1.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×2
   -0.9646  105.0876
   -0.7987   87.0208
 -207.7989  207.7989

It also works to extract the data that appears after the header as follows:

data_raw = readmatrix('file1.txt','FileType','text','NumHeaderLines',9); % output hidden for brevity

I have hundreds of thousands of files like this, and this approach works for almost all of them, but occasionally it fails...

Unexpected Behavior

When I do the same thing for file2.txt, it returns NaNs and I can't figure out why:

simcell = readmatrix('file2.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×1
   NaN
   NaN
   NaN

In an effort to debug the issue I looked at hidden characters, delimiters, character encoding and all appear identical between the two input files. However, I did find that if I manually delete all of the data after the header (attached as file2short.txt) I get the correct result:

simcell = readmatrix('file2short.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×2
   -0.9685  105.0915
   -0.8020   87.0240
 -207.8144  207.8144

Question

I know there are many other ways one could accomplish the desired result, but that is not my question. My question is: why does this unexpected behavior occur in this example?

3 Comments

Kangming Xu 10/181 on 10 Aug 2022

Edited: Kangming Xu 10/181 on 10 Aug 2022

Hi Oliver,

Thank you for reaching out.

I successfully reproduced the issue in MATLAB R2022a and reported the issue to our development team. I will let you know once I have an update. Let me know if you have any questions in the meantime!

Oliver Johnson on 18 Aug 2022

@Kangming, thank you for the explanation. If you submit it as an answer, I am happy to accept it.

Answer 1

Kangming Xu 10/181 on 11 Aug 2022

Moved: Walter Roberson on 19 Aug 2022

Here is the update.

The difference between the two files lies in delimiter detection: by default, the delimiter is detected as {','} for file2.txt and as {'\t' ' '} for file1.txt. Because the supplied "Range" covers only a few rows, rows outside the Range are also used to determine the file format, so as to give the best overall result. file1.txt contains more rows of numeric, space-delimited data, so {'\t' ' '} is selected as its delimiter.
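One way to look at what the detection picks for each file is to ask detectImportOptions directly. This is only a rough sketch: it assumes the question's attachments file1.txt and file2.txt are in the current folder, and it runs detection on the whole file without a "Range", so the result may not match what readmatrix decides internally when a Range is given.

```matlab
% Assumption: file1.txt and file2.txt (the attachments from the question)
% are in the current folder; files that are not found are skipped.
files = {'file1.txt', 'file2.txt'};
detected = cell(size(files));
for k = 1:numel(files)
    if ~isfile(files{k})
        continue
    end
    % Run format detection on the whole file, treating it as text.
    opts = detectImportOptions(files{k}, 'FileType', 'text');
    if isprop(opts, 'Delimiter')
        % Delimited-text options expose the detected delimiter(s).
        detected{k} = opts.Delimiter;
    else
        % Otherwise record which kind of options object was returned.
        detected{k} = class(opts);
    end
    fprintf('%s:\n', files{k});
    disp(detected{k});
end
```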

As for why the function works for file2short.txt without specifying the "Delimiter" property: when a file has only a few rows, the detection heuristics depend strongly on the selected data.
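Since the problem comes from delimiter detection, a simple workaround is to pass 'Delimiter' to readmatrix explicitly so that nothing needs to be guessed. The sketch below is an illustration only: it writes a small stand-in file with made-up sample values in the same 9-line header layout (the file name and contents are hypothetical, not the original attachments) and then reads lines 6-8 with a fixed space delimiter.

```matlab
% Build a small stand-in file with the same header layout (sample values only).
fname = [tempname '.txt'];
lines = { ...
    'ITEM: TIMESTEP'
    '881000'
    'ITEM: NUMBER OF ATOMS'
    '2'
    'ITEM: BOX BOUNDS pp pp pp'
    '-9.685e-01 1.0509e+02'
    '-8.020e-01 8.7024e+01'
    '-2.0781e+02 2.0781e+02'
    'ITEM: ATOMS mass id type x y z'
    '1.0 1 1 0.5 0.5 0.5'
    '1.0 2 1 1.5 1.5 1.5'};
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', lines{:});
fclose(fid);

% Specify the delimiter explicitly instead of relying on auto-detection,
% then read row 6 column 1 through row 8 column 2.
simcell = readmatrix(fname, 'FileType', 'text', ...
    'Delimiter', ' ', 'Range', [6 1 8 2]);
disp(simcell);

delete(fname);  % remove the temporary file
```

For the real data, the same call with 'file2.txt' in place of fname would be the thing to try.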

If the file format and the range of selected data are the same across files, you can capture the detection options in a "DelimitedTextImportOptions" object. See the link below for more information.

https://www.mathworks.com/help/matlab/ref/matlab.io.text.delimitedtextimportoptions.html

For example:

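% Set the delimiter, data lines, and variable types explicitly so they are not auto-detected.
opts = delimitedTextImportOptions('Delimiter', ' ', 'DataLines', [6 8], 'NumVariables', 2, 'VariableTypes', {'double', 'double'})
% Reuse the same options object for every file with this layout.
data1 = readmatrix('file1.txt', opts)
data2 = readmatrix('file2.txt', opts)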