MATLAB unable to parse a numeric field when I use the gather function on a tall array.

151 views (last 30 days)
I have a CSV file with a large number of data points on which I want to run a particular algorithm. I created a tall array from the file, intending to import a small chunk of the data at a time. However, when I try to use gather to bring a small chunk into memory, I get the following error.
"Board_Ai0" is the header of the CSV file. It is not present in row 15355, as can be seen below where I opened the CSV file in MATLAB's import tool.
The same algorithm works perfectly fine when I don't use a tall array but instead import the whole file into memory. However, I have other, larger CSV files that I also want to analyze, and those won't fit in memory.
UPDATE: Apparently the images were illegible, but someone else edited the question to enlarge them, so they should be fine now. Also, I can't attach the data files to this question because the files that give me this problem are all larger than 5 GB.
  11 Comments
Harald
Harald on 1 Sep 2025 at 12:35
This makes sense: as it stands, MATLAB tries to import the entire file at once - I should have mentioned that.
You need to set 'ReadMode' to 'partialfile' and specify a 'ReadFcn' that imports a certain number, say 100,000, of rows at a time. It could then look like this:
ds = fileDatastore("yourfile.csv", "ReadFcn", @readdata, "UniformRead", true, "ReadMode", "partialfile");
data = readall(ds);

function [data, startrow, done] = readdata(filename, startrow)
    nRows = 100000;
    if isempty(startrow)
        startrow = 2;
    end
    opts = detectImportOptions(filename);
    opts.DataLines = [startrow, startrow + nRows - 1];
    data = readtimetable(filename, opts);
    data = rmmissing(data);
    done = height(data) < nRows;
    startrow = startrow + nRows;
end
Best wishes,
Harald
Ninad
Ninad on 6 Sep 2025 at 12:47
So the code works well when I run it on a file that can fit in memory. But when I run it on a file that cannot, I get the following error:
The code is:
function [data, startrow, done] = readdata(filename, startrow)
    nRows = 10000000;
    if isempty(startrow)
        startrow = 2;
    end
    opts = detectImportOptions(filename);
    opts.DataLines = [startrow, startrow + nRows - 1];
    data = readtimetable(filename, opts);
    data = rmmissing(data);
    done = height(data) < nRows;
    startrow = startrow + nRows;
end

function [data, startrow, done] = givetimetable(~, ~)
    data = timetable(seconds(200.000005), 0.139389038085938, 'VariableNames', "Board0_Ai0");
    startrow = 2;
    done = true;
end

ds = fileDatastore("1kcross.csv", "ReadFcn", @readdata, "UniformRead", true, "PreviewFcn", @givetimetable, "ReadMode", "partialfile");
data = tall(ds);
slice = data(1:10000000, :);
slice = gather(slice);
What am I still doing wrong?


Accepted Answer

Stephen23
Stephen23 about 19 hours ago
Edited: dpb about 21 hours ago
Providing the RANGE argument does not prevent READTABLE from calling its automatic format detection, which might involve loading all or a significant part of the file into memory. The documented solution is to provide an import options object yourself (e.g. generate it from a known-good smaller file of the same format and then store it), or alternatively to use a low-level file reading command, e.g. FSCANF, FREAD, etc.
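A minimal sketch of the stored-options approach (the file names here are placeholders, and the chunk layout is assumed from the thread, so adjust to your data):

```matlab
% Run once: detect options on a small, known-good file with the same
% format, then save them so the 5 GB file is never scanned.
opts = detectImportOptions("sample_small.csv");   % hypothetical small sample
save("importopts.mat", "opts");

% In the chunked reader, load the stored options instead of calling
% detectImportOptions on the big file for every chunk.
function [data, startrow, done] = readdata(filename, startrow)
    persistent opts
    if isempty(opts)
        s = load("importopts.mat", "opts");
        opts = s.opts;
    end
    nRows = 100000;
    if isempty(startrow)
        startrow = 2;
    end
    opts.DataLines = [startrow, startrow + nRows - 1];
    data = rmmissing(readtimetable(filename, opts));
    done = height(data) < nRows;
    startrow = startrow + nRows;
end
```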
  1 Comment
dpb
dpb about 16 hours ago
Good point, @Stephen23. I had figured it surely was smart enough to only read a small amount, but maybe not. I had started to suggest the standalone import options object, but thought surely it would still work without one; I certainly didn't expect it to run out of memory during the format-detection forensics. I suppose even if that scan did succeed, the file could still be held in memory with no room left for more.
I guess one can try cutting down the range to see if it can be made to work on smaller chunks, but I suspect MathWorks didn't really think about tables as tall data all that much, and it probably will need to revert to low-level I/O. I was hoping against hope to hold off on that route...
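For reference, a low-level chunked read along those lines might look something like this. It is only a sketch: it assumes three header lines and two comma-separated numeric columns (time in seconds and Board0_Ai0), which is guessed from the thread and not verified against the actual file:

```matlab
fid = fopen("1kcross.csv", "r");
for k = 1:3
    fgetl(fid);                            % skip assumed header lines
end
nRows = 100000;
while ~feof(fid)
    C = textscan(fid, '%f%f', nRows, 'Delimiter', ',');
    if isempty(C{1})
        break
    end
    chunk = timetable(seconds(C{1}), C{2}, ...
        'VariableNames', {'Board0_Ai0'});  % process this chunk here
end
fclose(fid);
```

Since textscan resumes from the current file position, each pass picks up exactly where the previous chunk ended, with no per-chunk format detection.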


More Answers (1)

dpb
dpb on 6 Sep 2025 at 13:59
Edited: dpb on 6 Sep 2025 at 18:43
It appears it is detectImportOptions that is having the problem: apparently it tries to read the whole file into memory before it does its forensics.
I don't think you need an import options object anyway; use the 'Range' named parameter in the call to readtimetable.
Something like
function [data, startrow, done] = readdata(filename, startrow)
    nRows = 10000000;
    if isempty(startrow)
        startrow = 2;  % this looks unlikely to be right; from the earlier image there are 3(?) header rows
    end
    range = sprintf('%d:%d', startrow, startrow + nRows - 1);  % build row range expression (the -1 avoids overlapping one row between chunks)
    data = readtimetable(filename, 'Range', range);
    data = rmmissing(data);
    done = height(data) < nRows;
    startrow = startrow + nRows;
end
This may still have an issue with the timetable, however, if it takes the variable names from a header line, since that header line isn't there in the subsequent sections of the file. I don't know what trouble you'll run into with such large files if you try to read 100K lines into the file but tell it to also read the variable names from the second or third line of the file... Probably ignoring variable names and letting MATLAB use defaults, then setting Properties.VariableNames after reading, or just accepting the defaults, would be the best bet.
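One way to sketch that idea (untested against the real file; the column order, names, and time unit are guesses from the thread): read the chunk as a plain table with no variable names, set the names yourself, then convert to a timetable:

```matlab
% Inside the chunked read function, replacing the readtimetable call:
range = sprintf('%d:%d', startrow, startrow + nRows - 1);
T = readtable(filename, 'Range', range, 'ReadVariableNames', false);
T.Properties.VariableNames = {'Time', 'Board0_Ai0'};  % assumed column order
T.Time = seconds(T.Time);              % assuming the time column is in seconds
data = table2timetable(T, 'RowTimes', 'Time');
```

Note that readtable may still run its own format detection on the file, so this only sidesteps the header problem, not necessarily the memory one.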
  4 Comments
Stephen23
Stephen23 about an hour ago
It would be interesting to print out the Range value when it errors.
Harald
Harald about 2 hours ago
@Ninad, sorry that my suggestion did not work, and for the troubles around this. I would usually test my suggestions, but that is difficult here since I don't have the data.
@dpb, while I work at MathWorks, I am not a developer or in Technical Support. I try to support Answers as my core duties permit.


Version

R2025a
