How to read every nth line with textscan

Question

Kyle on 4 Jul 2013

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/81115-how-to-read-every-nth-line-with-textscan

Commented: lukacs kuslits on 17 Nov 2016

I have a very large data file (.dat), and I cannot read all of it with textscan; I get an 'Out of memory' error.

I do not need to import all of the data into MATLAB. I would like to read every nth line to get a sample of the data and reduce the memory required.

I have found some solutions already posted, but none of them work in my case. I think this is because I am using a '.dat' file and the data is imported as strings (the decimal separator in the data is a comma, so I must read each value as a string, convert the comma to a decimal point, and then convert to double).

Here is what I am trying:

MATLAB code
  d = fopen('TempMessung.dat');
  while ~feof(d)
      B = textscan(d, '%s %s %s', 1, 'headerlines', 5);
      for i=1:2
          fgets(d);
      end
  end

So this code should skip 5 headerlines, read 1 line from the file into B, then go into the for loop and skip 2 lines, then return to textscan to read one line into B.

Running this code, I find that B is a 1x3 cell: B{1} contains only the last row's value from the 1st column, B{2} only the last row's value from the 2nd column, and B{3} only the last row's value from the 3rd column.

Here are a few sample rows from my data:

MTS793|
Datenerfassung            Zeit:  2002,1694  Sec  01.03.2012 10:46:40
Zeit  25 KN-R Zähler  In_12_Temp
Sec  cycles  deg_C
0,40966797  0  22,226631
0,60986328  0  22,125919
0,81005859  0  22,260201
1,0102539  0  22,176275
1,2104492  0  22,226631
1,4106445  0  22,276985
1,6108398  0,5  22,134312
1,8110352  0,5  22,260201
2,0112305  0,5  22,209845
2,2114258  0,5  22,125919
2,4116211  0,5  22,251808
2,6118164  0,5  22,134312
2,8120117  1  22,19306
3,012207  1  22,226631
3,2124023  1  22,083956
3,4125977  1  22,235023
3,612793  1  22,184668
3,8129883  1  22,125919
4,0131836  1  22,243416
4,2133789  1,5  22,117527
4,4135742  1,5  22,19306
4,6137695  1,5  22,226631
4,8139648  1,5  22,092348
5,0141602  1,5  22,235023
5,2143555  1,5  22,184668
5,4145508  2  22,134312
5,6147461  2  22,226631
5,8149414  2  22,109135
6,0151367  2  22,201452
6,215332  2  22,226631
6,4155273  2  22,092348


Answer 1

Guru on 4 Jul 2013

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/81115-how-to-read-every-nth-line-with-textscan#answer_90850

Edited: Guru on 4 Jul 2013

Honestly, this is a job that doesn't need textscan. The benefit of textscan is that it can quickly read a whole file, or large chunks of it, when the formatting of the data stays the same. Low-level file I/O routines would probably work better in your case; textscan uses them internally anyway, as any other file I/O ultimately does.

Try the following code:

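m = 1;
d = fopen('TempMessung.dat');
% Skip 5 header lines
for n=1:5
  tline = fgetl(d);
end
% Read the first data line
tline = fgetl(d);
% fgetl returns -1 (not a char) at end of file, which ends the loop
while ischar(tline)
  A(m).data = tline;
  m = m+1;
  % Skip the next two lines
  for n=1:2
    fgetl(d);
  end
  % Read the next line to keep
  tline = fgetl(d);
end
fclose(d);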

After you read it all into A, you can do basic string manipulation

% Preallocate B to a cell array
B = cell(length(A),3);
for n = 1:length(A)
    % Replace the , with .
    B(n,:) = strsplit(strrep(A(n).data,',','.'));
end

This will give you the same result that you wanted from textscan, with the decimal commas changed to decimal points. If you want this as a double array:

B = cellfun(@str2double,B);

MATLAB is so simple. HTH
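For reference, the fgetl approach above can be put together as one self-contained sketch. This is only an illustration under stated assumptions: the sample file, its two header lines, and the names nHeader, step, rows and data are made up for the example and do not come from the original file (which has 5 header lines). It keeps every 3rd data line and parses the comma decimals with sscanf instead of strsplit.

```matlab
% Write a tiny sample file so the sketch runs on its own (hypothetical data).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'header 1\nheader 2\n');
fprintf(fid, '0,5  0  22,1\n1,0  0  22,2\n1,5  0,5  22,3\n2,0  0,5  22,4\n2,5  1  22,5\n');
fclose(fid);

nHeader = 2;   % header lines to skip (assumed; 5 in the question's file)
step    = 3;   % keep every 3rd data line

fid = fopen(fname, 'r');
for k = 1:nHeader
    fgetl(fid);                 % discard header lines
end

rows = {};                      % grows per kept line; acceptable for a sample
tline = fgetl(fid);
while ischar(tline)             % fgetl returns -1 at end of file
    rows{end+1} = tline;        %#ok<SAGROW>
    for k = 1:step-1
        fgetl(fid);             % discard the lines in between
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% Turn decimal commas into points, then parse three numbers per kept line.
data = zeros(numel(rows), 3);
for k = 1:numel(rows)
    data(k, :) = sscanf(strrep(rows{k}, ',', '.'), '%f', [1 3]);
end
```

With the sample file above, data ends up as a 2-by-3 double array holding data lines 1 and 4. Only the kept lines are ever stored, so memory use scales with the sample rather than with the file.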

lukacs kuslits on 17 Nov 2016

I have tried what you suggested and I got the exception: "Conversion to cell from char is not possible." This is also the case if I try textscan.


Answer 2

Matt J on 4 Jul 2013

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/81115-how-to-read-every-nth-line-with-textscan#answer_90831

Edited: Matt J on 4 Jul 2013

You can fix it as follows

 ii=1;
 ...
 B(ii,1:3) = textscan(d, '%s %s %s', 1, 'headerlines', 5);
 ii=ii+1;

but if it is a large file, it will run very slowly.

Kyle on 4 Jul 2013

I cannot pre-allocate because I am going through about 10 folders, each of which contains this file, and the files are all different sizes. Just one file is too large and causes the out of memory error.

I want to handle this large file the same way I handle the smaller files that I can store in RAM. I am not sure whether I can hold 1/3 of it; I was just using that as an example.

I'd like to avoid making changes to the individual data files, because I have over 100 files that I will eventually loop through.

Is there a faster way to split the textscan call into n segments (I think 10 should be sufficient) that will save memory and not be as slow as the method above? I believe that method calls textscan several thousand times. The data file in question is 336 MB.

Matt J on 4 Jul 2013

Edited: Matt J on 4 Jul 2013

I've lost track of what the issue is. You can call textscan chunkwise by doing

 C(ithChunk,:) = textscan(fid, 'format', N);
 ithChunk=ithChunk+1;

where N is the number of lines per chunk. Textscan will resume from the point you left off with every subsequent call, until you close the file.
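One way to combine the chunkwise idea with subsampling is to read a block of lines per textscan call and keep only every nth row of each block, so textscan is called once per chunk instead of once per kept line. The following is a rough sketch, not a tested solution for the 336 MB file: the sample file, the single header line, and the names step, chunkLines, offset and kept are assumptions made for the example.

```matlab
% Write a small sample file so the sketch runs on its own (hypothetical data).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'header\n');
for k = 1:10
    fprintf(fid, '%d,5  %d  22,%d\n', k, k, k);
end
fclose(fid);

step       = 3;   % keep every 3rd data line
chunkLines = 4;   % lines per textscan call (would be much larger in practice)

fid = fopen(fname, 'r');
fgetl(fid);                      % skip the single header line
kept   = cell(0, 3);             % kept rows, still as strings
offset = 0;                      % data lines consumed so far
while ~feof(fid)
    C = textscan(fid, '%s %s %s', chunkLines);
    n = numel(C{1});
    if n == 0
        break;                   % nothing left but trailing whitespace
    end
    % Rows of this chunk whose overall line index is 1, 1+step, 1+2*step, ...
    idx  = find(mod(offset + (1:n) - 1, step) == 0);
    kept = [kept; C{1}(idx), C{2}(idx), C{3}(idx)]; %#ok<AGROW>
    offset = offset + n;
end
fclose(fid);
delete(fname);

% Decimal commas to points, then convert the whole cell array at once.
data = str2double(strrep(kept, ',', '.'));
```

The running offset keeps the sampling pattern aligned across chunk boundaries, so chunkLines does not need to be a multiple of step. Only one chunk plus the kept rows are in memory at any time.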
