Make this script faster

Question

samy rima on 9 Dec 2015

https://de.mathworks.com/matlabcentral/answers/259541-make-this-script-faster

Edited: Colin Edgar on 17 Dec 2015

Dear all,

I have a .txt file (an eye-tracker log) with 12 columns and 2,398,068 rows, and I import it with the code below.

The first line is a header with the variable names. Column 9 contains strings; all other columns are numeric (double).

Is there a way to make this script run faster?

Thanks for any insight.

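 filename = 'file.txt' ;
 % - Get structure from first line (header with the variable names).
 fid  = fopen( filename, 'r' ) ;
 hdr  = fgetl( fid ) ;
 fclose( fid ) ;
 % - Build formatSpec for TEXTSCAN: one %s per ';'-separated column (12 here).
 nCol = numel( strsplit( hdr, ';' ) ) ;
 fmt  = repmat( '%s', 1, nCol ) ;
 % - Read full file (header row included) as text.
 fid  = fopen( filename, 'r' ) ;
 data = textscan( fid, fmt, Inf, 'Delimiter', ';' ) ;
 fclose( fid ) ;
 data = [data{:}] ;   % N x 12 cell array of character vectors
 % - Column 9: 1 for 'Event 1 > Stimulation', 0 otherwise (row 1 is the header).
 data(2:end,9) = num2cell( strcmp( data(2:end,9), 'Event 1 > Stimulation' ) ) ;
 % - Convert the other 11 columns to double, one cell at a time.
 data = cellfun( @str2double, data(2:end,[1:8 10:end]), 'un', 0 ) ;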
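A minimal sketch of letting textscan do the numeric conversion while reading, instead of reading every field as text and calling str2double per cell. It assumes the layout described above (one header line, ';' delimiter, 12 columns, only column 9 is text). The sample rows, header names and the 'Fixation' label are made up for illustration; for the real file, pass the file ID from fopen instead of the char vector txt.

```matlab
% Hypothetical sample: 1 header line + 2 data rows, ';'-delimited, column 9 is text.
txt = sprintf(['a;b;c;d;e;f;g;h;event;j;k;l\n' ...
               '1;2;3;4;5;6;7;8;Event 1 > Stimulation;10;11;12\n' ...
               '1;2;3;4;5;6;7;8;Fixation;10;11;12\n']);

% Numeric columns use %f so textscan converts them while reading;
% only column 9 is kept as text (%s).
fmt = '%f%f%f%f%f%f%f%f%s%f%f%f';
C = textscan(txt, fmt, 'Delimiter', ';', 'HeaderLines', 1);

% Flag rows where column 9 is the stimulation event (true) or not (false).
stim = strcmp(C{9}, 'Event 1 > Stimulation');

% Concatenate the 11 numeric columns into one N x 11 double matrix.
data = [C{[1:8 10:12]}];
```

With this layout the per-cell cellfun/str2double step disappears entirely; only the single text column is compared with strcmp, which is already vectorized over the cell array.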

5 Comments (3 older comments not shown)

jgg on 17 Dec 2015

I had a similar issue. I ended up doing the initial data cleaning in Stata or R since it was easier to reformat the columns.

Colin Edgar on 17 Dec 2015

I can't make fscanf skip the first "" string on each line. For example, with

 frmt = '%*s%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';
 A = fscanf(fid, frmt, [12, inf]);

A comes back empty (A = ''). It only reads anything if I do this:

 A = fscanf(fid, '%s', [12, inf]);

which gives A as a 12 x 16833 char array. What I want is A as a 12 x 16833 double array.
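A possible workaround, sketched with textscan rather than fscanf: %*q reads and discards a double-quoted field, and the %f specifiers return numbers directly. The line layout (a quoted timestamp followed by 12 comma-separated numbers), the header names and the sample values are assumptions for illustration only.

```matlab
% Hypothetical sample: one header line, then a quoted timestamp + 12 numbers per line.
txt = sprintf(['time,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12\n' ...
               '"2015-12-17 10:00:00",1,2,3,4,5,6,7,8,9,10,11,12\n' ...
               '"2015-12-17 10:00:01",13,14,15,16,17,18,19,20,21,22,23,24\n']);

% %*q skips the quoted field; 12 x %f read the numeric fields.
fmt = ['%*q' repmat('%f', 1, 12)];

% CollectOutput merges the 12 numeric columns into a single N x 12 double array.
C = textscan(txt, fmt, 'Delimiter', ',', 'HeaderLines', 1, 'CollectOutput', true);
A = C{1};
```

For a file, replace txt with the file ID returned by fopen and set HeaderLines to the actual number of heading lines.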

Answer 1

Colin Edgar on 17 Dec 2015

https://de.mathworks.com/matlabcentral/answers/259541-make-this-script-faster#answer_203620

Edited: Colin Edgar on 17 Dec 2015

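Here is my solution; it takes only about 1 s per file (~2 MB, 12 x 18000 values). It is written for the example data I posted above, but with the initial "timestamp" field removed. I believe this answers the OP's question as well, since the data are very similar, although the OP's file is ';'-delimited and has a text column (column 9), so the format string would need to be adapted.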

 formatSpec = '%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f\n';  % 12 comma-separated numbers per line
 fid = fopen(flnm, 'r');                    % flnm: path to the data file
 for k = 1:4
     fgetl(fid);                            % skip the 4 heading lines (a hack, but it works)
 end
 mat = fscanf(fid, formatSpec, [12, Inf]);  % 12 x N, filled column by column
 mat = mat';                                % transpose so each row is one line of the file
 fclose(fid);
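Why the transpose is needed: fscanf (and sscanf, its counterpart for in-memory text) fills a [rows, Inf] output column by column, so each line of the file ends up as a column. A tiny 3-column illustration with made-up numbers:

```matlab
% Two lines of made-up comma-separated data, 3 values each.
txt = sprintf('1,2,3\n4,5,6\n');

% sscanf applies the format repeatedly and fills a 3 x N array column-wise,
% so each input line becomes one column.
raw = sscanf(txt, '%f,%f,%f\n', [3, Inf]);   % raw = [1 4; 2 5; 3 6]

% Transposing gives one row per input line, matching the file layout.
M = raw';                                     % M = [1 2 3; 4 5 6]
```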

For comparison, here is my old version, which took ~15 s (similar to the OP's approach):

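 formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s';  % read all 12 fields as text
 fid = fopen(flnm, 'r');
 C = textscan(fid, formatSpec, 'HeaderLines', 4, 'Delimiter', ',');
 % Convert each column (a cell array of char) to double, then join into an N x 12 matrix.
 mat = cell2mat(cellfun(@str2double, C, 'UniformOutput', false));
 fclose(fid);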
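To check that the two versions return the same matrix, and to time them on your own machine, a sketch along these lines could be used. It assumes a hypothetical in-memory data set of random values (1000 lines x 12 columns, 3 decimals) instead of reading flnm, and runs textscan and sscanf on char input so no file is needed; the tic/toc timings will vary with machine and data size.

```matlab
% Hypothetical test data: 1000 lines x 12 comma-separated values (3 decimals).
rng(0);
vals = round(rand(1000, 12) * 1000) / 1000;
rowFmt = [repmat('%.3f,', 1, 11) '%.3f\n'];
txt = sprintf(rowFmt, vals');   % vals' is consumed column-wise, i.e. row by row of vals

% Old approach: read every field as text, then convert with str2double.
tic;
C = textscan(txt, repmat('%s', 1, 12), 'Delimiter', ',');
matOld = cell2mat(cellfun(@str2double, C, 'UniformOutput', false));
tOld = toc;

% New approach: numeric conversion inside sscanf, then transpose.
tic;
matNew = sscanf(txt, [repmat('%f,', 1, 11) '%f\n'], [12, Inf])';
tNew = toc;

fprintf('str2double route: %.3f s, sscanf route: %.3f s\n', tOld, tNew);
```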