Make this script faster

samy rima on 9 Dec 2015
Edited: Colin Edgar on 17 Dec 2015
Dear all,
I have a txt file (an eye-tracker log) with 12 columns and 2,398,068 rows, and I import it with the code below.
The first line is a header with the variable names. Only column 9 contains strings; all other columns are doubles.
Is there a way to make this script run faster?
Thanks for any insight.
filename = 'file.txt' ;
% - Get structure from first line.
fid = fopen( filename, 'r' ) ;
line = fgetl( fid ) ;
fclose( fid ) ;
% - Build formatSpec for TEXTSCAN.
fmt = {'%f%f%f%f%f%f%f%f%s%f%f%f'} ;
% - Read full file.
fid = fopen( filename, 'r' ) ;
data = textscan( fid, fmt, Inf, 'Delimiter', ';' ) ;
fclose( fid ) ;
data = ([data{:}]) ;
data(2:end,9)=num2cell((strcmp(data(2:end,9),'Event 1 > Stimulation')));
data=cellfun(@str2double,data(2:end,[1:8 10:end]),'un',0);
5 Comments
jgg on 17 Dec 2015
I had a similar issue. I ended up doing the initial data cleaning in Stata or R since it was easier to reformat the columns.
Colin Edgar on 17 Dec 2015
I can't make fscanf ignore the first "" string, for example:
frmt = '%*s%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';
A = fscanf(fid, frmt, [12, inf]);
A = "
Unless I do this:
A = fscanf(fid, '%s', [12, inf]);
A = 12 x 16833 (Char)
What I want is:
A = 12 x 16833 double
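
One way to get a numeric 12 x N result while skipping a leading text field is to let textscan drop that field with `%*s`. The following is only a minimal sketch under assumptions: the sample file, its quoted timestamps, and the names `fname`, `fmt`, and `C` are invented for illustration, and it assumes the skipped field contains no commas.

```matlab
% Hypothetical sample file: a quoted timestamp followed by 12 numbers per row.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '"2015-12-17 10:00:00"%s\n', sprintf(',%d', 1:12));
fprintf(fid, '"2015-12-17 10:00:01"%s\n', sprintf(',%d', 13:24));
fclose(fid);

% '%*s' reads the first comma-delimited field and discards it;
% the remaining 12 fields are parsed directly as doubles.
fmt = ['%*s' repmat('%f', 1, 12)];
fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'Delimiter', ',');
fclose(fid);
delete(fname);

A = [C{:}]';   % 12 x N double, one column per input row
```

Because every remaining field is converted while reading, no later str2double pass over character data is needed.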


Answers (1)

Colin Edgar on 17 Dec 2015
Edited: Colin Edgar on 17 Dec 2015
Here is my solution; it takes only ~1 s to run per file (~2 MB, 12 x 18000). It is written for the example data I posted above, but with the initial "timestamp" removed. I believe it also answers the OP's issue, since the data was very similar.
formatSpec = '%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f\n';
fid = fopen(flnm,'r');
t1 = fgetl(fid); %reads past heading, I know it's a hack but...
t1 = fgetl(fid);
t1 = fgetl(fid);
t1 = fgetl(fid);
mat = fscanf(fid, formatSpec, [12,inf]);
mat = mat'; %transpose to correct layout
fclose(fid);
Compare this with my old version, which took ~15 s (similar to the OP's approach):
formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s';
fid = fopen(flnm,'r');
C = textscan(fid,formatSpec,'HeaderLines',4,'Delimiter',',');
mat = cell2mat(cellfun(@str2double,C,'UniformOutput',false));
fclose(fid);
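
For the layout described in the original question (semicolon-delimited, one header line, text only in column 9), the same idea of parsing numbers directly instead of converting strings afterwards can be sketched with textscan. This is a minimal sketch, not a tested solution for the real log: the sample file, its column names, and the second event label are made up for illustration.

```matlab
% Hypothetical stand-in for the eye-tracker log (header + 2 data rows).
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'c1;c2;c3;c4;c5;c6;c7;c8;event;c10;c11;c12\n');
fprintf(fid, '1;2;3;4;5;6;7;8;Event 1 > Stimulation;10;11;12\n');
fprintf(fid, '2;3;4;5;6;7;8;9;Event 2 > Pause;11;12;13\n');
fclose(fid);

% Columns 1-8 and 10-12 are read as doubles, column 9 as text.
fmt = [repmat('%f', 1, 8) '%s' repmat('%f', 1, 3)];
fid = fopen(filename, 'r');
C = textscan(fid, fmt, 'Delimiter', ';', 'HeaderLines', 1);
fclose(fid);
delete(filename);

num    = [C{[1:8 10:12]}];                          % N x 11 double
isStim = strcmp(C{9}, 'Event 1 > Stimulation');     % N x 1 logical
```

Keeping the numeric columns in one double matrix and the event flag in a separate logical vector avoids building a large cell array, which is typically the slow part of the cellfun/str2double approach.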
