Make this script faster
8 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Dear all,
I have a txt file (eyetracker log) that has 12 columns and 2398068 rows and this code to import it:
The first line is the header with variable names, and only column number 9 is strings, the rest is double
Is there a way to make this script run faster?
Thanks for the insight
filename = 'file.txt' ;
% - Get structure from first line.
fid = fopen( filename, 'r' ) ;
line = fgetl( fid ) ;
fclose( fid ) ;
% - Build formatSpec for TEXTSCAN.
fmt = {'%f%f%f%f%f%f%f%f%s%f%f%f'} ;
% - Read full file.
fid = fopen( filename, 'r' ) ;
data = textscan( fid, fmt, Inf, 'Delimiter', ';' ) ;
fclose( fid ) ;
data = ([data{:}]) ;
data(2:end,9)=num2cell((strcmp(data(2:end,9),'Event 1 > Stimulation')));
data=cellfun(@str2double,data(2:end,[1:8 10:end]),'un',0);
5 Kommentare
jgg
am 17 Dez. 2015
I had a similar issue. I ended up doing the initial data cleaning in Stata or R since it was easier to reformat the columns.
Colin Edgar
am 17 Dez. 2015
I can't make fscanf ignore the first "" string, for example:
frmt = '%*s%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';
A = fscanf(fid, frmt, [12, inf]);
A = "
Unless I do this:
A = fscanf(fid, '%s', [12, inf]);
A = 12 x 16833 (Char)
What I want is:
A = 12 x 16833 double
Antworten (1)
Colin Edgar
am 17 Dez. 2015
Bearbeitet: Colin Edgar
am 17 Dez. 2015
Here is my solution, takes only ~1sec to run per file (~2MB 12 x 18000). This is for the example data I posted above, but with the initial "timestamp" removed. I believe this answers the OP issue as well, since data was very similar.
formatSpec = '%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f\n'%
fid = fopen(flnm,'r');
t1 = fgetl(fid); %reads past heading, I know it's a hack but...
t1 = fgetl(fid);
t1 = fgetl(fid);
t1 = fgetl(fid);
mat = fscanf(fid, formatSpec, [12,inf]);
mat = mat'; %transpose to correct layout
fclose(fid);
Versus my old version which took ~15sec (similar to approach of OP)
formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s'
fid = fopen(flnm,'r');
C = textscan(fid,formatSpec,'HeaderLines',4,'Delimiter',',');
mat = cell2mat(cellfun(@str2double,C,'UniformOutput',false));
fclose(fid);
0 Kommentare
Siehe auch
Kategorien
Mehr zu Workspace Variables and MAT Files finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!