Speeding Up Text File Reading

I'm trying to read large .txt files (e.g., 500 MB) and downsample them at the same time, to make the data more manageable. The calls to textscan() slow my code down considerably. Is there a faster alternative?
In C, we'd use something like sscanf(), which is very fast.
% Read and Downsample %
% (desiredHz and maxtime are assumed to be defined earlier in the script)
in = fopen('sampreport.txt', 'r');
fprintf('\nReading sample report at %i hz downsample...', desiredHz);
tline = fgetl(in); % eat header
fileline = 0;
currsamp = 0;
i = 0;
tline = fgetl(in);
while ischar(tline)
    if isempty(strfind(tline, '. .')) % skip lines with missing-data placeholders
        % first pass: read only the last two columns to compute the sample time
        temp = textscan(tline, '%*f %*f %*s %*f %*f %*f %*s %*f %*s %*s %*f %*s %*f %*f %*f %*s %*f %*f %*f %f %f');
        currtime = temp{1}-temp{2};
        if mod(currtime,2) == 1 % round odd times up to the next even value
            currtime = currtime + 1;
        end
        if currtime <= maxtime && ~mod(currtime,(1000/desiredHz))
            i = i + 1;
            % second pass: read the fields to keep for this sample
            temp = textscan(tline,'%f %f %*s %f %f %f %*s %f %*s %*s %f %s %*f %*f %*f %*s %f %f %*f %*f %*f');
            sample.RIGHT_GAZE_X(i) = temp{1};
            sample.RIGHT_GAZE_Y(i) = temp{2};
            sample.block(i) = temp{3};
            sample.trialNum(i) = temp{4};
            sample.subjNum(i) = temp{5};
            sample.targLoc(i) = temp{6};
            sample.singLoc(i) = temp{7};
            sample.singPres(i) = temp{8};
            sample.ACC(i) = temp{9};
            sample.RT(i) = temp{10};
            sample.sampTime(i) = currtime;
            sample.currentSamp(i) = currtime*(desiredHz/1000);
        end
    end
    fileline = fileline + 1; % count line
    if mod(fileline,200000)<1 % progress dot every 200000 lines
        fprintf('.');
    end
    tline = fgetl(in);
end
fclose(in);

1 Comment

Stephen23 on 14 Jul 2015
Edited: Stephen23 on 14 Jul 2015
This code is very inefficient. MATLAB is not C (or any other language), and there are different concepts that should be applied to use it efficiently.
In particular, reading this file line-by-line is a complete waste of textscan's ability to read the whole file (or parts of it) at once. Also, expanding the output arrays on every iteration, without any form of array preallocation, is going to be slow. These are not difficult problems to solve...
Unfortunately, you do not provide any sample data for us to work with, so advising you on how to improve your code is difficult. If you actually want help, we need to have something to work with; otherwise, how can we test and check whether our suggestions are an improvement? Please upload a sample file (yes, it can be redacted to make it smaller) using the paperclip button, and then pressing both the Choose file and Attach file buttons.
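
As a rough illustration of the two points above (one textscan call for the whole file, and no arrays growing inside a loop), here is a minimal sketch. Since no sample file was posted, it writes a small synthetic file first; that layout, the values of desiredHz and maxtime, and the guess that the '. .' placeholder sits in the two gaze columns are all assumptions, not facts about the real data. The format string merges the two formats from the question so that each line is parsed only once.

```matlab
% Sketch only. Assumed (not from the question): desiredHz, maxtime,
% the synthetic file contents, and that '. .' marks missing gaze data.
desiredHz = 250;
maxtime   = 10;

% Write a tiny made-up file with 21 whitespace-separated columns.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header line\n');
for t = 0:11
    if t == 4
        gaze = '. .';                          % missing-sample placeholder
    else
        gaze = sprintf('%g %g', 10*t, 5*t);
    end
    fprintf(fid, '%s a 1 2 3 b 4 c d 5 yes 0 0 0 e 1 350 0 %d 100\n', ...
            gaze, 100 + t);
end
fclose(fid);

% One format covering both formats in the question: the ten kept
% fields plus columns 20 and 21, which are used for the sample time.
fmt = ['%f %f %*s %f %f %f %*s %f %*s %*s %f %s ' ...
       '%*f %*f %*f %*s %f %f %*f %f %f'];

% Read the whole file in a single call; '.' in numeric fields becomes NaN.
fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'HeaderLines', 1, 'TreatAsEmpty', '.');
fclose(fid);
delete(fname);

% Vectorized version of the per-line time computation.
currtime = C{11} - C{12};
odd = mod(currtime, 2) == 1;
currtime(odd) = currtime(odd) + 1;             % round odd times up

% Logical mask replaces the if-tests inside the loop.
keep = ~isnan(C{1}) & ~isnan(C{2}) ...         % drop missing-gaze rows
     & currtime <= maxtime ...
     & mod(currtime, 1000/desiredHz) == 0;

% Each field is built in one step, so nothing grows per iteration.
sample.RIGHT_GAZE_X = C{1}(keep).';
sample.RIGHT_GAZE_Y = C{2}(keep).';
sample.block        = C{3}(keep).';
sample.trialNum     = C{4}(keep).';
sample.subjNum      = C{5}(keep).';
sample.targLoc      = C{6}(keep).';
sample.singLoc      = C{7}(keep).';
sample.singPres     = C{8}(keep).';            % cell array of strings
sample.ACC          = C{9}(keep).';
sample.RT           = C{10}(keep).';
sample.sampTime     = currtime(keep).';
sample.currentSamp  = sample.sampTime * (desiredHz/1000);
```

If a 500 MB file does not fit comfortably in memory, the same idea may still apply in blocks: textscan(fid, fmt, N) reads at most N rows per call, so the masking step can be run on each block and the kept rows collected afterwards.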


Asked: 13 Jul 2015

Answered: 15 Jul 2015
