MATLAB Answers

How would I create a script to read files line-by-line to save memory

EL on 20 Aug 2019
Commented: Adam Danz on 21 Aug 2019
Hey guys,
I've done the MATLAB Onramp, but I still feel extremely confused about what the hell I'm doing, and it's frustrating me. I don't even know how to google the right questions, and interpreting pages from this website is a task that alone is like learning another language. It feels like learning German was easier than this. So I'm sorry if I'm asking stupid questions, but I feel like I've been thrown into the deep end.
I have a .txt file that is 1,000,000,000 lines long, give or take a few 100,000,000 (no two files are the same length).
It consists of only numbers, with no headers that I'm aware of.
Because of the file size, I cannot load the whole file. It needs to be read in portions. I'd rather not split the file or
I'm looking to gather variance data every 100,000 data points, organized in a single-column, multiple-row format.
Ideally, I'd also like to have a new column generated every 360 variance data points; however, this isn't as important as generating the variance data first.
Thanks for the help!
  6 Comments
Adam Danz on 21 Aug 2019
The methods proposed by myself and Walter involve reading in chunks of data rather than reading in line-by-line (as you're doing with fgets). I suggest you abandon that method and use textscan() instead.
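To make the difference concrete, here is a minimal sketch that reads the same small file both ways. The file and its contents are made up for the demo (a temporary file holding ten numbers, one per line), and the chunk size of 5 is arbitrary; it is an illustration under those assumptions, not a benchmark.

```matlab
% Hypothetical demo file: ten numbers, one per line
demoFile = [tempname '.txt'];
fid = fopen(demoFile, 'w');
fprintf(fid, '%d\n', 1:10);
fclose(fid);

% Line-by-line: one fgetl call and one conversion per line
fid = fopen(demoFile, 'r');
byLine = [];
tline = fgetl(fid);
while ischar(tline)                        % fgetl returns -1 at end of file
    byLine(end+1, 1) = str2double(tline);  %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);

% Chunked: one textscan call parses up to 5 numbers at a time
fid = fopen(demoFile, 'r');
byChunk = [];
while true
    chunk = cell2mat(textscan(fid, '%f', 5));
    if isempty(chunk), break; end          % nothing left to read
    byChunk = [byChunk; chunk];            %#ok<AGROW>
end
fclose(fid);
delete(demoFile);

sameValues = isequal(byLine, byChunk);     % compare the two results
disp(sameValues)
```

Both loops end up with the same numbers; the chunked version simply makes far fewer function calls, which is what matters on a file with around a billion lines.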


Accepted Answer

Adam Danz on 21 Aug 2019
Edited: Adam Danz on 21 Aug 2019
Here's a demo that shows how to read in multiple lines of a file in chunks. I included lots of comments that explain what's going on. There's a section at the bottom where you can perform whatever operations you want on the data that is being read in. Walter's answer includes the variance calculations you described.
% Set parameters
file = 'x0.txt'; % The file you're reading; it's better to use a full path such as 'C:\Users\name\Documents\x0.txt'
nrows = 5; %number of rows to read in at a time (you can change this to 100000 or whatever)
% Initialize the file for reading
fid = fopen(file);
% Set some loop variables
ignore = 0; %number of rows to ignore at the beginning (headers etc)
done = false; % flag that detects when file is complete
% Loop through until you've read all lines of file. When that
% happens, "done" will be switched to true and the while-loop
% will end.
while ~done
    % Read the next 'nrows'; C will be a cell array of strings.
    C = textscan(fid,'%s', nrows, 'delimiter', '\n', 'headerlines', ignore);
    % If C is completely empty, you've finished the file.
    if cellfun(@isempty, C)
        % C has no data so the file is finished.
        % Set the "done" flag to True so the while-loop ends
        done = true;
        % Skip the rest of this iteration.
        continue
    end
    % Convert C from a cell array of strings to a numeric vector
    % This assumes the content of the strings are numbers.
    nVec = str2double(C{:});
    % textscan resumes from the current file position, so header
    % lines only need to be skipped on the first read.
    ignore = 0;
    % % % % % % % % % % % % % % % % % % %
    % %
    % HERE IS WHERE YOU'LL DO WHATEVER %
    % OPERATIONS YOU NEED TO DO WITH %
    % THE VALUES YOU JUST READ IN. %
    % %
    % % % % % % % % % % % % % % % % % % %
end
% Close file
fclose(fid);
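One detail worth checking before plugging a variance calculation into the placeholder section: unless the line count is an exact multiple of nrows, the final chunk is shorter than the others. The sketch below illustrates this under assumed conditions: a made-up temporary file with ten numbers and nrows = 4. It also reads with '%f' rather than '%s' plus str2double, which is a substitution made for brevity and not the method used in the demo above.

```matlab
% Hypothetical demo file: ten numbers, one per line
demoFile = [tempname '.txt'];
fid = fopen(demoFile, 'w');
fprintf(fid, '%d\n', 1:10);
fclose(fid);

nrows = 4;          % chunk size (assumed; 100000 in the question)
chunkSizes = [];    % number of values in each chunk
chunkVars = [];     % variance of each chunk

fid = fopen(demoFile, 'r');
while true
    % Read up to nrows numbers directly as doubles
    nVec = cell2mat(textscan(fid, '%f', nrows));
    if isempty(nVec), break; end           % nothing left to read
    chunkSizes(end+1, 1) = numel(nVec);    %#ok<AGROW>
    chunkVars(end+1, 1) = var(nVec);       %#ok<AGROW>
end
fclose(fid);
delete(demoFile);

% One row per chunk: [number of values, variance]
disp([chunkSizes chunkVars])
```

Here the third chunk holds only two values, so its variance rests on far less data than the others. With real data it may be worth flagging or dropping a short final chunk before comparing variances.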
  2 Comments
Adam Danz on 21 Aug 2019
Nice catch, Walter. I originally copied a similar code that uses fgetl() and adapted it to this but I guess I overlooked the frewind. I edited and fixed it. Thanks.


More Answers (1)

Walter Roberson on 20 Aug 2019
vary_every = 100000; %data points per variance value, as in the question
expected_buffers = 10000; %1000000000 / 100000
group_every = 360;
variances = zeros(1, expected_buffers);
filename = 'YourFileNameHere';
[fid, msg] = fopen(filename, 'r');
if fid < 0
    error('Failed to open file "%s" because "%s"', filename, msg)
end
buffcount = 0;
while true
    this_buffer = cell2mat( textscan(fid, '%f', vary_every) );
    if isempty(this_buffer); break; end %end of file
    buffcount = buffcount + 1;
    variances(buffcount) = var(this_buffer); %var is MATLAB's variance function
end
variances(buffcount+1:expected_buffers) = []; %trim off any extra
leftover = mod(buffcount,group_every);
if leftover ~= 0
    variances(end+1:end+group_every-leftover) = nan;
end
variances = reshape(variances, group_every, []);
disp(variances)
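The padding and reshape step at the end is easier to follow with small numbers. The sketch below uses made-up values (7 pretend variances and a group size of 3 instead of 360) purely for illustration; the logic mirrors the last few lines of the code above.

```matlab
group_every = 3;                               % 360 in the code above
variances = [0.1 0.2 0.3 0.4 0.5 0.6 0.7];     % pretend chunk variances
buffcount = numel(variances);

% Pad with NaN so the length is a whole multiple of group_every
leftover = mod(buffcount, group_every);        % 7 mod 3 leaves 1
if leftover ~= 0
    variances(end+1:end+group_every-leftover) = nan;   % adds 2 NaN entries
end

% Each column now holds group_every consecutive variances
grouped = reshape(variances, group_every, []);
disp(grouped)
```

The result is a 3-by-3 matrix whose last column holds 0.7 followed by two NaN entries. reshape fills column by column, so consecutive variances stay together within a column, and functions such as mean(grouped, 'omitnan') can skip the padding.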

Release

R2018a
