How would I create a script to read files line-by-line to save memory

Question

EL on 20 Aug 2019

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/476933-how-would-i-create-a-script-to-read-files-line-by-line-to-save-memory

Commented: Adam Danz on 21 Aug 2019

Hey guys,

I've done the MATLAB Onramp, but I still feel extremely confused about what the hell I'm doing and it's frustrating me. I don't even know how to google the right questions, and interpreting pages from this website is a task that alone is like learning another language. Learning German was easier than this it feels like. So I'm sorry if I'm asking stupid questions, but I feel like I've been thrown into the deep end.

I have a .txt file that is 1,000,000,000 lines long, give or take a few 100,000,000 (no two files are the same length).

It consists of only numbers, with no headers that I'm aware of.

Because of the file size, I cannot load the whole file. It needs to be read in portions. I'd rather not split the file or

I'm looking to gather variance data every 100,000 data points, to be organized in a single column/multiple row format.

Ideally, I'd also like to have new columns generated every 360 variance data points; however, this isn't as important as generating the variance data first.

Thanks for the help!

6 Comments

Adam Danz on 20 Aug 2019

Edited: Adam Danz on 20 Aug 2019

@ Eric, that level of frustration is normal at this stage! You're asking the right questions so I'm sure you're going to succeed.

"I don't even know how to google the right questions"

Reduce your question to key words and add "matlab" to the front of your search. Nine times out of ten you'll end up in this forum or within the MATLAB documentation. Sometimes it will lead you to other resources, but they usually aren't as helpful.

Matlab change plot symbols
Matlab how to delete something on the plot
etc..

"...and interpreting pages from this website is a task that alone is like learning another language"

Yes, it is like that, but you'll get the hang of it. I'd estimate that there are fewer than 50 critical terms to understand to be able to quickly read through the documentation. Just keep at it.

"Learning German was easier than this it feels like"

Nein! German has cases. Matlab has switch-cases which are much easier to understand.

EL on 20 Aug 2019

Edited: Adam Danz on 21 Aug 2019

x0.txt

I cut off a little section. This is the very top of a file I would use.

EDIT: Here's a script I'm currently using, and the errors I received:

%% Loading Files for Input
% Currently, this can only do a single file at a time. Future editions intend to
% have multiple files loaded at once to save time. 
prompt = 'Enter the name of the .txt file to run (e.g. Organism_L/D_Media_Temp_mmddyyyy_Signal.txt).';
inputfile = input(prompt, 's');
%% Data Collection Rate
prompt = 'Enter the Data Collection rate(Hz). [20,000]';
Hz=input(prompt);
if isempty(Hz)
    Hz=20000;
end
%% Variance (n)
% This designated the amount of data to use for each datapoint generated.
% The standard amount is 5 seconds (100,000 datapoints). If left empty, 
% this is the value that will be used. Otherwise, this will be done in
% seconds. 
% Variables
%       vt = variance time. The time in seconds is the input, which is then
%       multiplied by 20,000. 
prompt = 'Enter the time length for variance calc in sec (20,000 points/sec) [5 seconds].';
vt=input(prompt);
if isempty(vt)
    vt=5;
end
%% Designating file for export
% This is the name of the .txt file that will contain the variance data
prompt = 'Enter the name for the output file (e.g. Organism_L/D_Media_Temp_mmddyyy_VarianceTime).';
outputfile=input(prompt,'s');
%% Initiating the code
% This is intended to be read line-by-line, then generating a single column
% text file of the variance data. 
infile=fopen(inputfile);
outfile=fopen(outputfile);
fline=fgetl(infile);
line_index=1;
variancewindow = Hz*vt;
data=zeros(1,variancewindow);
while ischar(dline);
    data(line_index) = str2double(dline)  ;  % str2double = Convert string to double precision value. What does that mean......?
    line_index=line_index+1;
    if line_index > variancewindow;
        line_index=1;
        variance_value=variance_function(data);
        fprintf(outfile,'%f\n',variance_value);
        data=zeros(1,variancewindow);
    end
    dline=fgetl(infile);
end
fclose(infile);
data=data(data~=0);
variance_value=variance_function(data);
fprintf(outfile,'%f/n',variance_value);
fclose(outfile);s

EDIT 2: The errors:

Error using fgets
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in fgetl (line 32)
[tline,lt] = fgets(fid);
Error in NMDIII_Data (line 59)
fline=fgetl(infile);

Just to be clear, this is something I was working on while asking this question. That's why I didn't post it in the original question.
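
For readers following the script above: "Invalid file identifier" means that fopen returned -1, usually because the file name could not be found. The posted loop also reads the first line into fline but tests dline, opens the output file without write permission, and calls variance_function, which is not shown in the post. Below is a minimal sketch of the same line-by-line idea with those points addressed. It assumes one number per line, substitutes the built-in var for variance_function, and uses a small generated demo file with a window of 4 so it can be run as-is; the file names are hypothetical.

```matlab
% Build a tiny demo input (hypothetical stand-in for the real data file).
inputfile  = fullfile(tempdir, 'linewise_demo_input.txt');
outputfile = fullfile(tempdir, 'linewise_demo_variance.txt');
fid = fopen(inputfile, 'w');
fprintf(fid, '%d\n', 1:10);           % one number per line: 1, 2, ..., 10
fclose(fid);

variancewindow = 4;                   % use Hz*vt (e.g. 20000*5) for real data

% fopen returns -1 on failure; the second output says why.
[infile, msg] = fopen(inputfile, 'r');
if infile < 0
    error('Could not open "%s": %s', inputfile, msg);
end
[outfile, msg] = fopen(outputfile, 'w');   % 'w' is needed to write
if outfile < 0
    fclose(infile);
    error('Could not open "%s": %s', outputfile, msg);
end

data  = zeros(1, variancewindow);     % buffer for one window
count = 0;                            % values currently in the buffer
dline = fgetl(infile);
while ischar(dline)                   % fgetl returns numeric -1 at end of file
    count = count + 1;
    data(count) = str2double(dline);  % text of the line -> double
    if count == variancewindow
        fprintf(outfile, '%f\n', var(data));
        count = 0;                    % start filling the buffer again
    end
    dline = fgetl(infile);
end
% Final, partially filled window. Tracking count keeps genuine zeros in the data.
if count > 0
    fprintf(outfile, '%f\n', var(data(1:count)));
end
fclose(infile);
fclose(outfile);
```

Tracking how many values are in the buffer avoids the data(data~=0) step, which would also discard real zero readings. For a file with a billion lines this loop will still be slow, which is why the answers read in chunks instead.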

Adam Danz on 21 Aug 2019

The methods proposed by Walter and me read the data in chunks rather than line by line (as you're doing with fgetl). I suggest you abandon that method and use textscan() instead.


Answer 1

Adam Danz on 21 Aug 2019

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/476933-how-would-i-create-a-script-to-read-files-line-by-line-to-save-memory#answer_388415

Edited: Adam Danz on 21 Aug 2019

Here's a demo that shows how to read a file in chunks of multiple lines. I included lots of comments that explain what's going on. There's a section at the bottom where you can perform whatever operations you want on the data that is being read in. Walter's answer includes the variance calculations you described.

% Set parameters
file = 'x0.txt';  % The file you're reading; it's better to use a full path such as 'C:\Users\name\Documents\x0.txt'
nrows = 5; %number of rows to read in at a time (you can change this to 100000 or whatever)
% Initialize the file for reading 
fid = fopen(file); 
% Set some loop variables
ignore = 0; %number of rows to ignore at the beginning (headers etc)
done = false; % flag that detects when file is complete
% Loop through until you've read all lines of file.  When that 
% happens, "done" will be switched to true and the while-loop
% will end.
while ~done
    % Read the next 'nrows'; C will be a cell array of strings.  
    C = textscan(fid,'%s', nrows, 'delimiter', '\n', 'headerlines', ignore);
    % If C is completely empty, you've finished the file.  
    if cellfun(@isempty, C)
        % C has no data so the file is finished. 
        % Set the "done" flag to True so the while-loop ends
        done = true; 
        % Skip the rest of this iteration.
        continue
    end
    % Convert C from a cell array of strings to a numeric vector
    % This assumes the content of the strings are numbers.
    nVec = str2double(C{:}); 
    % Header lines only need to be skipped on the first read. textscan()
    % continues from the current file position, so reset this to 0.
    ignore = 0; 
    
    % % % % % % % % % % % % % % % % % % %
    %                                   %
    % HERE IS WHERE YOU'LL DO WHATEVER  %
    % OPERATIONS YOU NEED TO DO WITH    %
    % THE VALUES YOU JUST READ IN.      %
    %                                   %
    % % % % % % % % % % % % % % % % % % %
    
    
end
% Close file 
fclose(fid); 
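
As one possible way to fill in the "operations" section of the loop above, the sketch below computes one variance per chunk with the built-in var and collects the results in a column vector. It is self-contained and uses a small generated file in place of x0.txt; the file name and the variable chunkVar are made up for illustration.

```matlab
% Build a small demo file (hypothetical stand-in for x0.txt).
file = fullfile(tempdir, 'chunk_demo.txt');
fid = fopen(file, 'w');
fprintf(fid, '%d\n', 1:12);           % one number per line: 1, 2, ..., 12
fclose(fid);

nrows = 5;                            % use 100000 for the real data
chunkVar = [];                        % one variance per chunk (preallocate for huge files)

fid = fopen(file, 'r');
while true
    % Read the next nrows lines as strings, continuing from the current position.
    C = textscan(fid, '%s', nrows, 'delimiter', '\n');
    if isempty(C{1})
        break                         % nothing left to read
    end
    nVec = str2double(C{1});          % cell array of strings -> numeric column
    chunkVar(end+1, 1) = var(nVec);   % the per-chunk operation
end
fclose(fid);
disp(chunkVar)
```

With 12 values and nrows = 5, the chunks should be 1 to 5, 6 to 10, and a final short chunk of 11 and 12, so the last variance is based on fewer points than the others.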

2 Comments

Walter Roberson on 21 Aug 2019

I do not see the purpose of the frewind(). textscan() will continue from the current file position.

Adam Danz on 21 Aug 2019

Nice catch, Walter. I originally copied similar code that uses fgetl() and adapted it to this, but I guess I overlooked the frewind. I edited and fixed it. Thanks.


Answer 2

Walter Roberson on 20 Aug 2019

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/476933-how-would-i-create-a-script-to-read-files-line-by-line-to-save-memory#answer_388413

vary_every = 100000;   %data points per variance value
expected_buffers = 10000;   %1000000000 / 100000
group_every = 360;
variances = zeros(1, expected_buffers);
filename = 'YourFileNameHere';
[fid, msg] = fopen(filename, 'r');
if fid < 0
    error('Failed to open file "%s" because "%s"', filename, msg)
end
buffcount = 0;
while true
    this_buffer = cell2mat( textscan(fid, '%f', vary_every) );
    if isempty(this_buffer); break; end   %end of file
    buffcount = buffcount + 1;
    variances(buffcount) = var(this_buffer);   %var() is the built-in variance function
end
fclose(fid);
variances(buffcount+1:expected_buffers) = [];    %trim off any extra
leftover = mod(buffcount,group_every);
if leftover ~= 0
    variances(end+1:end+group_every-leftover) = nan;
end
variances = reshape(variances, group_every, []);
disp(variances)
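
To make the padding and reshape step at the end of the code above easier to follow, here is a small worked sketch with toy numbers: group_every = 3 stands in for 360, and the values 1 to 7 stand in for the computed variances. The variable name grouped is introduced only for this illustration.

```matlab
group_every = 3;                      % stands in for 360
variances = 1:7;                      % stands in for the per-chunk variances

% Pad with NaN so the length is a multiple of group_every.
leftover = mod(numel(variances), group_every);
if leftover ~= 0
    variances(end+1 : end+group_every-leftover) = nan;
end

% reshape fills column by column, so each column holds one group.
grouped = reshape(variances, group_every, []);
disp(grouped)
```

Here the seven values are padded to nine, giving a 3-by-3 matrix whose columns are [1 2 3], [4 5 6] and [7 NaN NaN]. With the real data, each column would hold 360 consecutive variance values.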
