Reading a set of numeric values from 100s of .txt files inside a folder
Wander11
on 2 Aug 2022
Edited: dpb
on 3 Aug 2022
I have a folder named SimResults containing hundreds of .txt files. File i is named in the format "val1_x(i)_val2_y(i)_val3_z(i).txt", where the values x, y and z vary from file to file. Somewhere inside file i is the following text:
Frame 98 Finished!
Layer 1: DL n_bits = 823200. DL BER = 1.09e-05
Frame 99 Finished!
Layer 1: DL n_bits = 831600. DL BER = 1.08e-05
Frame 100 Finished!
Layer 1: DL n_bits = 840000. DL BER = 1.07e-05
I want to extract the data from the line after "Frame 100 Finished!" in every .txt file. In effect, for text file i, I should obtain the following set of values:
val1(i) = x(i)
val2(i) = y(i)
val3(i) = z(i)
DL_n_bits(i) = 840000
DL_BER(i) = 1.07e-05
Can someone help me sequentially do this for all the txt files and save that data?
Accepted Answer
Walter Roberson
on 3 Aug 2022
foldername = 'SimResults';
dinfo = dir(fullfile(foldername, '*.txt'));
filenames = {dinfo.name};
nfiles = length(filenames);
% Preallocate one entry per file
val1 = zeros(nfiles,1);
val2 = zeros(nfiles,1);
val3 = zeros(nfiles,1);
DL_n_bits = zeros(nfiles,1);
DL_BER = zeros(nfiles,1);
for K = 1 : nfiles
    thisfilename = filenames{K};
    % Drop the .txt extension so the last field converts to a number
    [~, basename] = fileparts(thisfilename);
    parts = regexp(basename, '_', 'split');
    x = str2double(parts{2});
    y = str2double(parts{4});
    z = str2double(parts{6});
    S = fileread(fullfile(foldername, thisfilename));
    % Capture the integer bit count and the BER that follow "Frame 100 Finished!"
    info = regexp(S, 'Frame 100 Finished!.*?DL n_bits = (?<bits>\d+).*?BER = (?<BER>\S+)', 'once', 'names');
    bits = str2double(info.bits);
    BER = str2double(info.BER);
    val1(K) = x;
    val2(K) = y;
    val3(K) = z;
    DL_n_bits(K) = bits;
    DL_BER(K) = BER;
end
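The question also asks to save the data, which the loop above does not do. A minimal sketch of one way to do that, assuming the result vectors have already been filled: collect them into a table and write a CSV file. The placeholder values and the output name SimResults_summary.csv are hypothetical, chosen only so the snippet runs on its own.

```matlab
% Placeholder values standing in for the vectors filled by the loop.
val1 = [1; 2];
val2 = [10; 20];
val3 = [100; 200];
DL_n_bits = [840000; 840000];
DL_BER = [1.07e-05; 2.30e-05];

% One row per file, one column per extracted quantity.
results = table(val1, val2, val3, DL_n_bits, DL_BER);

% Hypothetical output location and name; adjust as needed.
outfile = fullfile(tempdir, 'SimResults_summary.csv');
writetable(results, outfile);

% Read the file back to confirm the round trip.
check = readtable(outfile);
```

A table keeps the column names next to the data, so the CSV header row documents which value is which. save('results.mat', ...) would work as well if the data only needs to be reloaded in MATLAB.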
4 Comments
Walter Roberson
on 3 Aug 2022
This code presumes that the bit count is an integer and that the period after it is only punctuation for human readers.
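To illustrate that presumption, here is a small self-contained sketch that runs the pattern on the sample text from the question, plus a made-up line with a fractional count (the 840000.5 value is hypothetical, not from the original data). The `\d+` token stops at the first non-digit, so the period is never part of the captured number.

```matlab
% Sample text from the question, built in memory instead of read from a file.
S = sprintf(['Frame 99 Finished!\n', ...
             'Layer 1: DL n_bits = 831600. DL BER = 1.08e-05\n', ...
             'Frame 100 Finished!\n', ...
             'Layer 1: DL n_bits = 840000. DL BER = 1.07e-05\n']);

expr = 'Frame 100 Finished!.*?DL n_bits = (?<bits>\d+).*?BER = (?<BER>\S+)';

info = regexp(S, expr, 'once', 'names');
bits = str2double(info.bits);   % digits only, the trailing period is skipped
BER  = str2double(info.BER);    % everything up to the next whitespace

% Hypothetical line with a fractional count: \d+ keeps only the integer part.
S2 = sprintf('Frame 100 Finished!\nLayer 1: DL n_bits = 840000.5 DL BER = 1e-3\n');
info2 = regexp(S2, expr, 'once', 'names');
bits2 = str2double(info2.bits);
```

If the count could ever be fractional, the `bits` token would need a different pattern. For the files described in the question, the integer assumption appears to hold.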
More Answers (1)
dpb
on 3 Aug 2022
Edited: dpb
on 3 Aug 2022
Alternatively, just as an experiment, I wonder how it would work using some of the more recently introduced features:
foldername = 'SimResults';
dinfo = dir(fullfile(foldername, '*.txt'));
filenames = {dinfo.name};
nfiles = length(filenames);
% here, since we've got the full list of filenames, I'd be tempted to go
% ahead and scan it now for the vals array,
% with the new-fangled string functions (are they as quick as a regexp expression?)
pat = "_" + digitsPattern;   % to isolate the x,y,z
vals = str2double(extractAfter(extract(filenames,pat),'_'));   % and convert those to numeric
% alternatively, with the old standby, although it hasn't been internally vectorized
% (filenames(:) makes the result one row per file)
fmt1 = 'val1_%d_val2_%d_val3_%d.txt';
vals = double(cell2mat(cellfun(@(s) cell2mat(textscan(s,fmt1)),filenames(:),'UniformOutput',0)));
% Can try the above on real dataset; with toy set of 10 or so sample
% filenames here, there was no discernible timing difference.
% allocate for the others that have to read files for...
DL_n_bits = zeros(nfiles,1);
DL_BER = zeros(nfiles,1);
fmt2 = 'Layer 1: DL n_bits = %f DL BER = %f';
for K = 1:nfiles
    S = readlines(fullfile(foldername,filenames{K}));
    % line immediately after the first "Frame 100 Finished!" marker
    ix = find(startsWith(S,'Frame 100 Finished!'),1)+1;
    % separate variable so the filename values in vals are not overwritten
    lineVals = cell2mat(textscan(char(S(ix)),fmt2));
    DL_n_bits(K) = lineVals(1);
    DL_BER(K) = lineVals(2);
end
I wonder whether it's any quicker to find the particular line and parse it than to have regexp search the whole file as one very long character string to find the same point, or how much more overhead the string array introduces instead.
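One way to probe that question is a rough timing harness with timeit. This is only a sketch under stated assumptions: the log is synthetic and built in memory (100 frames, hypothetical values), so file I/O is excluded. The line parse also uses sscanf as a stand-in for the textscan call; swap textscan back in to time the original code exactly. Results will depend on file size and MATLAB release.

```matlab
% Build a synthetic log in memory (hypothetical content).
nFrames = 100;
logLines = strings(2*nFrames, 1);
for f = 1:nFrames
    logLines(2*f-1) = sprintf('Frame %d Finished!', f);
    logLines(2*f)   = sprintf('Layer 1: DL n_bits = %d. DL BER = %.2e', 8400*f, 1e-3/f);
end
S_char = char(join(logLines, newline));   % shape of what fileread returns
S_str  = logLines;                        % shape of what readlines returns

expr = 'Frame 100 Finished!.*?DL n_bits = (?<bits>\d+).*?BER = (?<BER>\S+)';
fmt2 = 'Layer 1: DL n_bits = %f DL BER = %f';

% Approach 1: regexp over the whole file as one character vector.
viaRegexp = @() regexp(S_char, expr, 'once', 'names');

% Approach 2: locate the marker line in the string array, parse the next line.
viaLines = @() sscanf(char(S_str(find(startsWith(S_str, 'Frame 100 Finished!'), 1) + 1)), fmt2);

tRegexp = timeit(viaRegexp);
tLines  = timeit(viaLines);
fprintf('regexp: %.3g s, line search + parse: %.3g s\n', tRegexp, tLines);
```

To include the cost of reading, wrap the fileread and readlines calls inside the two function handles and point them at a real file from SimResults.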