Optimising my data importer for large datasets
So I have this code:
txtFiles = dir('*.txt') ; % lists the .txt files in the current folder
N = length(txtFiles) ;
Numit = N;
[~, reindex] = sort( str2double( regexp( {txtFiles.name}, '\d+', 'match', 'once' ))); % sorts files by the first number in each name (file2 before file10)
txtFiles = txtFiles(reindex);
for i = 1:N
    data = importdata(txtFiles(i).name);
    x = data(:,1);
    udata(:,i) = data(:,2) ;
end
I have quite a large dataset (well over 200 files) and it takes ages to load everything. How can I speed this up? Is there some sort of preprocessing I can do, like merging all the files into one? I don't know...
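The "merge everything into one file" idea from the question can be sketched as a one-time conversion to a MAT-file: the slow text parsing runs once, and later sessions only call `load`. This is a minimal sketch under assumptions, not a tested implementation: each file is assumed to hold two whitespace-separated numeric columns with no header lines and the same number of rows, and the folder `importer_demo`, the files `run1.txt`...`run3.txt` and the cache name `merged_data.mat` are hypothetical.

```matlab
% Demo setup (hypothetical data): three small two-column files in a temp folder.
folder = fullfile(tempdir, 'importer_demo');
if ~isfolder(folder), mkdir(folder); end
for k = 1:3
    fid = fopen(fullfile(folder, sprintf('run%d.txt', k)), 'w');
    fprintf(fid, '%g %g\n', [1:5; k*(1:5)]);  % columns: x, u
    fclose(fid);
end

% One-time preprocessing: parse the text files once, then reuse the MAT-file.
cacheFile = fullfile(folder, 'merged_data.mat');  % hypothetical cache name
if isfile(cacheFile)
    S = load(cacheFile, 'x', 'udata');            % fast path: a single binary read
    x = S.x;
    udata = S.udata;
else
    txtFiles = dir(fullfile(folder, '*.txt'));
    [~, reindex] = sort(str2double(regexp({txtFiles.name}, '\d+', 'match', 'once')));
    txtFiles = txtFiles(reindex);
    N = numel(txtFiles);
    first = importdata(fullfile(folder, txtFiles(1).name));
    x = first(:,1);                               % assumed identical in every file
    udata = zeros(numel(x), N);                   % preallocated
    udata(:,1) = first(:,2);
    for i = 2:N
        data = importdata(fullfile(folder, txtFiles(i).name));
        udata(:,i) = data(:,2);
    end
    save(cacheFile, 'x', 'udata');                % use '-v7.3' for variables over 2 GB
end
```

The cache does not notice when the text files change, so delete `merged_data.mat` to force a rebuild after the data is updated.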
Answers (1)
Matt J
on 26 Mar 2023 (edited 26 Mar 2023)
I don't see any pre-allocation of udata, so MATLAB may have to reallocate it as it grows on each iteration of the loop. Also, nothing is being done with x, so it will cut down on time if you don't create it.
udata=cell(1,N);          % pre-allocate one cell per file
for i = 1:N
    data = importdata(txtFiles(i).name);
    %x = data(:,1);
    udata{i} = data(:,2) ;
end
udata=cell2mat(udata);    % concatenate the columns into one matrix
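If every file has the same number of rows (an assumption), the intermediate cell array can also be skipped: read the first file to learn the row count, preallocate a numeric matrix, and fill one column per file. A minimal sketch; the demo folder `prealloc_demo` and the files `file1.txt`...`file4.txt` are hypothetical, and the loop otherwise mirrors the code above.

```matlab
% Demo setup (hypothetical data): four two-column files with 6 rows each.
folder = fullfile(tempdir, 'prealloc_demo');
if ~isfolder(folder), mkdir(folder); end
for k = 1:4
    fid = fopen(fullfile(folder, sprintf('file%d.txt', k)), 'w');
    fprintf(fid, '%g %g\n', [1:6; 10*k + (1:6)]);
    fclose(fid);
end

txtFiles = dir(fullfile(folder, '*.txt'));
[~, reindex] = sort(str2double(regexp({txtFiles.name}, '\d+', 'match', 'once')));
txtFiles = txtFiles(reindex);
N = numel(txtFiles);

% Read the first file once to find the row count, then preallocate.
first = importdata(fullfile(folder, txtFiles(1).name));
udata = zeros(size(first, 1), N);
udata(:,1) = first(:,2);
for i = 2:N
    data = importdata(fullfile(folder, txtFiles(i).name));
    udata(:,i) = data(:,2);   % errors if a file has a different row count
end
```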
Matt J
on 26 Mar 2023 (edited 26 Mar 2023)
If the data files have many columns, it will also go faster if you read in only the first two columns, maybe using textscan.
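udata=cell(1,N);
for i = 1:N
    fileID = fopen(txtFiles(i).name);
    data = textscan(fileID,'%f %f %*[^\n]');  % first two columns, skip the rest of each line
    fclose(fileID);
    udata{i} = data{2} ;   % textscan returns a cell array; data{2} is the second column
end
udata=cell2mat(udata);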
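Because `x` is never used, the first column can be skipped as well: in a `textscan` format, `%*f` reads a number and discards it, and `%*[^\n]` discards the rest of the line. A minimal sketch assuming the files have no header lines (otherwise pass the `'HeaderLines'` option); the demo folder `textscan_demo` and its four-column files are hypothetical.

```matlab
% Demo setup (hypothetical data): three four-column files.
folder = fullfile(tempdir, 'textscan_demo');
if ~isfolder(folder), mkdir(folder); end
for k = 1:3
    fid = fopen(fullfile(folder, sprintf('scan%d.txt', k)), 'w');
    fprintf(fid, '%g %g %g %g\n', [1:4; k*(1:4); zeros(1,4); ones(1,4)]);
    fclose(fid);
end

txtFiles = dir(fullfile(folder, '*.txt'));
[~, reindex] = sort(str2double(regexp({txtFiles.name}, '\d+', 'match', 'once')));
txtFiles = txtFiles(reindex);
N = numel(txtFiles);

udata = cell(1, N);
for i = 1:N
    fileID = fopen(fullfile(folder, txtFiles(i).name));
    if fileID < 0
        error('Could not open %s', txtFiles(i).name);
    end
    data = textscan(fileID, '%*f %f %*[^\n]');  % skip col 1, keep col 2, skip the rest
    fclose(fileID);
    udata{i} = data{1};                        % the only non-skipped field
end
udata = cell2mat(udata);
```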