Optimising my data importer for large datasets

1 Ansicht (letzte 30 Tage)
HC98
HC98 am 26 Mär. 2023
Bearbeitet: Matt J am 26 Mär. 2023
So I have this
txtFiles = dir('*.txt') ; %loads txt files
N = length(txtFiles) ;
Numit = N;
[~, reindex] = sort( str2double( regexp( {txtFiles.name}, '\d+', 'match', 'once' ))); % sorts files
txtFiles = txtFiles(reindex);
for i = 1:N
data = importdata(txtFiles(i).name);
x = data(:,1);
udata(:,i) = data(:,2) ;
end
I have quite a large dataset (well over 200 files) and it takes ages to load things. How can I speed this up? Is there some sort of prepocessing I can do like merge all the files into one or something? I don't know...
  1 Kommentar
Walter Roberson
Walter Roberson am 26 Mär. 2023
I wonder if using a datastore would be appropriate for your work?

Melden Sie sich an, um zu kommentieren.

Antworten (1)

Matt J
Matt J am 26 Mär. 2023
Bearbeitet: Matt J am 26 Mär. 2023
I don't see any pre-allocation of udata. Also, nothing is being done with x, so it will cut down on time if you don't create it.
udata=cell(1,N);
for i = 1:N
data = importdata(txtFiles(i).name);
%x = data(:,1);
udata{i} = data(:,2) ;
end
udata=cell2mat(udata);
  1 Kommentar
Matt J
Matt J am 26 Mär. 2023
Bearbeitet: Matt J am 26 Mär. 2023
If the data files have many columns, it will also go faster if you read in only the first two columns, maybe using textscan.
udata=cell(1,N);
for i = 1:N
fileID = fopen(txtFiles(i).name);
data = textscan(fileID,'%f %f %*[^\n]');
fclose(fileID);
udata{i} = data(:,2) ;
end
udata=cell2mat(udata);

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Large Files and Big Data finden Sie in Help Center und File Exchange

Produkte


Version

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by