Save a large array into equal-length .csv files?
22 views (last 30 days)
Hi guys, I am trying to save an adjusted, very large data set into equal-length .csv files. I am using the following script from this link, with my own database:
%% Step 1 - create a tall table
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds1 = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
tt = tall(ds1);
%% Step 2 - operate on the tall table
tt.TotalDelay = tt.ArrDelay + tt.DepDelay;
%% Step 3 - use tall/write to emit intermediate .mat files
writeDir = tempname;
mkdir(writeDir);
write(writeDir, tt);
%% Step 4 - use parfor to parallelise the writetable loop
ds = datastore(writeDir);
N = numpartitions(ds, gcp);
csvDir2 = tempname;
mkdir(csvDir2);
parfor idx1 = 1:N
    idx2 = 0;
    subds = partition(ds, N, idx1);
    while hasdata(subds)
        idx2 = idx2 + 1;
        fname = fullfile(csvDir2, sprintf('out_%06d_%06d.csv', idx1, idx2));
        writetable(read(subds), fname);
    end
end
I am adapting the script in Step 4 as follows, in order to specify that each .csv file has 20,000 rows:
RequiredDataRowsPerFile = 20000;
ds = datastore(writeDir,'ReadSize',RequiredDataRowsPerFile);
This has some effect, but the resulting .csv files still do not all contain the same number of rows (of course the last file will always be shorter).
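As I understand it, 'ReadSize' only controls how many rows each read call returns; rows are not rebalanced across the boundaries of the files written in Step 3, so the last read of each file comes up short. One possible workaround would be to buffer rows across read calls, something like the following sketch (sequential, not parfor, and reusing writeDir, csvDir2, and RequiredDataRowsPerFile from above):

```matlab
% Sketch: carry leftover rows between reads so every output file
% (except possibly the last) holds exactly RequiredDataRowsPerFile rows.
RequiredDataRowsPerFile = 20000;
ds = datastore(writeDir);
buffer = table();              % rows carried over between reads
fileIdx = 0;
while hasdata(ds)
    buffer = [buffer; read(ds)]; %#ok<AGROW>
    % Flush full-size chunks from the front of the buffer
    while height(buffer) >= RequiredDataRowsPerFile
        fileIdx = fileIdx + 1;
        fname = fullfile(csvDir2, sprintf('out_%06d.csv', fileIdx));
        writetable(buffer(1:RequiredDataRowsPerFile, :), fname);
        buffer(1:RequiredDataRowsPerFile, :) = [];
    end
end
if ~isempty(buffer)            % remainder: the only file that may be shorter
    fileIdx = fileIdx + 1;
    writetable(buffer, fullfile(csvDir2, sprintf('out_%06d.csv', fileIdx)));
end
```

The downside is that carrying rows across file boundaries is inherently sequential, so this loses the parfor parallelism unless the chunk boundaries are computed in advance.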
I would appreciate any help. Thanks
Tim
0 comments

Answers (0)