Create tall array/datastore of multiple .csv files with different file formats (Big Data)

Hi all,
I need to do some analysis on a few dozen Big Data .csv files and would like to make use of the datastore / tall array features. The problem is that I cannot just use the datastore function to read in all .csv files from a folder: every file has a different variable in the third column (date and time are in the first and second columns), and datastore only works for files with the same format.
I would like to create a tall array with the date in the first column, the time in the second column and the variable in the third column. The variable from the next .csv file would then go in the fourth column, and so on.
I tried making separate datastores of all files and using horzcat on their tall arrays to combine these, but this doesn't work since the tall arrays are not based upon the same datastore.
The approach below makes a separate datastore for each .csv file in the folder, selects the desired variable, turns it into a tall array and gathers all values into memory. With horzcat I can then combine the gathered results and turn them back into a tall array. This rather defeats the purpose of datastore, since I still have to read all data into memory before going back to a tall array or datastore.
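% folderFiles, dirFiles and numFiles are assumed to be defined earlier
for i = 1:numFiles
    % File name without extension; also the name of the variable of interest
    [~, nameFile] = fileparts(dirFiles(i).name);
    % One datastore per file; fullfile keeps the path platform-independent
    ds.(nameFile) = datastore(fullfile(folderFiles, dirFiles(i).name), 'TreatAsMissing', '<NA>');
    ds.(nameFile).SelectedVariableNames = nameFile;
    tt.(nameFile) = tall(ds.(nameFile));
    % gather pulls the whole column into memory
    value.(nameFile) = gather(tt.(nameFile));
end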
Is there any way I can make a datastore/tall array from multiple .csv files with different file formats without having to read them all into memory first? I hope this makes sense and that you can steer me in the right direction. Much appreciated.

Answers (1)

Shashank on 21 Feb 2017
Hi Matthjis,
You can use fileDatastore together with a custom file read function. fileDatastore is meant for files in custom formats.
You can read about that here:
Then you can proceed to create a tall array from this.
Regards,
Shashank
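
A rough sketch of how the fileDatastore suggestion might look (this is not from the original answer). The read function gives every file the same column names, so the per-file tables can be stacked vertically in long format. Note that this is a swapped-in layout, not the side-by-side columns asked for in the question. The folder name 'data', the helper readOneCsv, the Name/Value layout, and the assumption that date, time and variable sit in columns 1 to 3 are all illustrative. It also assumes a MATLAB release in which fileDatastore accepts the 'UniformRead' option, and that each individual file fits in memory, since readOneCsv loads one whole file at a time.

```matlab
% Sketch only: folder name, column positions and the long-format layout are assumptions.
folderFiles = 'data';   % hypothetical folder holding the .csv files

% Each file is read by readOneCsv; 'UniformRead' asks fileDatastore to
% vertically concatenate the tables returned by the read function.
fds = fileDatastore(fullfile(folderFiles, '*.csv'), ...
    'ReadFcn', @readOneCsv, 'UniformRead', true);

tt = tall(fds);   % tall table with variables Date, Time, Value, Name

function t = readOneCsv(filename)
    % Read one file and rename its columns so every file yields the same schema.
    raw = readtable(filename, 'TreatAsMissing', '<NA>');
    [~, name] = fileparts(filename);   % file name without extension
    t = table(raw{:, 1}, raw{:, 2}, raw{:, 3}, ...
        'VariableNames', {'Date', 'Time', 'Value'});
    % Tag every row with the file it came from (hypothetical Name column).
    t.Name = repmat(string(name), height(t), 1);
end
```

From here, something like tt(tt.Name == "flow", :) would select the rows of one variable ("flow" being a made-up file name), and nothing is actually read until gather is called on a result.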
