How to deal with a single TSV file that is too large to fit in memory?
Hi, guys.
I have a TSV file that is 20 GB in size, while my PC has only 16 GB of memory.
When I try to read the file, it always produces errors.
I tried a tall array as follows, but it fails because the TSV file cannot be recognized.
So I am looking for advice.
ds = tabularTextDatastore('D:\database1\scipatlinkage\paperauthoridaffiliationname.tsv');
ds.TreatAsMissing = 'NA';
ds.SelectedVariableNames = {'paperid','authorid','affiliationame'};
ds.SelectedFormats(2:3) = {'%s','%s'};
pre = preview(ds)
My MATLAB version is R2020a.
1 Comment
Athul Prakash
on 24 Oct 2020
What is the exact error you're getting? If it's a File Not Found issue, you may use the 'isfile' function to confirm the existence of your file before running the script. See this doc for 'isfile': https://in.mathworks.com/help/matlab/ref/isfile.html
Answers (1)
Athul Prakash
on 24 Oct 2020
I have noticed that you have not created the tall array in the code attached. Perhaps . . .
t = tall(ds);
might be missing.
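As a rough sketch of what that could look like end to end: the example below writes a tiny hypothetical TSV file to a temporary folder (the file name and its contents are made up for illustration), builds the datastore, wraps it in a tall array, and gathers the row count. The 'FileExtensions' and 'Delimiter' options are my assumption about what may help a .tsv file get recognized; I have not confirmed that this is the cause of the error in the question.

```matlab
% Hypothetical sample data so the sketch runs as-is; replace tsvPath with the real file.
tsvPath = fullfile(tempdir, 'sample_tall.tsv');
fid = fopen(tsvPath, 'w');
fprintf(fid, 'paperid\tauthorid\n');
for k = 1:6
    fprintf(fid, '%d\tA%d\n', k, k);
end
fclose(fid);

% Assumption: list the .tsv extension explicitly and use a tab delimiter.
ds = tabularTextDatastore(tsvPath, 'FileExtensions', '.tsv', 'Delimiter', '\t');

t = tall(ds);                 % lazy: nothing is loaded into memory yet
numRows = gather(size(t, 1)); % gather triggers the actual pass over the file
disp(numRows)
```

Operations on t stay deferred until gather is called, so only the (small) result has to fit in RAM.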
As an alternative to a tall array, you may try using this datastore with 'mapreduce'. See the documentation for 'mapreduce'.
Alternatively, you may also write your own code to 'read' from the datastore iteratively and process the data in chunks that fit in your RAM.
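A minimal sketch of that chunked approach, under some assumptions: the sample file, its column names, and the ReadSize value are hypothetical and only there to make the example runnable, and the 'FileExtensions' and 'Delimiter' options are my guess at what a .tsv file may need. The loop just counts rows; the real per-chunk processing would go in its place.

```matlab
% Hypothetical sample data so the sketch runs as-is; replace tsvPath with the real file.
tsvPath = fullfile(tempdir, 'sample_chunks.tsv');
fid = fopen(tsvPath, 'w');
fprintf(fid, 'paperid\tauthorid\n');
for k = 1:10
    fprintf(fid, '%d\tA%d\n', k, mod(k, 3));
end
fclose(fid);

% Assumption: list the .tsv extension explicitly and use a tab delimiter.
ds = tabularTextDatastore(tsvPath, 'FileExtensions', '.tsv', 'Delimiter', '\t');
ds.ReadSize = 4;   % rows per chunk; pick a much larger value for a 20 GB file

totalRows = 0;
reset(ds);                       % start from the beginning of the file
while hasdata(ds)
    chunk = read(ds);            % table with at most ReadSize rows
    totalRows = totalRows + height(chunk);   % replace with real processing
end
disp(totalRows)
```

Only one chunk is held in memory at a time, so ReadSize can be tuned to whatever comfortably fits in 16 GB.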
Hope it Helps!
1 Comment
ziyou teng
on 10 Nov 2020