Mapreduce on parallel cluster - Database or disk full - How control storage of intermediate files?
1 Ansicht (letzte 30 Tage)
Ältere Kommentare anzeigen
Christian
am 9 Jul. 2019
Beantwortet: Christian
am 10 Jul. 2019
I wish to calculate several statistics (Spectra, Correlation Functions, etc.) of ~400 files with 6e6 doubles per file and afterwards average over all files to get average spectra, correlation functions, etc. To make things fast, I try to use mapreduce on a parallel cluster. This works like a charm as long as there are relatively few files (~100), but with a larger amount of files I get this error message:
Error using parallel.mapreduce.KeyValueOutputStore/addmulti (line 63)
Error in adding keys and values.
Error in Analysis20190708>Analysis (line 115)
addmulti(intermKVStore, {'StatNames'}, {Stats});
Error in parallel.internal.pool.deserialize>@(data,info,intermKVStore)Analysis(data,Parameters,info,intermKVStore)
Error in mapreduce (line 116)
outds = execMapReduce(mrcer, ds, mapfun, reducefun, parsedStruct);
Error in Analysis20190708 (line 72)
outDS = mapreduce(ds, mapper, @reduceAnalysis,inpool);
Caused by:
The database /tmp/filename/TaskOutput7.db is full. (database or disk is full)
The message occurs after around 50% of the map phase is done, but later when I reduce the size of the result vectors (less frequently sampled spectra for example). I checked with an admin and the /tmp indeed has very limited free space.
The question is now: How do I tell MATLAB to store these intermediate(?) files to a different location with more storage
0 Kommentare
Akzeptierte Antwort
Weitere Antworten (0)
Siehe auch
Kategorien
Mehr zu Standard File Formats finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!