Parfeval - Memory consumption piling up - Clear output data?

Alexander Jensen on 17 Oct 2018
Commented: Edric Ellis on 19 Oct 2018
Hi everyone!
I've been experimenting with parfeval for a while now (I've previously used parfor and spmd extensively), and I'm really excited about the functionality that parfeval offers! (I explicitly need what parfeval, or parfevalOnAll, offers.)
Anyway, my issue is that the memory consumption (leak?) piles up over time, even though I'm fetching results faster than they are produced. See my example code below (which obviously won't run as-is, but the idea is there):
```matlab
% Preallocate array for all reduced data (fits easily in memory)
allReducedData = nan(someDimensionsThatFit);

% Create poolio!
poolio = gcp('nocreate');
if isempty(poolio)
    poolio = parpool('local', numParWorkers);
end

% Create parallel jobs/tasks
FUTUREOBJECTS(numel(datalist), numberOfSubSamples) = parallel.FevalFuture(); % Preallocation
par_correctionMap = parallel.pool.Constant(correctionMap);
for sampleIndex = 1:numel(datalist)
    for subsamplenumber = 1:numberOfSubSamples
        FUTUREOBJECTS(sampleIndex, subsamplenumber) = parfeval(poolio, @loadFcn, 1, ...
            datalist, sampleIndex, subsamplenumber, par_correctionMap);
    end
end

% Wait for parallel jobs/tasks to finish and fetch outputs as they come along
for processIndex = 1:numel(FUTUREOBJECTS)
    % There are approximately 4-5 seconds between each fetchNext:
    [completedIndex, data] = fetchNext(FUTUREOBJECTS);
    % From here on it takes about 1 second.
    % The following cannot be done on the CPU (too slow) and cannot be done
    % in parallel on the GPU (not enough memory on the card for multiple
    % copies of the gpuArray):
    data = gpuArray(data);
    data = filteringFunction(data);
    data = reductionFunction(data);
    allReducedData(:,:,:,processIndex) = gather(data); % copy result back to host memory
end

function data = loadFcn(datalist, sampleIndex, subsamplenumber, par_correctionMap)
    % datalist is simply a dir('*') structure of .edf files/their location.
    % correctionMap is a gain map that accounts for the adjustment in the
    % particular image, stored in a parallel.pool.Constant object to avoid
    % excessive overhead.
    sampleName   = datalist(sampleIndex).name;
    sampleFolder = datalist(sampleIndex).folder;
    filename = fullfile(sampleFolder, sampleName, ...
        sprintf('%s_%2d.edf', sampleName, subsamplenumber));
    data = READFCN(filename) .* par_correctionMap.Value;
end
```
So, how do I clear out the results held by FUTUREOBJECTS after I've actually processed them (if this isn't supposed to happen automatically, which I find odd)? I've read that a memory leak in parfeval was fixed in 2015; if that is the case, I don't understand how my code can take up as much as 240 GB of RAM!

Accepted Answer

Edric Ellis on 18 Oct 2018
It would be really helpful if you could come up with an MCVE (minimal, complete, verifiable example) that demonstrates the problem, and tell us the precise version of MATLAB / Parallel Computing Toolbox you're using.
One thing to note: each parallel.FevalFuture object holds on to its output arguments until you clear it (it must do this so that you can call fetchOutputs at any point). It might help to drop completed futures from your array of futures, i.e.:
```matlab
[completedIndex, data] = fetchNext(FUTUREOBJECTS);
FUTUREOBJECTS(completedIndex) = [];
```
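A minimal sketch of that suggestion when you also need to know which task each result belongs to. It assumes a parallel pool is open or can be started automatically (parfeval without a pool argument uses the current pool). The index returned by fetchNext refers to the array as passed in that call, so once elements are deleted the positions shift; here a parallel vector of original linear indices is shrunk in lockstep so each result still lands in its task's slot. The names numSamples, numSubSamples, origIndex and results, and the stand-in workload 100*a + b, are hypothetical illustrations, not loadFcn or the variables from the question.

```matlab
% Hypothetical stand-ins for numel(datalist) and numberOfSubSamples
numSamples    = 4;
numSubSamples = 3;
futureSize    = [numSamples, numSubSamples];
numTasks      = prod(futureSize);

% Submit tasks in reverse so the first assignment sizes the future array.
% The stand-in workload returns a value that encodes its own position.
for linearIdx = numTasks:-1:1
    [s, k] = ind2sub(futureSize, linearIdx);
    futures(linearIdx) = parfeval(@(a, b) 100*a + b, 1, s, k);
end

origIndex = 1:numTasks;        % original linear index of each remaining future
results   = nan(futureSize);   % hypothetical output array, one slot per task

while ~isempty(futures)
    % completedIndex is an index into the CURRENT (shrinking) futures array
    [completedIndex, value] = fetchNext(futures);
    results(origIndex(completedIndex)) = value;

    % Drop the finished future (so its stored output can be released, assuming
    % no other variable still refers to it) and its bookkeeping entry
    futures(completedIndex)   = [];
    origIndex(completedIndex) = [];
end

disp(results)
```

Flattening the futures into a vector keeps the bookkeeping to a single linear index; ind2sub recovers the (sample, subsample) pair whenever it is needed.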
I tried the following:
```matlab
N = 1500;
for idx = N:-1:1
    f(idx) = parfeval(@rand, 1, N);
end
total = 0;
for idx = 1:N
    [completedIdx, data] = fetchNext(f);
    f(completedIdx) = [];   % drop the fetched future so its output can be freed
    total = total + sum(data(:));
end
```
Deleting each fetched future from f was sufficient to keep the client memory usage reasonably bounded. In this particular case, the workers produced results faster than the client could consume them, so there was a period during which client memory usage increased while a backlog of unconsumed futures built up.
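To put the backlog effect in perspective, here is a back-of-the-envelope estimate for the rand(N) test above. rand(N) returns an N-by-N double matrix at 8 bytes per element, so if none of the N futures were ever cleared, the client would eventually hold every result at once. This counts only the raw matrix data and ignores any per-future overhead.

```matlab
% Back-of-the-envelope memory estimate for the rand(N) test above
N = 1500;
bytesPerResult = N * N * 8;            % one N-by-N double matrix, 8 bytes per element
totalBytes     = N * bytesPerResult;   % all N results kept alive by N uncleared futures
fprintf('Per result: %.0f MB, all results: %.1f GB\n', ...
    bytesPerResult / 1e6, totalBytes / 1e9);
% prints "Per result: 18 MB, all results: 27.0 GB"
```

The same multiplication (bytes per result times the number of futures still holding output) applies to the question's data, whose sizes are not given here.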
5 Comments
Alexander Jensen on 18 Oct 2018
Edited: Alexander Jensen on 18 Oct 2018
Indeed, it would.
Hmm, the code ran, but after ~200 jobs I got the following error (I used 3 workers, for various reasons, in case it matters):

```
Error using parallel.FevalFuture/fetchNext (line 217)
The function evaluation completed with an error.

Error in test_script (line 188)
% "loadedFcn" TO FINISH:

Caused by:
    Error using parallel.ConcurrentJob/pIsMatlabPoolJob (line 114)
    Unable to read file
    'C:\Users\superuser\AppData\Roaming\MathWorks\MATLAB\local_cluster_jobs\R2018a\Job3.in.mat'.
    No such file or directory.
```
I searched for solutions, but none came up.
(if I need to open another question, just say so :-) )
EDIT: I'm setting up another prefdir, just to make sure it has nothing to do with the directory.
Edric Ellis on 19 Oct 2018
Hm, that's a very strange error, and not one I've come across before. If you continue to see it, I'd suggest contacting MathWorks support; that shouldn't be happening.


Categories

Asynchronous Parallel Programming


Version

R2018a
