parfeval with status check using timer class

Stephane Eisen on 5 Apr 2021
I am running a large design parameter sweep (~350k cases) using parfeval. The simulation is implemented in the standard way, as a function handle passed to parfeval together with the required input data (parfeval(p,@simulation,1,inputData)). For some input parameter combinations the simulation does not converge, so I created a timer that polls the futures to see whether they have started and, if so, how long they have been running. If the run time exceeds a preset value, I cancel the future.
In my first implementation, I created all ~350k futures and then polled them at a preset interval using a single timer. However, the polling was extremely slow (I presume because I was polling all ~350k futures every time), so I instead arranged the futures in a matrix and created one timer object per column, passing each timer only its own column of futures. This gives each timer a much smaller batch of futures to poll (2,500 instead of 350k).
With the small data set I used for testing (1000 futures, 10 timers), this seemed to work fine. When I expanded it to the full data set of 350k futures, I got the following error: "java.lang.OutOfMemoryError: GC overhead limit exceeded", even though my machine was using only 130 GB of the 188 GB available.
With this in mind, I have the following questions:
  1. What is the source of the "java.lang.OutOfMemoryError: GC overhead limit exceeded" error?
  2. Is there a limit to the size of the futures vector/matrix I should create?
  3. Is there a more efficient way to poll the execution time of each future?
Any suggestions to create a more robust implementation are also appreciated.
futureRows = 2500;
futureCols = 140;
F(futureRows,futureCols) = parallel.FevalFuture;
for pIdx = 1:size(F,2)
    for fIdx = 1:size(F,1)
        linIdx = sub2ind(size(F),fIdx,pIdx);
        F(fIdx,pIdx) = parfeval(p,@simulation,1,inputData(linIdx));
    end
    futureTimer(pIdx) = parfevalTimer(F(:,pIdx));
end
The parfevalTimer function looks like this:
function t = parfevalTimer(F,timeout,refreshRate)
    arguments
        F {mustBeA(F,'parallel.FevalFuture')}
        timeout {mustBeNumeric} = 600      % Default 600 second runtime limit
        refreshRate {mustBeNumeric} = 600  % Default 600 second refresh rate
    end
    timeout = seconds(timeout); % Convert to duration
    % Prepare timer object
    t = timer('ExecutionMode','fixedSpacing');
    t.UserData = struct('Futures',F,'Timeout',timeout);
    t.TimerFcn = @(~,~)checkParpool(t);
    t.Period = refreshRate;
    t.StopFcn = @timerCleanup;
    start(t)
end

function timerCleanup(s,~)
    disp('Stopping Timer')
    delete(s)
end

function checkParpool(t)
    F = t.UserData.Futures;
    timeout = t.UserData.Timeout;
    emptyLocal = NaT('TimeZone','local');
    state = {F.State};
    % Once every future has finished, stop the timer (StopFcn deletes it)
    % and return so the deleted timer is not touched again
    if all(ismember(state,'finished'))
        stop(t);
        return
    end
    idxRunning = find(ismember(state,'running'));
    % A task has timed out if it has started, has not finished, and has
    % been running for longer than the timeout
    tNow = datetime('now','TimeZone','local');
    hasTimeoutFcn = @(tStart,tFinish) ~isnat(tStart) & isnat(tFinish) & (tNow - tStart) > timeout;
    % Collect start and finish times, replacing empty values with NaT
    tStart = {F(idxRunning).StartDateTime};
    [tStart{cellfun(@isempty,tStart)}] = deal(emptyLocal);
    tFinish = {F(idxRunning).FinishDateTime};
    [tFinish{cellfun(@isempty,tFinish)}] = deal(emptyLocal);
    idxTerminate = cellfun(hasTimeoutFcn,tStart,tFinish);
    if any(idxTerminate)
        cancel(F(idxRunning(idxTerminate)));
        disp('Number of cases that timed out in the timer period: ' + string(sum(idxTerminate)));
    end
end
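
To make the timeout test in checkParpool easier to check in isolation, here is a minimal sketch of the same condition applied to plain datetime arrays instead of cell arrays. The sample times and the variable names (tNow, notSet, startTimes, finishTimes, timedOut) are made up for illustration; in the real callback the times would come from the futures' StartDateTime and FinishDateTime properties. It needs no parallel pool to run.

```matlab
% Made-up sample data standing in for three futures (illustrative only)
tNow    = datetime('now','TimeZone','local');
timeout = seconds(600);                  % same 600 s limit as the default above
notSet  = NaT('TimeZone','local');       % placeholder for "no time recorded yet"

startTimes  = [tNow - minutes(20); ...   % started 20 min ago, still running
               tNow - minutes(5);  ...   % started 5 min ago, still running
               notSet];                  % not started yet
finishTimes = [notSet; notSet; notSet];  % none of them has finished

% Timed out = started, not finished, and elapsed time above the limit.
% A NaT start gives a NaN duration, and NaN > timeout evaluates to false.
timedOut = ~isnat(startTimes) & isnat(finishTimes) & ((tNow - startTimes) > timeout);

disp(find(timedOut))                     % indices of the entries to cancel
```

Only the first entry exceeds the 600 second limit here; the second is still within it and the third has not started.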

Answers (0)


Release

R2020b
