How to shut down all running workers of parpools?

34 views (last 30 days)
Felix
Felix on 6 Mar 2023
Commented: Raymond Norris on 14 Mar 2023
How can I find and shut down all workers of all parpools that might currently be running?
During debugging I frequently run into crashes and out-of-memory errors. Often, some worker processes keep running, and I would like to know how best to close all of them before starting another script.

Answers (3)

Raymond Norris
Raymond Norris on 6 Mar 2023
Hi @Felix. Even if only a single worker crashes, all workers will terminate. Can you elaborate a bit more on a couple of things:
  1. Are you using a local pool or a cluster? If cluster, MJS or your own scheduler (and if so, which)?
  2. Which parallel constructs are you using (parfor, parfeval, etc.)? Can you give a simple example of what might crash? I'm not interested in the details (I'm sure the worker(s) are crashing), more in how you're running the code.
  1 comment
Edric Ellis
Edric Ellis on 7 Mar 2023
Note that on "local" and MJS clusters, the parallel pool will not necessarily immediately terminate when a single worker crashes. On those clusters, pools that have not yet used spmd can survive losing workers.



Edric Ellis
Edric Ellis on 7 Mar 2023
You can shut down all remaining workers of the currently running pool by executing:
delete(gcp('nocreate'))
There should be no running workers other than in the current pool.
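A minimal sketch of using this at the top of a script, so that each run starts from a clean state (assuming the default local pool profile):

```matlab
% Shut down any pool left over from a previous run.
% gcp('nocreate') returns the current pool, or [] if none exists,
% and delete of an empty handle array is a harmless no-op.
delete(gcp('nocreate'));

% Start a fresh pool for this run.
parpool;
```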

Felix
Felix on 8 Mar 2023
  1. I'm using local pools with default settings, which on my machine means 12 workers.
  2. So far I'm using parfor and the run command with MultiStart problems. I sometimes start a pool via parpool before running a script to reduce its runtime.
A simple, somewhat pseudocode example of my Monte Carlo code might be:
relevant_input = randn(1000, 1);
relevant_output = nan(height(relevant_input), 1);
param = 10;
parpool;
my_fun = @(x) elaborate_function(param, x);
parfor h = 1:height(relevant_input)
    relevant_output(h, 1) = my_fun(relevant_input(h));
end
function y = elaborate_function(par, x)
y = par*x.*sin(x);
end
Another use case is the MultiStart object, which I use with run:
ms = MultiStart('UseParallel', true, 'Display','iter');
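For context, a minimal runnable sketch of that MultiStart workflow might look like this; the objective, bounds, and number of start points here are made-up placeholders, not the actual problem:

```matlab
% Hypothetical optimization problem, for illustration only.
problem = createOptimProblem('fmincon', ...
    'objective', @(x) x.^2 + sin(5*x), ...
    'x0', 0.5, 'lb', -5, 'ub', 5);

ms = MultiStart('UseParallel', true, 'Display', 'iter');

% With UseParallel set, run distributes the start points over the
% current parallel pool; 20 is the number of start points.
[x, fval] = run(ms, problem, 20);
```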
My scripts sometimes crash, and I have trouble restarting them because some workers do not seem to clear their memory when they crash. When I try to restart, I get warnings such as:
Starting parallel pool (parpool) using the 'Processes' profile ...
Preserving jobs with IDs: 10 12 13 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile Processes. To create 'myCluster' use 'myCluster = parcluster('Processes')'.
However, these crash dump files and the preserved jobs hog way too much memory on my machine. I am looking for a couple of lines of code to put at the start of my scripts that search for leftover jobs, such as the ones containing crash dump files, and terminate them if they exist, so I don't have to type delete(myCluster.Jobs) every time myself.
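A minimal sketch of such a cleanup block, assuming the default 'Processes' profile; per the warning message quoted above, delete(myCluster.Jobs) removes all jobs created with that profile, including the preserved ones with crash dump files:

```matlab
% Cleanup block for the top of a script:
% 1) shut down any pool still running from a previous session,
% 2) delete all leftover jobs on the Processes cluster, including
%    those preserved because they contain crash dump files.
delete(gcp('nocreate'));
myCluster = parcluster('Processes');
if ~isempty(myCluster.Jobs)
    delete(myCluster.Jobs);
end
```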
  1 comment
Raymond Norris
Raymond Norris on 14 Mar 2023
I'm confused how the crash dump files and preserved jobs hog up too much memory. Do you mean disk space?
If a job is running, I'm not sure there would be a crash dump file (until the end). And do you want to delete the crash file or the job? If you're running a parallel pool and the pool crashes, there's no job to delete.


Version: R2022b
