Parfor with GPUs crashes

Anton Baranikov on 27 Feb 2023
Commented: Raymond Norris on 14 Mar 2023
Hello, everybody!
I have code that uses GPUs, and I would like to run it in parallel for different settings, i.e. code(setting=1), code(setting=2), code(setting=3), etc. To do this, I am using a parfor loop on a Linux-based high-performance computing (HPC) cluster:
parfor i=1:N
code(setting=i)
end
However, it often crashes, especially when the number of workers N is large (more than 4-5). Typically, MATLAB shuts down with a "Bus error" or "Fatal error" message in the terminal.
In general, I do the following. First, I request the necessary resources: N workers with sufficient memory and one GPU per worker. Then I check that there is a GPU per worker with:
spmd
gpuDeviceCount
end
After that, I initialize the parpool with:
c=parcluster;
c.NumWorkers=N;
parpool(N)
And then I run my code. Note that an individual job with one GPU (without the parfor loop) works perfectly. It also almost always works with 2-3 workers in parallel.
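Note that gpuDeviceCount only reports how many GPUs a worker can see, not which one it has selected. A minimal sketch of a stricter per-worker check follows, assuming a pool is already open, Parallel Computing Toolbox is installed, and every worker can see at least one GPU. The variable names (nVisible, dev, host, devIndex) are illustrative and not from the original post.

```matlab
% Sketch: report which GPU each pool worker has actually selected.
% Assumes an open parallel pool and at least one GPU visible per worker.
spmd
    nVisible = gpuDeviceCount;          % GPUs visible to this worker
    dev = gpuDevice;                    % device currently selected on this worker
    [~, host] = system('hostname');     % node the worker is running on (Linux)
    % spmdIndex requires R2022b or later; labindex is the older equivalent.
    fprintf('Worker %d on %s: %d GPU(s) visible, using device %d (%s)\n', ...
        spmdIndex, strtrim(host), nVisible, dev.Index, dev.Name);
    devIndex = dev.Index;               % kept so the client can inspect it afterwards
end
```

If several workers on the same host report the same device index, they are probably sharing one GPU and its memory, which may be worth ruling out before looking further.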
3 Comments
Anton Baranikov on 27 Feb 2023
@Raymond Norris, this is the command I use e.g. for 5 workers:
qsub -I -X -lselect=5:ncpus=4:ngpus=1:mem=20gb,software=matlab
Yes, I use a local pool. I also tried creating a PBS Pro profile, but the outcome was the same.
Raymond Norris on 14 Mar 2023
This is requesting 5 chunks, with 4 cores and 1 GPU per chunk. But this doesn't ensure that the 5 chunks are on the same node. I also wonder why you're requesting 5 chunks? If you're running a local pool, you only need 1 chunk. Try the following:
qsub -I -X -l select=1:ncpus=4:ngpus=1:mem=20gb,software=matlab
Then in MATLAB run
pctconfig('preservejobs',true);
setenv('MDCE_DEBUG','true')
local = parcluster("local");
pool = local.parpool(4);
% Run your parallel code
If/when the pool crashes, retrieve the debug log with
local.getDebugLog(local.Jobs(end))
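As a further check, one way to rule out several workers contending for the same GPU is to size the local pool to the number of available GPUs on the node. The following is only a sketch under stated assumptions: no pool is currently open, at least one GPU is available, and runSetting is a hypothetical stand-in for the poster's code(setting=i). N is likewise an illustrative value.

```matlab
% Sketch: one local worker per available GPU, then loop over the settings.
% Assumes no pool is currently open and Parallel Computing Toolbox is installed.
local = parcluster("local");
nGpus = gpuDeviceCount("available");            % GPUs usable on this node
assert(nGpus > 0, 'No available GPU found on this node.');
pool = parpool(local, min(nGpus, local.NumWorkers));

% Hypothetical stand-in for code(setting=i): a small GPU computation.
runSetting = @(k) gather(sum(rand(1000, 1, 'gpuArray')) * k);

N = 8;                                          % hypothetical number of settings
results = zeros(1, N);
parfor i = 1:N
    results(i) = runSetting(i);                 % iterations are queued across workers
end
delete(pool);
```

The number of settings N does not have to match the number of workers, since parfor distributes the iterations over whatever workers the pool has.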

Answers (0)