Parfor with GPUs crashes
Hello, everybody!
I have code that uses GPUs, and I would like to run it in parallel for different settings, i.e. code(setting=1), code(setting=2), code(setting=3), etc. To do that, I am using a parfor loop on a Linux-based high-performance computing (HPC) cluster:
parfor i=1:N
code(setting=i)
end
However, it often crashes, especially when the number of workers N is larger (more than 4-5). Typically, MATLAB shuts down with a "Bus error" or "Fatal error" message in the terminal.
What I do in general is the following. First, I request the necessary resources: N workers with sufficient memory and one GPU per worker. Then I check that I do have a GPU per worker with:
spmd
gpuDeviceCount
end
After that, I initialize the parpool with:
c=parcluster;
c.NumWorkers=N;
parpool(N)
And then I run my code. Note that an individual job with one GPU (without the parfor loop) works perfectly. It also almost always works with 2-3 workers in parallel.
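One thing the gpuDeviceCount check does not show is which device each worker ends up using; it only reports how many devices a worker can see. The following is a minimal sketch, not part of the original code, that prints an explicit worker-to-GPU mapping before the parfor loop runs. It assumes a pool is already open, that all workers sit on one node and see the same set of GPUs, and that `pickGpu` (a hypothetical helper name) is an acceptable round-robin rule. A pool may already select devices for its workers on its own, in which case this only makes the mapping visible.

```matlab
% Hypothetical helper: map a 1-based worker index onto a 1-based GPU index
% (round-robin, so worker nGpus+1 wraps around to GPU 1).
pickGpu = @(workerIdx, nGpus) mod(workerIdx - 1, nGpus) + 1;

% Only attempt the binding when a pool is already open and GPUs are visible.
pool = gcp('nocreate');
if ~isempty(pool) && gpuDeviceCount > 0
    spmd
        nGpus = gpuDeviceCount;            % devices visible to this worker
        idx = pickGpu(labindex, nGpus);    % newer releases also offer spmdIndex
        d = gpuDevice(idx);                % select that device on this worker
        fprintf('Worker %d -> GPU %d (%s)\n', labindex, idx, d.Name);
    end
end
```

If two workers report the same device, or a worker reports fewer devices than expected, that would point at the resource request rather than at the code inside the loop.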
3 Comments
Raymond Norris
on 14 Mar 2023
This is requesting 5 chunks, with 4 cores and 1 GPU per chunk. But this doesn't ensure that the 5 chunks are on the same node. I also wonder why you're requesting 5 chunks? If you're running a local pool, you only need 1 chunk. Try the following:
qsub -I -X -l select=1:ncpus=4:ngpus=1:mem=20gb,software=matlab
Then in MATLAB run
pctconfig('preservejobs',true);
setenv('MDCE_DEBUG','true')
local = parcluster("local");
pool = local.parpool(4);
% Run your parallel code
If/when the pool crashes,
local.getDebugLog(local.Jobs(end))
Answers (0)