Avoid repetition in job diary when running code in parallel
Hi,
I am using Parallel Computing Toolbox to run code that was developed on a Mac and runs on a Unix cluster. I am on Windows 10 and want to set the code up so that I can submit it from my machine to the Unix cluster.
If I use the Diary function after submitting a job, every 'someText' in
disp('someText')
shows up 9 or 10 times depending on how many CPUs the job is run on. Please let me know if I should not expect something like:
count
starting the function parallel run XXX start2end!
count
count
count
count
starting the function parallel run XXX start2end!
starting the function parallel run XXX start2end!
starting the function parallel run XXX start2end!
starting the function parallel run XXX start2end!
count
starting the function parallel run XXX start2end!
count
starting the function parallel run XXX start2end!
count
starting the function parallel run XXX start2end!
Warning: Table variable names were modified to make them valid MATLAB identifiers. The original names are saved in the VariableDescriptions property.
The output above is from running a job over 10 CPUs.
From coding in Fortran I know that it's possible to only display the output from a CPU of a certain rank rather than all (e.g. saying "if rank = 0" then print). Is there such a setting in MATLAB to enhance readability?
I am grateful for any advice on how to deal with this as I could not find a solution online.
Thank you and best wishes,
Linnéa
Answers (2)
Raymond Norris
on 18 Aug 2020
Hi Linnéa,
The batch command with a Pool argument is a wrapper around createCommunicatingJob with Type 'pool' (not 'spmd'). That means parS needs to call spmd somewhere inside it before it can reference labindex (i.e. the rank).
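As a rough illustration of that point, here is a minimal sketch of a rank-guarded print inside an spmd block. It assumes Parallel Computing Toolbox is installed and that a pool is open (or can be created automatically); the message text and the variables localCount, total and totalOnClient are placeholders, not taken from parS.

```matlab
% Sketch only: the names below are hypothetical placeholders.
spmd
    % Every worker executes this block, but only worker 1 prints,
    % which is the MATLAB counterpart of "if rank == 0" in MPI code.
    if labindex == 1
        fprintf('starting the parallel run on %d workers\n', numlabs);
    end
    % Work that every worker must do stays outside the if-guard.
    localCount = labindex;        % stand-in for real per-worker work
    total = gplus(localCount);    % sum over all workers, available on each
end
totalOnClient = total{1};         % total is a Composite; index it by worker
```

Note that labindex is 1-based, so worker 1 plays the role of rank 0.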
If you're able to post parS, we might be able to provide more guidance.
Raymond
Raymond Norris
on 17 Aug 2020
Hi Linnéa,
It would help to see more of your code. A parfor loop runs on a pool of workers that all think they are rank=0. An spmd block, however, uses MPI to communicate between the workers, so each one is assigned a different rank. Let me give you two quick examples (again, without knowing what your code looks like) of calculating pi.
Submitting a pool job that calls spmd. The top-level task then spawns a pool of workers (one fewer than requested) to run the spmd block.
c = parcluster('local');
j = c.createCommunicatingJob('NumWorkersRange',3, 'Type','pool');
j.createTask(@calcpi_spmd_block,0,{},'CaptureDiary',true);
j.submit
j.wait
j.Tasks(1).Diary
function calcpi_spmd_block
spmd
    a = (labindex - 1)/numlabs;
    b = labindex/numlabs;
    fprintf('Subinterval: [%-4g, %-4g]\n', a, b);
    myIntegral = integral(@iQuadPi, a, b);
    fprintf('Subinterval: [%-4g, %-4g] Integral: %4g\n', ...
        a, b, myIntegral);
    piApprox = gplus(myIntegral);
end
approx1 = piApprox{1}; % 1st element holds value on worker 1.
fprintf('pi : %.18f\n', pi);
fprintf('Approximation: %.18f\n', approx1);
fprintf('Error : %g\n', abs(pi - approx1))
end
function y = iQuadPi(x)
y = 4./(1 + x.^2);
end
Submitting an spmd job, where the task is run on each worker.
c = parcluster('local');
j = c.createCommunicatingJob('NumWorkersRange',3, 'Type','spmd');
j.createTask(@calcpi_spmd_task,0,{},'CaptureDiary',true);
j.submit
j.wait
j.Tasks(1).Diary
j.Tasks(2).Diary
j.Tasks(3).Diary
function calcpi_spmd_task
a = (labindex - 1)/numlabs;
b = labindex/numlabs;
fprintf('Subinterval: [%-4g, %-4g]\n', a, b);
myIntegral = integral(@iQuadPi, a, b);
fprintf('Subinterval: [%-4g, %-4g] Integral: %4g\n', ...
    a, b, myIntegral);
piApprox = gplus(myIntegral);
approx1 = piApprox; % Plain value here (no Composite): gplus returns the sum on every worker.
fprintf('pi : %.18f\n', pi);
fprintf('Approximation: %.18f\n', approx1);
fprintf('Error : %g\n', abs(pi - approx1))
end
function y = iQuadPi(x)
y = 4./(1 + x.^2);
end
Notice the subtle differences between the two task functions. You'll see in both cases that we can make use of labindex (i.e. rank).
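To cut the repeated output, one option is to guard the print statements with the rank. The following is only a sketch: calcpi_spmd_task_quiet is a hypothetical variant of the spmd task function above, meant to be submitted the same way (createCommunicatingJob with Type 'spmd'), and it assumes the summary should be printed by worker 1 only.

```matlab
function calcpi_spmd_task_quiet
% Hypothetical variant of calcpi_spmd_task that prints from one worker only.
a = (labindex - 1)/numlabs;                      % left end of this worker's subinterval
b = labindex/numlabs;                            % right end of this worker's subinterval
myIntegral = integral(@(x) 4./(1 + x.^2), a, b); % this worker's share of the integral
piApprox = gplus(myIntegral);                    % every worker must take part in the reduction
if labindex == 1
    % Only worker 1 writes to its diary, so each message appears once.
    fprintf('pi : %.18f\n', pi);
    fprintf('Approximation: %.18f\n', piApprox);
    fprintf('Error : %g\n', abs(pi - piApprox));
end
end
```

With this variant, the summary should appear only in j.Tasks(1).Diary. Collective calls such as gplus have to stay outside the if-guard, otherwise the other workers never join the reduction. In newer releases these functions may go by different names (spmdIndex, spmdSize, spmdPlus), so it is worth checking the documentation for your release.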
Raymond
3 Comments
Raymond Norris
on 18 Aug 2020
Hi Linnéa,
Before I provide an answer, I'm perplexed by something you wrote in a previous post. I believe you are submitting from Windows to a Linux cluster. genpath will generate a list of subfolders from a given starting point. Take the following example:
genpath(strcat('/Users/',username,'/Documents/MATLAB/xxx/yyy'));
For starters, you might consider
genpath(fullfile('/Users',username,'Documents','MATLAB','xxx','yyy'));
unless the file separator matters (calling strcat for CurrentFolder makes sense); however, you run strrep afterwards anyway. Secondly, on a Windows machine, I would expect this to return an empty string, since /Users won't exist (correct?).
Because genpath works recursively, you don't need to call it more than once under the /Users/username/Documents/MATLAB folder; xxx and zzz will be picked up automatically.
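A small sketch of the recursive behaviour, using a throwaway folder tree under the temporary directory. The folder names xxx, yyy and zzz mirror the placeholders above; root, p and folders are illustrative names only.

```matlab
% Build a small temporary folder tree (hypothetical layout).
root = tempname;                                  % unique path under tempdir
mkdir(fullfile(root, 'xxx', 'yyy'));              % creates root, xxx and yyy
mkdir(fullfile(root, 'zzz'));

p = genpath(root);                                % one call covers root and all subfolders
folders = strsplit(p, pathsep);                   % split the path string into folders
folders = folders(~cellfun(@isempty, folders));   % drop the empty trailing entry
disp(folders(:));                                 % root, xxx, xxx/yyy and zzz are all listed

% addpath(p);                                     % would add all of them to the MATLAB path
rmdir(root, 's');                                 % clean up the temporary tree
```

A single genpath call on the top-level folder is therefore enough; separate calls for xxx and zzz only repeat entries that are already there.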
You can replace
clear Name; clear j; clear jj;
with
clear Name j jj
I don't see where
cluster = 'clusterName';
is being used.