Avoid repetition in job diary when running code in parallel

Question

L. Borealis on 17 Aug. 2020

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/580503-avoid-repetition-in-job-diary-when-running-code-in-parallel

Edited: L. Borealis on 19 Aug. 2020

Hi,

I am using the Parallel Computing Toolbox to run code that was developed on a Mac and runs on a Unix cluster. I am on Windows 10 and want to submit the code from my machine to the Unix cluster.

If I use the Diary function after submitting a job, every 'someText' in

disp('someText')

shows up 9 or 10 times depending on how many CPUs the job is run on. Please let me know if I should not expect something like:

count
 
starting the function parallel run XXX start2end!
count
count
count
count
 
 
 
 
starting the function parallel run XXX start2end!
starting the function parallel run XXX start2end!
starting the function parallel run XXX start2end!
starting the function parallel run XXX start2end!
count
 
starting the function parallel run XXX start2end!
count
 
starting the function parallel run XXX start2end!
count
 
starting the function parallel run XXX start2end!
Warning: Table variable names were modified to make them valid MATLAB identifiers. The original names are saved in the VariableDescriptions property.

The output above is from running a job over 10 CPUs.

From coding in Fortran I know that it's possible to only display the output from a CPU of a certain rank rather than all (e.g. saying "if rank = 0" then print). Is there such a setting in MATLAB to enhance readability?

I am grateful for any advice on how to deal with this as I could not find a solution online.

Thank you and best wishes,

Linnéa

Answer 1

Raymond Norris on 18 Aug. 2020

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/580503-avoid-repetition-in-job-diary-when-running-code-in-parallel#answer_481584

Hi Linnéa,

The batch command with a Pool argument is a wrapper around createCommunicatingJob of type Pool (not SPMD). That simply means that parS needs to call spmd somewhere inside it in order to reference labindex (i.e. rank).

If you're able to post parS, we might be able to provide more guidance.

Raymond


L. Borealis on 18 Aug. 2020

Actually, it is parfor and not spmd that we use. Sorry for the mixup! I am new to parallel computing and this toolbox. Thanks a lot! However, the code seems to now be submitted to a worker from the beginning rather than after parfor as I described above... This was not the case when the code was last run just under a year ago.

Answer 2

Raymond Norris on 17 Aug. 2020

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/580503-avoid-repetition-in-job-diary-when-running-code-in-parallel#answer_481200

Hi Linnéa,

It would help to see more of your code. A parfor loop runs on a pool of workers that all think they are rank=0. An spmd block, however, uses MPI to communicate between the workers, so each one is assigned a different rank. Let me give you two quick examples (again, without knowing what your code looks like) of calculating pi.

Submitting a pool job that calls spmd. The top-level task then spawns a pool of workers (one fewer than requested) to run the block.

% Client side: submit a pool job whose single task calls spmd
c = parcluster('local');
j = c.createCommunicatingJob('NumWorkersRange',3, 'Type','pool');
j.createTask(@calcpi_spmd_block,0,{},'CaptureDiary',true);
j.submit
j.wait
j.Tasks(1).Diary   % diary of the top-level task

% Task function
function calcpi_spmd_block
spmd
    % Each worker integrates over its own subinterval of [0, 1]
    a = (labindex - 1)/numlabs;
    b = labindex/numlabs;
    fprintf('Subinterval: [%-4g, %-4g]\n', a, b);

    myIntegral = integral(@iQuadPi, a, b);
    fprintf('Subinterval: [%-4g, %-4g]   Integral: %4g\n', ...
        a, b, myIntegral);

    piApprox = gplus(myIntegral);   % sum the partial integrals across workers
end
approx1 = piApprox{1};  % 1st element holds value on worker 1.
fprintf('pi           : %.18f\n', pi);
fprintf('Approximation: %.18f\n', approx1);
fprintf('Error        : %g\n', abs(pi - approx1))

% Integrand: the integral of 4/(1 + x^2) over [0, 1] equals pi
function y = iQuadPi(x)
y = 4./(1 + x.^2);

Submitting an spmd job, where the task is run on each worker.

% Client side: submit an spmd job; the task function runs on every worker
c = parcluster('local');
j = c.createCommunicatingJob('NumWorkersRange',3, 'Type','spmd');
j.createTask(@calcpi_spmd_task,0,{},'CaptureDiary',true);
j.submit
j.wait
% One diary per worker
j.Tasks(1).Diary
j.Tasks(2).Diary
j.Tasks(3).Diary

% Task function (no spmd block needed; labindex is already distinct per worker)
function calcpi_spmd_task
a = (labindex - 1)/numlabs;
b = labindex/numlabs;
fprintf('Subinterval: [%-4g, %-4g]\n', a, b);
myIntegral = integral(@iQuadPi, a, b);
fprintf('Subinterval: [%-4g, %-4g]   Integral: %4g\n', ...
    a, b, myIntegral);
piApprox = gplus(myIntegral);
approx1 = piApprox;   % gplus returns the global sum on every worker
fprintf('pi           : %.18f\n', pi);
fprintf('Approximation: %.18f\n', approx1);
fprintf('Error        : %g\n', abs(pi - approx1))

% Integrand: the integral of 4/(1 + x^2) over [0, 1] equals pi
function y = iQuadPi(x)
y = 4./(1 + x.^2);

Notice the subtle differences between the two task functions. You'll see in both cases that we can make use of labindex (i.e. rank).
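As a rough sketch of the "if rank = 0 then print" idea from the question (this is not part of the original answer): in an spmd job, or inside an spmd block, output can be guarded with labindex so that only one worker writes to the diary. Note that labindex starts at 1, not 0. The function name calcpi_spmd_task_quiet is hypothetical, and the sketch assumes the code runs in an spmd context; inside a parfor loop, where all workers report the same rank as described above, this guard would not filter anything.

```matlab
function calcpi_spmd_task_quiet
% Hypothetical variant of calcpi_spmd_task: every worker computes its
% share, but only the worker with labindex == 1 prints the result.
a = (labindex - 1)/numlabs;
b = labindex/numlabs;
myIntegral = integral(@iQuadPi, a, b);
piApprox = gplus(myIntegral);   % global sum, returned on every worker

if labindex == 1                % rank guard: print once, not once per worker
    fprintf('pi           : %.18f\n', pi);
    fprintf('Approximation: %.18f\n', piApprox);
    fprintf('Error        : %g\n', abs(pi - piApprox));
end
end

function y = iQuadPi(x)
% Integrand: the integral of 4/(1 + x^2) over [0, 1] equals pi
y = 4./(1 + x.^2);
end
```

Submitted as an spmd job like the second example, the diaries of the tasks other than the first should then stay free of the repeated summary lines.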

Raymond

L. Borealis on 18 Aug. 2020

Dear Raymond,

Thank you very much for your very helpful and detailed response! I am pretty sure that I am submitting an spmd job. My code is huge, but this part might help you understand better what I am doing (you will probably be mainly interested in the last few lines, but I have added a bit of extra information in case it is useful):

%% Set Cluster
% Get a handle to the cluster
c = parcluster;
%define username
username = 'f';
% Create a list of all the local paths so that the
% files would be on path and could be read on the cluster; 
% adjust for Windows usage if necessary
localPath1 = genpath(strcat('/Users/',username,'/Documents/MATLAB/xxx/yyy'));
if username == 'f'
    localPath1 = strrep(localPath1, '\', '/');
end
localPath2 = genpath(strcat('/Users/',username,'/Documents/MATLAB/zzz'));
localPath3 = genpath(strcat('/Users/',username,'/Documents/MATLAB/xxx/www'));
if username == 'f'
    localPath3 = strrep(localPath3, '\', '/');
end
localPath = [localPath1,localPath2,localPath3];
cellLocalPath = split(localPath,':');
for i = 1:length(cellLocalPath)
    cellLocalPath{i} = strrep(cellLocalPath{i},strcat('/Users/',username,'/Documents/MATLAB/'),strcat('/someRemoteStorage/home/',username,'/'));
end
cellLocalPath = cellLocalPath(find(~cellfun(@isempty,cellLocalPath)));
cellLocalPath = cellLocalPath';
%
clear Name; clear j; clear jj;
Name{1} = 'A';
Name{2} = 'B';
Name{3} = 'E';
Name{4} = 'G';
Name{5} = 'T';
Name{6} = 'At';
cluster = 'clusterName';
c.AdditionalProperties.QueueName = 'defq';
%% Schedule jobs
numCores = 9;
c.AdditionalProperties.QueueName = 'defq';
c.AdditionalProperties.QoS = 'medium';
for i=1:length(Name)
    j{i} = c.batch(@parS, 1, {Name{i},numCores,username},'Pool',numCores,'AutoAddClientPath',false,'CaptureDiary',true,'CurrentFolder',strcat('/someRemoteStorage/home/',username,'/xxx'),'AdditionalPaths',cellLocalPath);
end

I am not yet sure how/where I would need to implement the "if rank = 0 then disp('xxx')" part but I am very much looking forward to hearing back.

Thanks again and best wishes,

Linnéa

Raymond Norris on 18 Aug. 2020

Hi Linnéa,

Before I provide an answer, I'm perplexed by something you wrote in a previous post. I believe you are submitting from Windows to a Linux cluster. genpath will generate a list of subfolders from a given starting point. Take the following example:

genpath(strcat('/Users/',username,'/Documents/MATLAB/xxx/yyy'));

For starters, you might consider

genpath(fullfile('/Users',username,'Documents','MATLAB','xxx','yyy'));

unless the file separator matters (calling strcat for CurrentFolder makes sense); however, you run strrep afterwards anyway. Secondly, on a Windows machine, I would expect this to return an empty string, since /Users won't exist (correct?).

Because genpath works recursively, you don't need to call it more than once under /Users/<username>/Documents/MATLAB; xxx and zzz will be picked up automatically.

You can replace

clear Name; clear j; clear jj;

with

clear Name j jj

I don't see where

cluster = 'clusterName';

is being used.
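To illustrate the path handling discussed above, here is a minimal sketch of a helper that turns a genpath-style list into a cell row of remote folders. The name mapToRemotePaths and its arguments are hypothetical (they do not appear in the original code). The sketch assumes the local root appears verbatim, with matching case, at the start of every folder in the list.

```matlab
function remotePaths = mapToRemotePaths(localPathList, localRoot, remoteRoot)
% Hypothetical helper: map the folders in a genpath-style list from a
% local root folder to the corresponding root folder on the cluster.
folders = strsplit(localPathList, pathsep);      % ';' on Windows, ':' on Unix
folders = folders(~cellfun(@isempty, folders));  % drop empty entries
folders = strrep(folders, '\', '/');             % use Unix file separators
localRoot = strrep(localRoot, '\', '/');
remotePaths = strrep(folders, localRoot, remoteRoot);  % swap the root folder
end
```

Because it splits on pathsep rather than a hard-coded ':' or ';', the same call should work from both a Windows and a Unix client, for example mapToRemotePaths(genpath(localRoot), localRoot, remoteRoot) with localRoot built via fullfile, and the result passed as 'AdditionalPaths'.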

L. Borealis on 18 Aug. 2020

Edited: L. Borealis on 19 Aug. 2020

Dear Raymond,

Thanks so much for all your advice and for connecting my 2 entries! I will definitely consider fullfile - that will be great for being able to use it both on a Linux and a Windows system, which is what we are planning to do.

A big problem I have with adjusting this code to be runnable on Windows is that I have not worked on a Windows machine since high school/never coded on it. So thank you very much for pointing me to this. It saved me a ton of confusion. I am running everything again and will let you know about the results.

%% Set Cluster
% Get a handle to the cluster
c = parcluster;
%define username 
username = 'f';
   
    % Create a list of all the paths supplement it so that the
    % files would be on path and could be read on cluster
    
    %once it is running, modify to struct, i.e. franssel.OS = Windows
    
if strcmp(username,'f')
    
    % Windows version
    localPath1 = genpath(fullfile('c:\','Users',username,'Documents','MATLAB','xxx','yyy'));
    localPath2 = genpath(fullfile('c:\','Users',username,'Documents','MATLAB','xxx','zzz','www'));
    localPath = [localPath1,localPath2];
    cellLocalPath = split(localPath,';');
    
    for i = 1:length(cellLocalPath)
        cellLocalPath{i} = strrep(cellLocalPath{i}, '\', '/');
        cellLocalPath{i} = strrep(cellLocalPath{i},strcat('c:/Users/',username,'/Documents/MATLAB'),strcat('/remoteStoageLoc/home/',username));
    end
  
else
   %Unix version
    localPath1 = genpath(strcat('/Users/',username,'/Documents/MATLAB/xxx/yyy'));
    localPath2 = genpath(strcat('/Users/',username,'/Documents/MATLAB/xxx/zzz/www'));
    localPath = [localPath1,localPath2];
    cellLocalPath = split(localPath,':');
    for i = 1:length(cellLocalPath)
        cellLocalPath{i} = strrep(cellLocalPath{i},strcat('/Users/',username,'/Documents/MATLAB/'),strcat('/remoteStoageLoc/home/',username,'/'));
    end
     
end
  cellLocalPath = cellLocalPath(find(~cellfun(@isempty,cellLocalPath)));
  cellLocalPath = cellLocalPath'
%
clear Name j jj;
Name{1} = 'A';
Name{2} = 'B';
Name{3} = 'E';
Name{4} = 'G';
Name{5} = 'T';
Name{6} = 'At';
numCores = 0;
c.AdditionalProperties.QueueName = 'defq';

Does this look better to you? The problem is that, while the paths look good now, it does not run anymore (i.e. the diary is empty and the jobs reach 'failed' status quickly when running the "schedule jobs" section above). As the output on a Windows machine from

cellLocalPath = cellLocalPath'

I get a 1×90 cell array with entries like:

/remoteStorageLoc/home/f/xxx/yyy

This is what I needed, and the Windows paths look good now too. So this puzzles me! Could it have to do with the integration script being written for Unix use only?

(Btw, I tried running it with numCores = 0 because, after chatting to my colleague, we realised that something must have changed between when he last ran the code about 9 months ago and now: he did not use to get the repetitive outputs from the parallelisation, but now he does too. We run it in MATLAB 2019a (I believe he may have run it on 2018a or b previously) and call Python, which has been updated since. If you know whether any of this could cause parallelisation from the beginning rather than only after parfor, please let me know. Thanks a lot!)

The cluster is defined again by the next function that is called, so it is redundant where it is.

Once again, thanks very much!!
