SPMD vs PARFOR and Memory Usage
Dear all,
I am trying to migrate some of my code from parfor to spmd so that I can use codistributed arrays and save a considerable amount of memory (parfor copies the huge matrix var2 to every worker). My hope was to distribute var2 across the workers and save memory that way. I am running a toy example on our Linux servers using the following function:
function accum_results=try_memory_smpd()
tic
var1=repmat(linspace(1,100,100),100,1);
var2=2*linspace(0.001,0.02,600000);
accum_results_temp=zeros(100,600000);
upper_bound=100;
lower_bound=1;
spmd
    % Split the columns of var1 across the workers
    D = codistributed(var1,codistributor('1d', 2));
    temp=getLocalPart(D);            % this worker's block of columns
    globalInd=globalIndices(D, 2);   % global column numbers of that block
    % Locate lower_bound in this worker's block; if the block starts
    % after lower_bound, the range begins at the first local column
    local_lower_bound = find(globalInd == lower_bound, 1);
    if ~isempty(local_lower_bound)
        fprintf('The lower bound was found on Lab %d, index %d\n',labindex,local_lower_bound);
    end
    if isempty(local_lower_bound) && min(globalInd)>lower_bound
        local_lower_bound=1;
    end
    % Same for upper_bound; if the block ends before upper_bound,
    % the range runs to the last local column
    local_upper_bound = find(globalInd == upper_bound, 1);
    if ~isempty(local_upper_bound)
        fprintf('The upper bound was found on Lab %d, index %d\n',labindex,local_upper_bound);
    end
    if isempty(local_upper_bound) && max(globalInd)<upper_bound
        local_upper_bound=size(temp,2);
    end
    if ~(isempty(local_upper_bound) || isempty(local_lower_bound))
        % Accumulate this worker's partial sum (100x600000)
        for j = local_lower_bound:local_upper_bound
            accum_results_temp = accum_results_temp+bsxfun(@times,var2,temp(:,j));
        end
        fprintf('Lab %d works between indices %d and %d\n',labindex,local_lower_bound,local_upper_bound);
    else
        fprintf('No work for Lab %d!\n',labindex);
    end
    % Free the worker-side copies
    D=[];
    temp=[];
    var2=[];
end
% accum_results_temp is now a Composite holding one 100x600000 partial
% sum per worker; add them up on the client
accum_results=zeros(100,600000);
for cell_ind=1:length(accum_results_temp)
    accum_results=accum_results+accum_results_temp{cell_ind};
end
toc
end
Note that the sizes are fairly large, so you may need to reduce them to run the example. When I profile the code, the main bottleneck is the final for loop, which adds up the entries of the Composite object returned by the spmd block (each entry is a 100x600000 matrix). The PARFOR implementation of the same computation finishes in about half the time. The extra logic in the spmd code (the if-checks etc.) has no visible impact on performance.

Using functions such as cell2mat would defeat the purpose of the spmd implementation, since it would create yet another copy of the data already stored on the workers. I'd be very grateful for any ideas on how to use parallel code without excessive memory usage. Thanks in advance.
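One possible way to avoid pulling every worker's full partial sum back to the client is to do the reduction on the workers with gplus (named spmdPlus in newer releases), so that only a single summed matrix is returned. The following is a minimal sketch, not a tested replacement for try_memory_smpd. It assumes Parallel Computing Toolbox with an open parallel pool. nRows and nCols are hypothetical, reduced stand-ins for 100 and 600000, and a logical mask replaces the find-based bound logic. Allocating the accumulator inside the spmd block, rather than on the client as accum_results_temp is above, also avoids sending a zero matrix of that size to every worker.

```matlab
% Sketch: sum the partial results on the workers instead of on the client.
% Assumes Parallel Computing Toolbox and an open parallel pool.
% nRows/nCols are hypothetical reduced sizes (the original uses 100 and 600000).
nRows = 100;
nCols = 6000;
lower_bound = 1;
upper_bound = 100;

var1 = repmat(linspace(1, 100, 100), nRows, 1);
var2 = 2*linspace(0.001, 0.02, nCols);

spmd
    D = codistributed(var1, codistributor('1d', 2)); % split the columns of var1
    temp = getLocalPart(D);                          % this worker's columns
    globalInd = globalIndices(D, 2);                 % their global column numbers
    % Local columns whose global index lies in [lower_bound, upper_bound]
    cols = find(globalInd >= lower_bound & globalInd <= upper_bound);

    % Accumulator created on the worker, so nothing large is sent from the client
    partial = zeros(nRows, nCols);
    for j = cols
        partial = partial + bsxfun(@times, var2, temp(:, j));
    end

    % Add the partial sums across all workers; only worker 1 keeps the
    % result, every other worker gets []
    total = gplus(partial, 1);
    partial = [];   % free the per-worker partial sum
end

% Fetch the single summed matrix from worker 1
accum_results = total{1};
```

Compared with looping over the Composite on the client, this should transfer one nRows-by-nCols matrix to the client instead of one per worker. How it compares with the parfor version in run time has not been measured here.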
Cem
P.S. Here's the PARFOR implementation:
function accum_results=try_memory()
tic
var1=repmat(linspace(1,100,100),100,1);
var2=2*linspace(0.001,0.02,600000);
accum_results=zeros(100,600000);
parfor i=1:100
    accum_results=accum_results+bsxfun(@times,var2,var1(:,i));
end
toc
end
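For this toy example in particular (not necessarily for the real code it stands in for), the loop body is linear in var1. Each term bsxfun(@times,var2,var1(:,i)) is the outer product var1(:,i)*var2, so the whole reduction equals sum(var1,2)*var2. In that case each worker would only need to return a 100x1 column sum instead of a 100x600000 matrix. The serial check below compares the two forms at a hypothetical reduced size nCols and needs no parallel toolbox.

```matlab
% Serial check that the parfor-style reduction in try_memory equals a
% single outer product. nCols is a hypothetical reduced size (original: 600000).
nCols = 6000;
var1 = repmat(linspace(1, 100, 100), 100, 1);
var2 = 2*linspace(0.001, 0.02, nCols);

% Loop form, as in try_memory (run serially here)
loop_result = zeros(100, nCols);
for i = 1:100
    loop_result = loop_result + bsxfun(@times, var2, var1(:, i));
end

% Outer-product form: sum the columns first (100x1), then one multiply
outer_result = sum(var1, 2) * var2;

% Differences should only come from floating-point rounding
rel_err = max(abs(loop_result(:) - outer_result(:))) / max(abs(outer_result(:)));
fprintf('max relative difference: %g\n', rel_err);
```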