# Parallel Computing Toolbox (parfor slower than for, GPU slower than CPU)

Oscar Martinez on 8 Apr 2021
Commented: Joss Knight on 13 Apr 2021
1. How can I measure the data transfer time to the workers?
More precisely, when running a parfor loop like:
parfor i=1:100
N=sum(A(:,:,i));
end
how can I measure the time it takes to transfer the array A to each worker? Here A is a 200x200x100 array.
Communication overhead: The specified variable appears inside a loop within different indexing expressions. Because the indices are inconsistent across the uses of the array created by the parfor loop, MATLAB sends the entire array to each worker, resulting in high data communication overhead. For example, the following code elicits this message for c, because there are two different indexing expressions for it.
2. GPU: Are there functions like randn, randi, and randsample for the GPU?
Ex: I randomly select some indices
index = (randi([1 Nsources],1,Nsources_mut));
pob_out(index,1,2) = poblacion(index,1,2)
funpage performs this operation in parallel on the GPU:
CG=pagefun(@mtimes,AGpage,Fs);
(here AGpage is a 200x200x100 gpuArray and Fs is 200x200). This way the GPU performs 100 multiplications in parallel.
How can I perform this operation in parallel on the GPU?
aG=sum(sum(AGpage.*Fs)); % (I)
This operation is equivalent to
for i=1:100
A=AGpage(:,:,i);
aG(i)=trace(A'*Fs); % (II)
end
Are the functions trace or sum(sum(.)) available in pagefun? Operations (I) and (II) are too slow.
3. If I do all these calculations in MATLAB 2021, will they run in 2017? Some of my colleagues only have that version.
Joss Knight on 10 Apr 2021
Okay, I'll believe you're not a bot. But what is this snippet of commentary that doesn't seem relevant and refers to code that isn't provided?:
"Communication overhead: The specified variable appears inside a loop within different indexing expressions. Because the indices are inconsistent across the uses of the array created by the parfor loop, MATLAB sends the entire array to each worker, resulting in high data communication overhead. For example, the following code elicits this message for c, because there are two different indexing expressions for it."
It then refers to "funpage" instead of pagefun and goes on to make mistakes with markup.

Joss Knight on 10 Apr 2021
1) CPU/parfor: How can I measure the data transfer time in a parfor loop (since parfor is slower than for when accessing part of an array)?
Your snippet of code indexes variable A like this: A(:,:,i). Because i is the loop variable, this should result in only the correct slices of A going to each worker. So the premise you state that the whole array is copied is (should be) incorrect.
There isn't a way that I know of to measure the data transfer overhead independently in a parfor, since data transfer and loop execution are interleaved. You can probably infer it from wall clock timings measured by tic and toc - perhaps someone else has some tricks up their sleeve.
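One partial workaround, offered as a sketch rather than a definitive answer: Parallel Computing Toolbox provides ticBytes and tocBytes, which report how many bytes were sent to and received from each worker. This measures the volume of data moved, not the transfer time, and it assumes a release that includes these functions and a pool that is already open (or allowed to start automatically). The array size is taken from the question; the output array N is a hypothetical name added so the loop has a sliced result.

```matlab
% Sketch only: reports bytes moved to and from workers, not transfer time.
A = rand(200, 200, 100);          % same size as in the question
N = zeros(1, 200, 100);           % sliced output, one 1x200 row per page

pool = gcp;                       % current pool (starts one if preferences allow)
ticBytes(pool);                   % start counting bytes
tic
parfor i = 1:100
    N(:, :, i) = sum(A(:, :, i)); % A is sliced on the loop variable i
end
tLoop = toc;
bytes = tocBytes(pool);           % one row per worker: [sent to worker, received from worker]

fprintf('Loop time: %.3f s, bytes sent to workers: %.0f\n', tLoop, sum(bytes(:, 1)));
```

If A is sliced correctly, the total sent to the workers should be on the order of the size of A once, not once per worker.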
2) GPU: Are there functions like randn, randi, and randsample for the GPU? I need them to randomly select some coordinates of an array in each loop iteration.
Yes. Use 'gpuArray' as an optional argument to rand, randn or randi.
3) pagefun: Includes matrix multiplication. But how can I perform sum(sum(A)) in parallel on the GPU?
sum(sum(A)) works on the GPU. Also sum(A,'all') or sum(A,[1 2]). It's a good idea to actually try things before asking a question!
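To illustrate the equivalence the question relies on, here is a small CPU-only sketch checking that the vectorized form (I) matches the per-page trace loop (II). The sizes and the names A, F, aVec and aLoop are made up for the example; wrapping A and F in gpuArray should let the same lines run on a GPU, but that is an assumption not tested here.

```matlab
% CPU-only sketch with small hypothetical sizes.
A = rand(20, 20, 5);            % stand-in for AGpage
F = rand(20, 20);               % stand-in for Fs

% (I) Vectorized: multiply every page elementwise by F, then sum each page.
% A.*F relies on implicit expansion (R2016b or later).
aVec = sum(sum(A .* F, 1), 2);  % 1x1x5, one value per page

% (II) Loop: for real matrices, trace(A_i' * F) equals the sum of the
% elementwise product of A_i and F.
aLoop = zeros(1, 1, size(A, 3));
for i = 1:size(A, 3)
    aLoop(i) = trace(A(:, :, i)' * F);
end

maxDiff = max(abs(aVec(:) - aLoop(:)));   % should be at rounding-error level
```

The vectorized form avoids forming the full 20x20 product on every page just to read its diagonal, which is why it is usually the better choice.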

Oscar Martinez on 12 Apr 2021
Edited: Oscar Martinez on 12 Apr 2021
Thank you Joss
Maybe part of the problem is that I'm using MATLAB 2017a. I have bought MATLAB 2021 with the Parallel Computing Toolbox, but because of delays in the payment from my institution I received a free trial in the meantime, and by mistake it did not include the Parallel Computing Toolbox. So some of the problems I encounter may already be fixed in the new version, which I have not received yet. To save time, I was testing the toolbox with an old version on a colleague's computer.
Below are the scripts showing the problems I was trying to explain above.
1) OK. I was trying to understand why the code is so slow with parfor.
This was my code:
parpool(16)
n=100;
Indiv_mut=randi([1 n],1,n);
tic
parfor i=1:n
j=Indiv_mut(i);
B=A(:,:,j);
C=B^2;
end
t_parfor=toc
tic
for i=1:n
j=Indiv_mut(i);
B=A(:,:,j);
C=B^2;
end
t_for=toc
%result: t_parfor=90s
%t_for=0.02s
At line 7 I see this message (which I mentioned before):
Communication overhead: The specified variable appears inside a loop within different indexing expressions. Because the indices are inconsistent across the uses of the array created by the parfor loop, MATLAB sends the entire array to each worker, resulting in high data communication overhead.
I was trying to find a way to distinguish transfer time from calculation time. This was my test:
kk=0;
parpool(16)
sizes=[100 500 10^3 10^4]; % renamed from "length" to avoid shadowing the built-in function
for n=sizes
kk=kk+1;
A=rand(150,150,n);
tic
parfor i=1:n
B=A(:,:,i);
C=B^2;
end
tparfor=toc;
tic
for i=1:n
B=A(:,:,i);
C=B^2;
end
tfor=toc;
%%
improve(kk)=tfor/tparfor;
tic
parfor i=1:n
B=A(:,:,i);
end
tparfor_onlycall=toc;
tic
for i=1:n
B=A(:,:,i);
end
tfor_onlycall=toc;
improve_onlycall(kk)=tfor_onlycall/tparfor_onlycall;
end
%%
Result:
improve = 0.0301 0.2914 0.5497 0.7298
improve_onlycall= 0.0073 0.0323 0.0496 0.0728
As you can see, parfor takes a huge amount of time just to access the variable.
2) GPU: I will look into other approaches; I had already used gpuArray before asking you.
But in this case too, the GPU is slower than the CPU:
tsumCPU=0;
tsumGPU=0;
for i=1:30;% just to measure the time calculation several times.
tic
Indiv_mut=randi([1 n],1,n);
trandCPU=toc;
tsumCPU=trandCPU+tsumCPU;
tic
Indiv_mut=randi([1 n],1,n,'gpuArray');
trandGPU=toc;
tsumGPU=trandGPU+tsumGPU;
end
Result:
tsumGPU =0.0017
tsumCPU = 9.0710e-04
3) Maybe I was not clear. Here A is not a matrix; it is a 200x200x100 array. I would like to sum over the first two dimensions. Doing this on the GPU is slower than on the CPU. On the other hand, when I try
A=rand(150,150,100);
>> sum(A,[1 2])
I see this message:
Error using sum
Dimension argument must be a positive integer scalar within indexing range.
Maybe all the problems that I'm having are due to the fact that I'm using Matlab 2017a. Do you think that is the reason?
Joss Knight on 13 Apr 2021
It looks like you want to index A randomly inside the loop which is why parfor can't successfully slice A. Maybe there's a way for you to index A in order, but write the results out in a randomized way? Otherwise I don't think I can help you here.
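A variation on that idea, sketched under assumptions: gather the randomly selected pages into a new array before the loop, so that inside the parfor the array is indexed only by the loop variable and can be sliced. The names Asel and C are hypothetical, and the sizes are borrowed from the test script earlier in the thread. The cost is a second copy of the selected pages in client memory.

```matlab
n = 100;
A = rand(150, 150, n);
Indiv_mut = randi([1 n], 1, n);

Asel = A(:, :, Indiv_mut);   % pick the random pages once, on the client
C = zeros(150, 150, n);      % sliced output array
parfor i = 1:n
    B = Asel(:, :, i);       % indexed by the loop variable only, so Asel is sliced
    C(:, :, i) = B^2;
end
```

Whether this beats the plain for loop still depends on how much work each iteration does. For a single 150x150 matrix product per iteration, the overhead of parfor may well dominate.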
Timing: You need to give the GPU enough work to do to see the benefits:
>> F = @(n,s)randi([1 n],1,n,s);
>> timeit(@()F(100,'double'))
ans =
9.8483e-06
>> timeit(@()F(100,'gpuArray'))
ans =
4.7101e-05
>> timeit(@()F(100000,'double'))
ans =
0.0021
>> gputimeit(@()F(100000,'gpuArray'))
ans =
1.1839e-04
SUM: You are right, you cannot specify a vector of dimensions as an argument to sum in R2017a. Instead, either use sum(sum(A)), or, for extra efficiency:
numPages = size(A,3);
sumA = sum(reshape(A,[],numPages));
sumA = reshape(sumA,1,1,numPages);
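A quick way to confirm that the reshape version gives the same result as sum(sum(A)), and to compare the two on your own machine, is sketched below. It runs on the CPU; the variable names are illustrative, and no particular timing result is implied.

```matlab
A = rand(150, 150, 100);
numPages = size(A, 3);

sumNested = sum(sum(A));     % 1x1x100, nested sums over dimensions 1 and 2
sumReshaped = reshape(sum(reshape(A, [], numPages)), 1, 1, numPages);

% The two results should agree up to rounding error.
maxDiff = max(abs(sumNested(:) - sumReshaped(:)));

% timeit runs each function several times and returns a typical time.
tNested = timeit(@() sum(sum(A)));
tReshaped = timeit(@() reshape(sum(reshape(A, [], numPages)), 1, 1, numPages));
fprintf('nested: %.3g s, reshaped: %.3g s, max diff: %.3g\n', tNested, tReshaped, maxDiff);
```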
