Summing array elements seems to be slow on GPU
Damian Suski, 26 Apr 2023
Commented: Damian Suski, 18 May 2023
I am testing the execution time of the following function on the CPU and on the GPU:
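```matlab
function funTestGPU(P,U,K,UN)
% U, UN: P-by-P complex arrays; K: P-by-P-by-P complex array
for k = 1:P
    H = exp(1i*K);            % element-wise exponential, P-by-P-by-P
    HU = U.*H;                % implicit expansion of U along dimension 3
    UN(k,:) = sum(HU,[1,3]);  % sum over dimensions 1 and 3 -> 1-by-P row
end
end
```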
where $U$ and $\mathit{UN}$ are complex arrays of size $P \times P$ and $K$ is a complex array of size $P \times P \times P$. So in each iteration I perform an element-wise exp(), an element-wise multiplication of two arrays, and a sum of the elements of a 3-D array along two dimensions.
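Written out element-wise, each pass of the loop in funTestGPU assigns the following (the summation indices $m$ and $l$, over the first and third dimensions, are introduced here only for illustration; $\mathrm{i}$ is the imaginary unit):

$$\mathit{UN}_{k,j} = \sum_{m=1}^{P} \sum_{l=1}^{P} U_{m,j}\, e^{\mathrm{i} K_{m,j,l}}, \qquad j = 1, \dots, P.$$

The right-hand side does not involve the loop index $k$, so every row of $\mathit{UN}$ receives the same values.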
I measure the execution time on the CPU and on the GPU with the following script:
```matlab
P = 200;
% Real and imaginary parts of the test data
URe = 1/(sqrt(2))*rand(P);
UIm = 1/(sqrt(2))*rand(P);
KRe = 1/(sqrt(2))*rand(P,P,P);
KIm = 1/(sqrt(2))*rand(P,P,P);
% CPU
U = complex(URe, UIm);
K = complex(KRe, KIm);
UN = complex(zeros(P), zeros(P));
fcpu = @() funTestGPU(P,U,K,UN);
tcpu = timeit(fcpu);        % time in seconds
disp(['CPU time: ',num2str(tcpu)])
% GPU
U = gpuArray(complex(URe, UIm));
K = gpuArray(complex(KRe, KIm));
UN = gpuArray(complex(zeros(P), zeros(P)));
fgpu = @() funTestGPU(P,U,K,UN);
tgpu = gputimeit(fgpu);     % waits for GPU work to finish; time in seconds
disp(['GPU time: ',num2str(tgpu)])
```
I obtain the following results (times in seconds):
```
CPU time: 9.0315
GPU time: 3.3894
```
My concern is that if I remove the last operation (the summation) from funTestGPU, I obtain:
```
CPU time: 8.0185
GPU time: 0.0045631
```
So it looks like the summation is by far the most time-consuming operation on the GPU. Is that expected?
I wrote analogous code in CuPy and in PyTorch, and there the summation does not seem to be the most time-consuming operation.
I use MATLAB R2019b. My graphics card is an NVIDIA GeForce GTX 1050 Ti (768 CUDA cores), and my processor is an AMD Ryzen 7 3700X (8 physical cores).
Accepted Answer
Joss Knight, 27 Apr 2023
These are the results I got on my (somewhat old) GeForce GTX 1080 Ti:
```
CPU time: 16.1288
GPU time: 0.96266
```
If I change the data type to single, I get:
```
CPU time: 14.9785
GPU time: 0.35102
```
That's roughly 2.7x faster.
So on the one hand, your GPU is fairly slow and your CPU is fairly fast; on the other hand, you could try single precision instead, if you can accept the loss of accuracy.
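A minimal sketch of the single-precision variant of the question's set-up. It assumes the same URe, UIm, KRe, KIm as in the original script; the names Us, Ks, UNs and the useGPU switch are hypothetical. Casting the real and imaginary parts with single() before calling complex() is enough: exp, .* and sum inside funTestGPU then all run in single precision. Consumer GeForce cards of this generation (including the GTX 1050 Ti and GTX 1080 Ti) execute double-precision arithmetic at only a small fraction of their single-precision rate, and single complex values take half the memory of double complex ones, which is why the type change can matter on these devices.

```matlab
% Same set-up as the question's script (P = 200 as in the question)
P = 200;
URe = 1/sqrt(2)*rand(P);
UIm = 1/sqrt(2)*rand(P);
KRe = 1/sqrt(2)*rand(P,P,P);
KIm = 1/sqrt(2)*rand(P,P,P);

% Cast the parts to single before building the complex arrays,
% so every later operation on them runs in single precision
Us  = complex(single(URe), single(UIm));
Ks  = complex(single(KRe), single(KIm));
UNs = complex(zeros(P, 'single'), zeros(P, 'single'));

% Hypothetical switch: set to true on a machine with
% Parallel Computing Toolbox and a supported GPU
useGPU = false;
if useGPU
    Us  = gpuArray(Us);
    Ks  = gpuArray(Ks);
    UNs = gpuArray(UNs);
end
% Then time as before, e.g. gputimeit(@() funTestGPU(P,Us,Ks,UNs))
```

The accuracy cost depends on the real application; for sums over P^2 = 40000 terms per output element, single-precision rounding error is worth checking against a double-precision reference.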
More Answers
Joss Knight, 27 Apr 2023
Moved: Matt J, 27 Apr 2023
Why are you recomputing H and HU inside the loop? They do not change. If you remove the sum, the values computed in the first P-1 iterations are never used, so only the last computation of those values actually takes place.
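A minimal sketch of hoisting the loop-invariant work out of the loop, assuming (as in the question) that U is P-by-P and K is P-by-P-by-P, and that the real code does not use k inside the loop body. The small size P = 20 and the name S are illustrative choices. Because H, HU and their sum do not depend on k, they can be computed once and the resulting 1-by-P row copied into every row of UN; exp, .*, sum with a dimension vector, and repmat also accept gpuArray inputs, so the same lines apply to the GPU version.

```matlab
% Illustrative data: U is P-by-P, K is P-by-P-by-P (P = 20 is arbitrary)
P = 20;
U = complex(rand(P), rand(P)) / sqrt(2);
K = complex(rand(P,P,P), rand(P,P,P)) / sqrt(2);

% H, HU and their sum do not depend on k, so compute them once
H  = exp(1i*K);          % P-by-P-by-P
HU = U .* H;             % implicit expansion of U along dimension 3
S  = sum(HU, [1,3]);     % 1-by-P row vector

% Every row of UN receives the same row, so no loop is needed
UN = repmat(S, P, 1);    % P-by-P
```

In the question's benchmark this removes P-1 redundant evaluations of exp and of the P-by-P-by-P reduction. If the real code does use k inside the loop, this sketch does not apply as written.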