This question is closed.
Matrix multiplication bug in GPU
I am using MATLAB 8.2.0.701 (R2013b) on a host with 64 AMD cores and two K20c GPUs, with NVIDIA driver version 331.62 on Ubuntu 12.04.4 LTS.
$ uname -a
Linux leibniz3 3.5.0-44-generic #67~precise1-Ubuntu SMP Wed Nov 13 16:16:57 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
The matrix multiplication on the GPU returns results that differ substantially from the CPU for matrices of size 2^13x2^13.
To reproduce, run:
clear
n = 2^13;
A = rand(n);
B = rand(n);
tic
C = A * B;
t = toc; fprintf('CPU time %f sec\n',t)
%% One GPU
gpuDevice(1); % reset device
tic;
Ag = gpuArray(A);
Bg = gpuArray(B);
C1 = gather(Ag * Bg);
t = toc; fprintf('1 GPU time %f sec\n',t)
%% Two GPUs
gpuDevice(1); % reset device
gpuDevice(2); % reset device
tic
cc = cell(2,1);
parfor i = 1:2
    dev = gpuDevice;
    % fprintf('Iter %d Device %d\n',i,dev.Index);
    Ag = gpuArray(A);
    Bg = gpuArray(B(:,(i-1)*n/2+1:i*n/2));
    cc{i} = gather(Ag * Bg);
end
C2 = [cc{1} cc{2}];
t = toc; fprintf('2 GPU time %f sec\n',t)
fprintf('n = %5d %f %f\n', n, ...
max(max(abs(C - C1))), max(max(abs(C - C2))))
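The parfor loop above calls gpuDevice without an index, so the code alone does not show which GPU each iteration ends up on. As a diagnostic, here is a minimal sketch that binds each worker to a specific device before multiplying. It assumes a pool with at least 2 workers is already open and that both GPUs are visible to every worker; the variable names `cols` and `part` are made up for this illustration and are not from the original post.

```matlab
% Sketch only: bind worker k to GPU k explicitly, then multiply.
% Assumes an open pool with >= 2 workers and 2 visible GPUs.
n = 2^13;
A = rand(n);
B = rand(n);

spmd (2)
    dev = gpuDevice(labindex);               % worker k selects GPU k
    fprintf('Worker %d uses device %d (%s)\n', ...
        labindex, dev.Index, dev.Name);

    cols = (labindex-1)*n/2 + 1 : labindex*n/2;  % this worker's column block
    Ag   = gpuArray(A);
    Bg   = gpuArray(B(:, cols));
    part = gather(Ag * Bg);                  % n-by-n/2 block of the product
end

C2 = [part{1} part{2}];                      % reassemble from the Composite
```

If the printed device indices are 1 and 2 and the mismatch persists, device selection can probably be ruled out as the cause.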
The error is substantial. Is this known behavior?
The code works for smaller powers of two; 2^13 is the first size at which the bug rears its ugly head. I have not checked other values, but I will be glad to.
With 1 GPU, the difference max(max(abs(C - C1))) is 0.999716.
With 2 GPUs, the difference max(max(abs(C - C2))) is 134.766785.
The difference is very large!
Here are the plots. The second is a zoomed view: at full size the difference was invisible, because it seems to lie along a boundary.
[Two plots: the full difference and a zoomed view; images not available]
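Since the difference seems to sit along a boundary, it may be easier to list the affected rows and columns than to read them off a plot. A minimal sketch follows, using small stand-in matrices with an injected mismatch rather than the matrices from the post; the threshold `tol` is an assumed value, not something measured here.

```matlab
% Sketch only: locate the rows and columns where two results disagree.
% C and C1 are small stand-ins with an artificial mismatch.
n  = 8;
C  = magic(n);
C1 = C;
C1(5:6, 3) = C1(5:6, 3) + 1;     % inject a mismatch for illustration

D   = abs(C - C1);
tol = 1e-8;                       % assumed threshold, adjust as needed
[r, c] = find(D > tol);           % row/column indices above the threshold

fprintf('%d entries differ by more than %g\n', numel(r), tol);
fprintf('rows %d..%d, columns %d..%d\n', min(r), max(r), min(c), max(c));
```

Run against the real C and C1, the reported row and column range would show whether the bad entries line up with a block edge.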
I will try your suggestions and report back.
3 Comments
Edric Ellis
on 2 Jul 2014
I can't reproduce the problem you're seeing in R2013b - but I have only a single K20c. Can you reproduce the problem using only a single GPU? Which OS are you using? Have you updated to the latest NVIDIA CUDA driver? Are you able to try R2014a (this includes a later version of the CUDA runtime libraries)?
Jill Reese
on 8 Jul 2014
I am also unable to reproduce this on a single K20c in R2013b. I'm running a 12-core Debian machine with GPU driver version 331.62. On my system I see reasonable agreement between the CPU and GPU results:
max(max(abs(C-C1))) = 10^(-11)
As Edric mentioned, are you able to try R2014a to see if the problem is still reproducible for you in that version?
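For context on why a difference around 10^(-11) counts as agreement: with rand inputs each entry of C is a sum of n products with mean 1/4, so for n = 2^13 the entries are near 2048, and a few units of roundoff at that magnitude is expected. A scale-aware check can be sketched as below. It uses a small n on the CPU, and a second product accumulated in a different order stands in for the GPU result; the 2*n*eps threshold is an assumption based on the standard roundoff bound for inner products, not a MathWorks-documented tolerance.

```matlab
% Sketch only: compare two products relative to the size of their entries.
n = 256;                          % small size so this runs quickly on the CPU
A = rand(n);
B = rand(n);

C  = A * B;                       % reference product
C1 = zeros(n);
for k = 1:n                       % same product, different summation order
    C1 = C1 + A(:, k) * B(k, :);
end

absErr = max(abs(C(:) - C1(:)));
relErr = absErr / max(abs(C(:)));
tol    = 2 * n * eps;             % assumed roundoff-level threshold

fprintf('abs %.3g, rel %.3g, tol %.3g\n', absErr, relErr, tol);
```

By this measure a difference of 0.999716 on entries near 2048 is many orders of magnitude above roundoff, which points to wrong results rather than accumulated floating-point error.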
Answers (0)