Speed up big matrix multiplication (Parallel Processing/GPU)
Andreas Dorner
on 10 Jan 2019
Commented: Andreas Dorner
on 11 Jan 2019
Hello there,
below is the code I want to run. The rand() calls are only there to keep the example simple; in my real code the variables have meaningful content.
For N = 1024 this takes about 2 hours to run on my machine. I've tried many things, e.g. precalculating the cosArgs.
N = 1024;
img = rand(N);
cosArg1 = rand(N^2,1);
cosArg2 = rand(N^2,1);
[q, p] = meshgrid(0:N-1, 0:N-1); % p and q are N x N index matrices: p(i,j) = i-1, q(i,j) = j-1
recon = zeros(numel(img),1);
for k = 1:numel(img)
    a = img.*cos(cosArg1(k)*p).*cos(cosArg2(k)*q);
    recon(k) = sum(a(:)); % summing the vector a(:) is faster than sum(sum(a)), although a has to be stored first
end
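Written out, the loop computes the following for each k. Here c^{(1)}_k and c^{(2)}_k stand for cosArg1(k) and cosArg2(k), and the meshgrid call gives p(i,j) = i-1 and q(i,j) = j-1 (the symbols u_k, v_k and c are notation introduced for this note). Because each cosine factor depends on only one of the two indices, the double sum factors into a vector-matrix-vector product, so in principle only 2N cosines per k are needed instead of the 2N^2 the loop evaluates.

```latex
\mathrm{recon}(k)
  = \sum_{i=1}^{N}\sum_{j=1}^{N} \mathrm{img}(i,j)\,
      \cos\!\big(c^{(1)}_k\,(i-1)\big)\,\cos\!\big(c^{(2)}_k\,(j-1)\big)
  = \mathbf{u}_k^{\mathsf{T}}\,\mathrm{img}\,\mathbf{v}_k,
\qquad
u_k(i) = \cos\!\big(c^{(1)}_k\,(i-1)\big),\quad
v_k(j) = \cos\!\big(c^{(2)}_k\,(j-1)\big).
```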
Is there any clever way to speed this code up?
_______
I also just bought the Parallel Computing Toolbox to make it work with gpuArrays. It now takes about 17 min with a GTX 1060. The variables ending in GPU are just gpuArray casts of their originals.
EDIT: by casting to single first, I cut it down to 10 min.
Is there something I can do better?
cosArg1GPU = gpuArray(single(cosArg1));
cosArg2GPU = gpuArray(single(cosArg2));
imgGPU = gpuArray(single(img));
reconGPU = gpuArray(single(recon));
pGPU = gpuArray(single(p));
qGPU = gpuArray(single(q));
for k = 1:numel(imgGPU)
    % summing the vector a(:) is faster than sum(sum(a)), although a has to be stored first
    a = imgGPU.*cos(cosArg1GPU(k)*pGPU).*cos(cosArg2GPU(k)*qGPU);
    reconGPU(k) = sum(a(:));
end
recon = gather(reconGPU); % copy the result back from GPU memory
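The per-k cost in both loops above can also be reduced by exploiting that each cosine factor depends on only one index: with p(i,j) = i-1 and q(i,j) = j-1, recon(k) equals u_k' * img * v_k, where u_k = cos(cosArg1(k)*(0:N-1))' and v_k = cos(cosArg2(k)*(0:N-1))'. The sketch below is a swapped-in reformulation, not something proposed in this thread: it evaluates 2N cosines per k instead of 2N^2 and batches a chunk of k values into one matrix product. It runs on the CPU with a small N so it can be compared against the original loop; the names n, U, V, reconLoop, reconSep and the chunk size are illustrative choices. Since cos, mtimes and sum all accept gpuArray inputs, wrapping cosArg1, cosArg2 and img in gpuArray should move it to the GPU, but that has not been timed here.

```matlab
% Small problem so the reference loop finishes instantly.
N = 8;
img = rand(N);
cosArg1 = rand(N^2, 1);
cosArg2 = rand(N^2, 1);
n = 0:N-1;                           % row vector of index values

% Reference: the original elementwise loop.
[q, p] = meshgrid(n, n);             % p(i,j) = i-1, q(i,j) = j-1
reconLoop = zeros(numel(img), 1);
for k = 1:numel(img)
    a = img .* cos(cosArg1(k)*p) .* cos(cosArg2(k)*q);
    reconLoop(k) = sum(a(:));
end

% Separable, batched version: recon(k) = u_k' * img * v_k.
chunk = 16;                          % number of k values per matrix product
reconSep = zeros(numel(img), 1);
for k = 1:chunk:numel(img)
    range = k:min(k+chunk-1, numel(img));
    U = cos(cosArg1(range) * n);     % one row u_k' per k (cosines over p)
    V = cos(cosArg2(range) * n);     % one row v_k' per k (cosines over q)
    reconSep(range) = sum((U * img) .* V, 2);   % row-wise u_k' * img * v_k
end
```

Most of the remaining work is the product U * img, which is a plain matrix multiplication and therefore runs on optimized BLAS (or cuBLAS for gpuArray inputs).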
Accepted Answer
Edric Ellis
on 11 Jan 2019
Edited: Edric Ellis
on 11 Jan 2019
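You should take advantage of:
- Implicit dimension expansion (R2016b and later), and
- The new multi-dimension (vecdim) arguments to sum (R2018b and later)
and then perform the calculation in chunks. The idea is that instead of looping over single pages, you calculate multiple pages simultaneously. I'm not sure how much better this is than your original case, though.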
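A tiny illustration of the two features, with made-up values: multiplying a 2 x 2 matrix by a 1 x 1 x 3 array expands to a 2 x 2 x 3 array (one page per scalar), and sum with the dimension vector [1 2] collapses each page to a single number. The variable names P, s, pages, totals and totalsNested are arbitrary.

```matlab
% Implicit expansion: a 2 x 2 matrix times a 1 x 1 x 3 array
% produces a 2 x 2 x 3 array, one page per scalar.
P = [0 1; 2 3];
s = reshape([1 10 100], 1, 1, 3);
pages = P .* s;                      % pages(:,:,m) = P * s(m)

% Sum over dimensions 1 and 2 at once: the result is 1 x 1 x 3.
totals = sum(pages, [1 2]);

% The same result with nested sums, for comparison.
totalsNested = sum(sum(pages, 1), 2);

disp(squeeze(totals).')              % the page sums 6, 60 and 600
```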
N = 1024;
img = rand(N, 'gpuArray');
cosArg1 = rand(1,1,N^2, 'gpuArray');
cosArg2 = rand(1,1,N^2, 'gpuArray');
[q, p] = meshgrid(gpuArray(0:N-1), gpuArray(0:N-1));
recon = zeros(numel(img),1, 'gpuArray');
chunk = 128; % Might need to reduce this if it takes too much memory
tic
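for k = 1:chunk:numel(img)
    range = k:min(k+chunk-1, numel(img)); % clamp the last chunk in case chunk does not divide numel(img)
    % The following line relies on implicit dimension expansion
    % to calculate "chunk" pages of "a" simultaneously
    a = img .* cos(p .* cosArg1(1,1,range)) .* cos(q .* cosArg2(1,1,range));
    % Use the vector syntax of SUM to reduce to a 1x1xchunk "vector",
    % and assign into "recon"
    recon(range) = sum(a, [1 2]);
end
wait(gpuDevice); % GPU operations run asynchronously; wait for them to finish before reading the timer
toc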
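To check that the chunked expression reproduces the page-by-page loop, the same code can be run on ordinary CPU arrays with a small N. This is a sketch: N = 6 and chunk = 5 are arbitrary, chosen so that chunk does not divide N^2 and the final chunk is shorter, which is why range is clamped with min.

```matlab
N = 6;
img = rand(N);
cosArg1 = rand(1, 1, N^2);
cosArg2 = rand(1, 1, N^2);
[q, p] = meshgrid(0:N-1, 0:N-1);

% Reference result: one page at a time.
reconLoop = zeros(numel(img), 1);
for k = 1:numel(img)
    a = img .* cos(p * cosArg1(k)) .* cos(q * cosArg2(k));
    reconLoop(k) = sum(a(:));
end

% Chunked result: several pages at once via implicit expansion.
chunk = 5;                           % deliberately does not divide N^2 = 36
recon = zeros(numel(img), 1);
for k = 1:chunk:numel(img)
    range = k:min(k+chunk-1, numel(img));   % clamp the final chunk
    a = img .* cos(p .* cosArg1(1,1,range)) .* cos(q .* cosArg2(1,1,range));
    recon(range) = sum(a, [1 2]);    % 1 x 1 x numel(range), assigned by linear index
end
```

On memory: with N = 1024 and chunk = 128, each N x N x chunk array of doubles takes 1024 * 1024 * 128 * 8 bytes = 1 GiB, and the expression for a creates several such temporaries, so on a card with a few GB of memory a smaller chunk (or single precision, which halves the size) may be needed.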