gpuArray performance on 'xcorr' function
SangMin
on 31 Dec 2019
Commented: SangMin
on 2 Jan 2020
Hi,
I am trying to improve the performance of my script, which uses the 'xcorr' function heavily.
I found that 'xcorr' supports gpuArray inputs, so I tried it. However, the GPU performance does not seem good.
I tried three simple examples:
% Example 1: double precision on the GPU
t = 0:0.001:10-0.001;
x = cos(2*pi*10*t) + randn(size(t));
X = gpuArray(x);
tic
[r,lags] = xcorr(X,X,'normalized');
r = gather(r);   % copy the result back to the CPU (also waits for the GPU to finish)
toc
% Elapsed time is 0.017178 seconds.
% Example 2: double precision on the CPU
t = 0:0.001:10-0.001;
x = cos(2*pi*10*t) + randn(size(t));
tic
[r,lags] = xcorr(x,x,'normalized');
r = gather(r);   % no-op for a CPU array; kept so all three examples do the same work
toc
% Elapsed time is 0.004627 seconds.
% Example 3: single precision on the GPU
t = 0:0.001:10-0.001;
x = cos(2*pi*10*t) + randn(size(t));
X = gpuArray(single(x));
tic
[r,lags] = xcorr(X,X,'normalized');
r = gather(r);
toc
% Elapsed time is 0.015555 seconds.
A plain (CPU) array is much faster than the gpuArray.

For 'single' data, the GPU is much faster!
What can I do to improve the performance of the 'xcorr' function?
(I have several thousand arrays, and each array has 10k elements.)
Accepted Answer
Walter Roberson
on 31 Dec 2019
Edited: Walter Roberson
on 31 Dec 2019
To improve the double-precision performance of xcorr on the GPU, you would need a different GPU. The GeForce GTX 1060 you have runs double precision at 1/32 of its single-precision rate, which is the slowest double-precision ratio that NVIDIA offers.
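One rough way to see this ratio on your own card is to time a compute-bound operation in both precisions. The sketch below is not from the thread: it uses a large matrix multiply and assumes Parallel Computing Toolbox and a supported NVIDIA GPU; the matrix size n is an arbitrary choice. xcorr is FFT-based and partly limited by memory bandwidth, so its double-vs-single slowdown may well be smaller than the ratio this prints.

```matlab
% Rough FP64-vs-FP32 throughput check on the current GPU (sketch).
% Assumes Parallel Computing Toolbox and a supported NVIDIA GPU.
d = gpuDevice;                    % currently selected GPU
n = 4096;                         % matrix size (arbitrary choice)
Ad = rand(n, 'gpuArray');         % double-precision data created on the GPU
As = single(Ad);                  % single-precision copy, still on the GPU
td = gputimeit(@() Ad * Ad);      % time a double-precision matrix multiply
ts = gputimeit(@() As * As);      % time a single-precision matrix multiply
fprintf('%s: double takes %.1fx as long as single\n', d.Name, td / ts);
```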
3 Comments
Walter Roberson
on 1 Jan 2020
When I run those timing tests, the times I see vary quite a bit.
The times are somewhat less variable if I construct a better test, but double precision on the GPU is still the slowest:
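% Test signal (same as in the question), in four variants
t = 0:0.001:10-0.001;
Xd = cos(2*pi*10*t) + randn(size(t));   % double, CPU
Xs = single(Xd);                        % single, CPU
Xgd = gpuArray(Xd);                     % double, GPU
Xgs = gpuArray(single(Xd));             % single, GPU
% Repeat each measurement N times to see how much the timings vary
N = 100;
td = zeros(N,1);
ts = zeros(N,1);
tgd = zeros(N,1);
tgs = zeros(N,1);
fd = @() xc_cpu(Xd);
fs = @() xc_cpu(Xs);
fgd = @() xc_gpu(Xgd);
fgs = @() xc_gpu(Xgs);
% timeit/gputimeit call the function several times and return a typical time;
% gputimeit also waits for the GPU to finish. The second argument (0) requests no outputs.
for K = 1 : N; td(K) = timeit(fd, 0); end
for K = 1 : N; ts(K) = timeit(fs, 0); end
for K = 1 : N; tgd(K) = gputimeit(fgd, 0); end
for K = 1 : N; tgs(K) = gputimeit(fgs, 0); end
plot([td, ts, tgd, tgs]);
xlabel('Trial'); ylabel('Time (s)');
legend({'double (CPU)', 'single (CPU)', 'double (GPU)', 'single (GPU)'});
% Local functions must come at the end of a script (R2016b or later)
function [r, lags] = xc_cpu(X)
[r, lags] = xcorr(X, X, 'normalized');
end
function [r, lags] = xc_gpu(X)
[r, lags] = xcorr(X, X, 'normalized');
r = gather(r);   % include the transfer back to the CPU in the timing
end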
However! If you change the upper bound from 10-0.001 to 100-0.001, you get a quite different graph: double precision on the CPU becomes the slowest, and single precision on the GPU becomes the fastest. This suggests that, for arrays of the size you were using, transfer and synchronization times were overwhelming the GPU's computational gain.
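For the workload in the question (several thousand signals of about 10k samples each), one way to act on this observation is to move many signals to the GPU in one go instead of calling xcorr once per signal. Passing a matrix to xcorr computes the cross-correlations of all column pairs, which is far more work than needed here, so the sketch below (not from the thread) swaps in a hand-written FFT-based autocorrelation that processes every column at once. It assumes real-valued signals stored as columns and that only each signal's normalized autocorrelation, the equivalent of xcorr(x,x,'normalized'), is required. The sizes M and nSig are hypothetical placeholders; for thousands of signals, process the columns in chunks that fit in GPU memory.

```matlab
% Batched, normalized autocorrelation of many real signals on the GPU (sketch).
% Hypothetical sizes: M samples per signal, nSig signals stored as columns.
M = 10000;
nSig = 500;
t = (0:M-1).' * 0.001;                   % column time vector, same spacing as the question
x = cos(2*pi*10*t) + randn(M, nSig);     % M-by-nSig test data (implicit expansion)

Xg = gpuArray(single(x));                % one host-to-GPU transfer for the whole batch
nfft = 2^nextpow2(2*M - 1);              % zero-pad so circular correlation equals linear
F = fft(Xg, nfft);                       % FFT of every column at once
R = real(ifft(abs(F).^2));               % autocorrelation via the correlation theorem, lags 0..nfft-1
R = [R(end-M+2:end, :); R(1:M, :)];      % reorder rows to lags -(M-1)..(M-1)
R = R ./ R(M, :);                        % normalize so each zero-lag value is 1
r = gather(R);                           % one GPU-to-host transfer for the whole batch
lags = (-(M-1):(M-1)).';                 % lag for each row of r
```

Each column of r should match xcorr(x(:,k), x(:,k), 'normalized') up to single-precision rounding, so comparing one column against the CPU result is a cheap sanity check. Single precision is used because of the double-precision limitation of the GTX 1060 described in the accepted answer.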