Speed of looped operation on a GPU depending on number of iterations in loop?

Question

D. Plotnick on 16 Oct 2017

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/361608-speed-of-looped-operation-on-a-gpu-depending-on-number-of-iterations-in-loop

Commented: Thomas Barrett on 9 Feb 2021

This is a question that I think will get a bit into the weeds of MATLAB's JIT and the GPU toolbox. I include a minimum working example (MWE) below. I am using 2017a with a Titan-X 12GB Pascal GPU.

The basic issue is this: I am performing a looped operation (e.g. an interpolation) on the GPU, and if the number of iterations in the loop is small, the operation is very fast. However, once the number of iterations passes some threshold, each operation slows way down (a factor of >100 in my case).

To illustrate this, I ran the MWE below. On my machine it produced two figures.

The first shows the average time per numerical operation versus the number of iterations in the loop. At values n<200 the operations take on the order of 1E-4 s/op. After that threshold is passed, they take around 2E-2 s/op, a massive slowdown.

The second shows the total time for the loop. Again, we see a change in behavior: the number of iterations doesn't affect the total time (this is why I think it's a JIT thing) until the threshold around n = 200, after which the total time increases linearly as expected.

Finally, for each loop I output the time spent on each individual operation. For 150 iterations, we see that the time/operation is fairly constant in the 1E-4 s range, but for 200 iterations there is a sudden massive change in the time partway through the loop.

The questions are:

(A) Why is this sudden change in speed occurring?
(B) Is there a way to code this so that it does not occur? (Pre-allocation didn't seem to work, nor did variable clearing.)
(C) If I cannot avoid it, can I predict it? In many cases I have the flexibility of changing the number of iterations in a loop through other means, so if keeping that number of iterations below some magic number will make my processing 400x faster, I will work on it.

My MWE code is below; it should be noted that this code shows this behavior on my machine, but it may not on yours. Also, the numerical operation being used here is a stand-in for an actual looped process and is just being used to illustrate the speed issue.

 % =========================================================================
 % MWE
 % =========================================================================
% Clean up
clear all
close all
clc
% Set up some demo data and interpolating spaces
times2 = cell(10,1);
times1 = zeros(10,1);
x = (1:4000).';
y = (1:240);
v = rand(240,4000);
xi = 4000*rand(500);
xi = repmat(xi,1,1,240);
[Mf,Nf,~] = size(xi);
yi = repmat(y,Mf,1,Nf);
yi = permute(yi,[1,3,2]);
% Put it all on the GPU
x = gpuArray(x);
y = gpuArray(y);
v = gpuArray(v);
xi = gpuArray(xi);
yi = gpuArray(yi);
% Outer loop - changes number of iterations used in inner loop
for nn = 1:10
    t1 = tic;
    nn
    timesIn = zeros(50*nn,1);
    % Inner loop, perform our interpolation n-times
    for ii  = 1:50*nn
        tI = tic;
        vi = interp2(x,y,v,xi,yi);
        vi = sum(vi,3);
        timesIn(ii) = toc(tI);
    end   
    % Plot the current time/op and save times
    figure(1)
    plot(timesIn); title(nn); drawnow;
    times1(nn) = toc(t1);
    times2{nn} = timesIn;
    toc(t1)
end
% Make Figures
mTimes = zeros(1,10); % preallocate the mean time per operation for each run
for nn = 1:10
    mTimes(nn) = mean(times2{nn});
end
figure; plot((1:10)*50,mTimes); title('Mean time/operation'); ylabel('Time'); xlabel('n-Iterations');
figure; plot((1:10)*50,times1); title('Total Loop Time'); ylabel('Time'); xlabel('n-Iterations');

1 Comment

Thomas Barrett on 9 Feb 2021

Did you manage to figure this out in the end?

Answer 1

Joss Knight on 16 Oct 2017

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/361608-speed-of-looped-operation-on-a-gpu-depending-on-number-of-iterations-in-loop#answer_286093

You're just doing the timing in an invalid way. Most GPU operations run asynchronously, so all you were timing for the first 100 or so iterations was the kernel launch time. Eventually, you filled the queue and no more kernels could be launched until running kernels had finished. So then you are actually timing the true cost. Use wait(gpuDevice) to synchronize the device before each call to tic or toc to ensure that the timing values make sense. Even better, use gputimeit to get more accurate timings for functional code.
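The synchronization advice above could be applied to a per-iteration timing loop roughly as follows. This is a minimal sketch, assuming a supported GPU and Parallel Computing Toolbox. The query arrays are smaller than in the question's MWE (an assumption, made only to keep the run short), and the names dev and nIter are illustrative.

```matlab
% Sketch: time each GPU iteration with explicit device synchronization.
dev = gpuDevice;                                % handle to the current GPU

% Demo data, same grid as the MWE but with fewer query points (assumption)
x  = gpuArray((1:4000).');
y  = gpuArray(1:240);
v  = gpuArray(rand(240,4000));
xi = gpuArray(1 + 3999*rand(100,100,240));      % query x inside the grid
yi = repmat(reshape(y,1,1,[]), 100, 100, 1);    % query y, one value per page

nIter   = 200;
timesIn = zeros(nIter,1);
for ii = 1:nIter
    wait(dev);                                  % drain any queued kernels before starting the clock
    tI = tic;
    vi = sum(interp2(x,y,v,xi,yi), 3);
    wait(dev);                                  % block until this iteration's kernels have finished
    timesIn(ii) = toc(tI);
end
fprintf('Mean time per operation: %.3g s\n', mean(timesIn));
```

With both wait calls in place, each entry of timesIn covers the full execution of that iteration rather than only the kernel launch, so the per-operation time should no longer jump once the queue fills.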

2 Comments

D. Plotnick on 17 Oct 2017

Thanks as always, Joss. Unfortunately, this means I made an error in how I formed my MWE: in my actual code there is something odd happening with performance that is not related to how the timing is measured. I'll have to come up with another, more appropriate MWE.

D. Plotnick on 19 Oct 2017

Joss, I have revised a question posted here if you have a chance to look at it. I did not end up using gputimeit in that MWE, since I couldn't figure out a way to code it using anonymous functions not requiring an input.
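For reference, gputimeit accepts a function handle that takes no inputs, and an anonymous function can capture the workspace variables it needs at the time it is created. A minimal sketch for the operation in the original MWE, assuming a supported GPU and using reduced array sizes (an assumption, to keep the run short); the handle name f is illustrative:

```matlab
% Sketch: time the interpolate-and-sum step with gputimeit.
x  = gpuArray((1:4000).');
y  = gpuArray(1:240);
v  = gpuArray(rand(240,4000));
xi = gpuArray(1 + 3999*rand(100,100,240));      % query x inside the grid
yi = repmat(reshape(y,1,1,[]), 100, 100, 1);    % query y, one value per page

% Zero-input handle: x, y, v, xi, yi are captured when the handle is created
f = @() sum(interp2(x,y,v,xi,yi), 3);

t = gputimeit(f);                               % handles synchronization and repeats internally
fprintf('Time per call: %.3g s\n', t);
```

Because the variables are captured by value when f is defined, later changes to x, y, v, xi, or yi in the workspace do not affect what f computes; the handle would need to be recreated.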
