Asked by D. Plotnick
on 16 Oct 2017

I think this question gets a bit into the weeds of MATLAB's JIT and GPU toolbox. I am including an MWE below; for reference, I am using R2017a with a Titan X (Pascal) 12 GB GPU.

The basic issue is this: I am performing a looped operation (e.g. an interpolation) on the GPU, and if the number of iterations in the loop is small, the operation is very fast. However, once the number of iterations passes some threshold, each operation slows way down (a factor of >100 in my case).

To illustrate this, I ran the minimum working example (MWE) below, which produced these two figures on my machine.

The first shows the average time per numerical operation versus the number of iterations in the loop. At values n<200 the operations take on the order of 1E-4 s/op. After that threshold is passed, they take around 2E-2 s/op, a massive slowdown.

The second shows the total time for the loop. Again, we see a change in behavior: the number of iterations doesn't affect the total time (this is why I think it's a JIT thing) until the threshold around n = 200, after which the total time increases linearly as expected.

Finally, for each loop I output the time spent on each individual operation. For 150 iterations, we see that the time/operation is fairly constant in the 1E-4 s range, but for 200 iterations there is a sudden massive change in the time partway through the loop.

The questions are:

- (A) Why is this sudden change in speed occurring?
- (B) Is there a way to code this so that it does not occur? (Neither pre-allocation nor clearing variables seemed to help.)
- (C) If I cannot avoid it, can I predict it? In many cases I have the flexibility of changing the number of iterations in a loop through other means, so if keeping that number of iterations below some magic number will make my processing 400x faster, I will work on it.
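To make (C) concrete, here is a minimal sketch of how the location of the jump could be read off a vector of per-operation times, such as the `timesIn` vector recorded in the MWE below. The data here is synthetic (only the 1E-4 s and 2E-2 s levels come from my runs; the position of the jump is made up), and the 20-sample baseline window and the factor of 10 are arbitrary assumptions.

```matlab
% Synthetic stand-in for a measured timesIn vector (NOT measured data):
% 180 fast operations followed by 20 slow ones.
timesIn = [1e-4*ones(180,1); 2e-2*ones(20,1)];

baseline = median(timesIn(1:20));   % typical per-op time early in the loop
factor   = 10;                      % assumed threshold for calling it a "jump"

% First iteration whose time exceeds factor*baseline (empty if none does)
jumpIdx = find(timesIn > factor*baseline, 1, 'first');

if isempty(jumpIdx)
    fprintf('No jump found in %d iterations.\n', numel(timesIn));
else
    fprintf('Time/op first exceeds %gx the baseline at iteration %d.\n', ...
        factor, jumpIdx);
end
```

Running this on the real `timesIn` from several loop lengths would show whether the jump always lands at the same iteration index or moves with the problem size.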

My MWE code is below; it should be noted that this code shows this behavior on my machine, but it may not on yours. Also, the numerical operation being used here is a stand-in for an actual looped process and is just being used to illustrate the speed issue.

```matlab
% =========================================================================
% MWE
% =========================================================================

% Clean up
clear all
close all
clc

% Set up some demo data and interpolating spaces
times2 = cell(10,1);
times1 = zeros(10,1);
x = (1:4000).';
y = (1:240);
v = rand(240,4000);
xi = 4000*rand(500);
xi = repmat(xi,1,1,240);
[Mf,Nf,~] = size(xi);
yi = repmat(y,Mf,1,Nf);
yi = permute(yi,[1,3,2]);

% Put it all on the GPU
x = gpuArray(x);
y = gpuArray(y);
v = gpuArray(v);
xi = gpuArray(xi);
yi = gpuArray(yi);

% Outer loop - changes the number of iterations used in the inner loop
for nn = 1:10
    t1 = tic;
    nn
    timesIn = zeros(50*nn,1);

    % Inner loop, perform our interpolation n-times
    for ii = 1:50*nn
        tI = tic;
        vi = interp2(x,y,v,xi,yi);
        vi = sum(vi,3);
        timesIn(ii) = toc(tI);
    end

    % Plot the current time/op and save times
    figure(1)
    plot(timesIn); title(nn); drawnow;
    times1(nn) = toc(t1);
    times2{nn} = timesIn;
    toc(t1)
end

% Make Figures
for nn = 1:10
    mTimes(nn) = mean(times2{nn});
end

figure; plot((1:10)*50,mTimes); title('Mean time/operation'); ylabel('Time'); xlabel('n-Iterations');
figure; plot((1:10)*50,times1); title('Total Loop Time'); ylabel('Time'); xlabel('n-Iterations');
```
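One check that may be worth adding to the MWE, assuming the jump is related to GPU calls returning to MATLAB before the GPU has finished the work (an assumption on my part, not something the timings above establish): record both the time until the call returns and the time until the device reports it is idle. The sketch below uses `wait(gpuDevice)` for the second measurement and `gputimeit` as an independent per-call estimate. The data sizes are deliberately smaller than in the MWE and are placeholders.

```matlab
% Sketch only: smaller stand-in data than the MWE so it runs quickly.
gpu = gpuDevice;                                 % currently selected GPU
x   = gpuArray((1:400).');
y   = gpuArray(1:24);
v   = gpuArray(rand(24,400));
xi  = repmat(1 + 399*rand(50),1,1,24);           % 50x50x24 query points in x
yi  = permute(repmat(1:24,50,1,50),[1,3,2]);     % matching 50x50x24 query points in y
xi  = gpuArray(xi);
yi  = gpuArray(yi);

nIter   = 200;
tLaunch = zeros(nIter,1);   % time until the call returns to MATLAB
tDone   = zeros(nIter,1);   % time until the GPU has finished the work

for ii = 1:nIter
    tI = tic;
    vi = sum(interp2(x,y,v,xi,yi),3);
    tLaunch(ii) = toc(tI);
    wait(gpu);              % block until all queued GPU work is complete
    tDone(ii) = toc(tI);
end

% Independent estimate of the time for one call, with synchronization
% handled by gputimeit itself
tPerCall = gputimeit(@() sum(interp2(x,y,v,xi,yi),3));

fprintf('median launch time:   %g s\n', median(tLaunch));
fprintf('median finished time: %g s\n', median(tDone));
fprintf('gputimeit estimate:   %g s\n', tPerCall);
```

If `tDone` stays flat across all iterations while `tLaunch` alone shows the jump in an unsynchronized loop, that would point to the timing method rather than the operation itself; if both jump, the cause lies elsewhere.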

Answer by Joss Knight
on 16 Oct 2017

Accepted Answer

D. Plotnick
on 17 Oct 2017

D. Plotnick
on 19 Oct 2017
