MATLAB Answers


Reduction variables on the GPU II and arrayfun: cannot assign to parent function variable?

Asked by D. Plotnick on 3 Dec 2018
Latest activity Answered by Joss Knight
on 4 Dec 2018
Hello,
This is (hopefully) a simple reduction variable question for performing parallel GPU operations onto a single value. I have read the tutorial on stencil processing and frankly do not understand why this does not work.
A simple example is below (not intended to be actually used; it is a stand-in for more complicated operations). Here, I take a vector, use a GPU arrayfun to get the difference between neighboring elements, and then try to sum those differences into a single variable. Since the difference operation is order independent, and the result is summed into a single variable, I figured a combination of arrayfun + a reduction variable using nested functions would be the best way to start.
function v = reductionVariableLoopTest()
    x = gpuArray.rand(100,1);
    v = gpuArray.zeros(1);
    function d = difFun(ind)
        d = x(ind+1) - x(ind);
    end
    function sumFun(ind)
        v = v + difFun(ind);
    end
    vect = gpuArray.colon(1,length(x)-1);
    arrayfun(@sumFun,vect);
end
However, this gives the error: Assignment of parent function variable(s): 'v' by 'sumFun' is not allowed.
Now, I know I could get around this by simply using
y = arrayfun(@difFun,vect);
v = sum(y);
but this misses the whole point of using a reduction variable. The order-independent, on-GPU difFun should be extremely fast, and accumulating into the shared variable v should be both fast and memory friendly, since no intermediate array y is needed.
Any thoughts?
Cheers,
-Dan


1 Answer

Answer by Joss Knight
on 4 Dec 2018
 Accepted Answer

No, you can only read from uplevel variables, and then only one element at a time. You cannot write to them. That is not the intention of GPU arrayfun.
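To make the read-only rule concrete, here is a minimal sketch of the pattern that arrayfun does allow: the nested function reads x one element at a time and returns its result as an output, and the reduction happens outside arrayfun. The function name sumNeighborDiffs and the isa check are assumptions added for illustration (so the same code also runs on ordinary arrays); they are not part of the original question.

```matlab
function v = sumNeighborDiffs(x)
% Hypothetical helper: sums x(k+1) - x(k) over all neighboring pairs.
% x may be a gpuArray or an ordinary numeric vector.

    % Index vector; moved to the GPU only when x lives there, so that
    % arrayfun dispatches to the GPU implementation.
    ind = (1:(numel(x)-1)).';
    if isa(x, 'gpuArray')
        ind = gpuArray(ind);
    end

    % Nested function: reads the uplevel variable x one element at a
    % time and returns a value. It never assigns to a parent variable.
    function d = difFun(k)
        d = x(k+1) - x(k);
    end

    diffs = arrayfun(@difFun, ind);  % element-wise, order independent
    v = sum(diffs);                  % reduction done by the built-in sum
end
```

Called as v = sumNeighborDiffs(gpuArray.rand(100,1)), the result stays on the GPU until you gather it. The intermediate array diffs is the cost of not being able to write to v from inside the kernel.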
Generally we encourage you to vectorize your code and use MATLAB's own element-wise, reduction and accumulation operations. They are well optimized. If you need to go further than that then you're into the realm of writing your own kernels in CUDA C++. Alternatively, GPU Coder can be used to attempt to combine loops to form kernels that combine various parallel algorithms.
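For the toy problem in the question, the vectorized form might look like the sketch below. It uses only built-in indexing and sum, both of which accept gpuArray inputs; the variable names are illustrative, and the data is created on the CPU here so the snippet runs without a GPU.

```matlab
% Illustrative data; use rand(100, 1, 'gpuArray') to run this on the GPU.
x = rand(100, 1);

% Element-wise neighbor differences, computed in one vectorized step.
d = x(2:end) - x(1:end-1);

% Built-in reduction; with gpuArray input the result v stays on the GPU.
v = sum(d);
```

The expression sum(diff(x)) is an equivalent shorthand for this particular stand-in operation. A more complicated per-element operation would replace the line that computes d, with the reduction left to sum.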
