MATLAB GPU: arrayfun with indexing

Hi
I am new to MATLAB GPU computing and have made some initial tests. Now I am looking to parallelize the following code:
for i=1:n   % where n ~ 1'000'000 and a, b, c are of size ~300'000x1
currindices = indices(24,i);
a(currindices ) = a(currindices ) + A(24x24)*(b(currindices )+B(24x24)*c(currindices ));
end
In a test I parallelized this code without any of the indices by using arrayfun, and it worked well. That is, I just had the following code in a function that was called by arrayfun:
for i=1:n
a=a+A*(b+B*c)
end
I wonder how to deal with the indexing of the vectors and whether arrayfun still makes sense. The matrices A and B are constant. I read that indexing is rather slow on a GPU.
What would be the best way to parallelize the above code?
Thanks for any help. This whole parallelization does not come naturally to me yet.
BR

6 Comments

Walter Roberson on 22 Oct 2017
Edited: Walter Roberson on 24 Oct 2017
? currindices appears to be unused before you assign to it.
Markus Ess on 22 Oct 2017
sorry, was a mistake. indexing should happen to currindices. fixed the code in the sample
Joss Knight on 24 Oct 2017
I'm not sure what language you've written your code in so it's difficult to interpret. What is A(24x24)? And if this were MATLAB code then indices(24,i) would be a scalar. But then your algebra doesn't make sense.
Markus Ess on 24 Oct 2017
Edited: Walter Roberson on 24 Oct 2017
It wasn't meant to be real code. It is just to show that A is of size 24x24 and that for currindices I read 24 values. So currindices is indices(:,i) in MATLAB code, and the multiplication with A and B is simply a matrix multiplication.
for i=1:n %;where n~1'000'000 and a, b,c of size ~300'000x1
currindices = indices(:,i);
a(currindices ) = a(currindices ) + A*(b(currindices )+B*c(currindices ));
end
Well, one of the things I learnt anyway is that I have to use pagefun. The problem is still the indexing.
However, my main feeling is that I have to rewrite the math anyway for an optimal parallelization.
I don't think you need pagefun. Can't you just do this with indexing and matrix multiplication? It seems indices is the correct shape, namely 24-by-n. So b(indices) and c(indices) return 24-by-n, the multiplies return 24-by-n, and the addition works.
a(indices) = a(indices) + A * (b(indices) + B * c(indices));
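As a quick CPU check of this one-line form, here is a minimal sketch that compares it with the original loop. It assumes that no index appears more than once in indices; the tiny sizes, the random test data and the names a0, aLoop, aVec and maxDiff are made up for illustration and are not from the original post.

```matlab
% Loop vs. vectorized form, assuming every index is used exactly once.
m = 24;  n = 5;  N = m*n;                 % tiny illustrative sizes (hypothetical)
A = rand(m);  B = rand(m);                % constant 24-by-24 matrices
a0 = rand(N,1);  b = rand(N,1);  c = rand(N,1);
indices = reshape(randperm(N), m, n);     % 24-by-n, no repeated index

% Reference: the loop from the question
aLoop = a0;
for i = 1:n
    idx = indices(:,i);
    aLoop(idx) = aLoop(idx) + A*(b(idx) + B*c(idx));
end

% Vectorized: b(indices) and c(indices) are both 24-by-n
aVec = a0;
aVec(indices) = aVec(indices) + A*(b(indices) + B*c(indices));

% The two results should agree up to floating-point rounding
maxDiff = max(abs(aLoop - aVec));
```

The comparison uses a tolerance rather than exact equality, since a matrix-matrix product and a sequence of matrix-vector products may round differently.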
If the indices repeat, this may not work as you intended, because each repeated element of a will receive only one of the contributions and not their sum. You might have to use accumarray in that case, which takes the subscripts first and the values second, and add the summed contributions to a:
contrib = A * (b(indices) + B * c(indices));
a = a + accumarray(indices(:), contrib(:), size(a));
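The repeated-index case can be checked in the same way. The sketch below assumes that indices are unique within each column but repeat across columns, so that the loop sums all contributions; accumarray is called as accumarray(subs, vals, sz), and the sizes and the names contrib, aAcc and aNaive are hypothetical.

```matlab
% Repeated indices: loop vs. accumarray vs. plain indexed assignment.
m = 24;  n = 50;  N = 100;                % N < m*n, so indices must repeat
A = rand(m);  B = rand(m);
a0 = rand(N,1);  b = rand(N,1);  c = rand(N,1);
indices = zeros(m, n);
for i = 1:n
    indices(:,i) = randperm(N, m).';      % unique within a column only
end

% Reference: the loop from the question accumulates every contribution
aLoop = a0;
for i = 1:n
    idx = indices(:,i);
    aLoop(idx) = aLoop(idx) + A*(b(idx) + B*c(idx));
end

% Contributions do not depend on a, so they can be computed in one go
contrib = A*(b(indices) + B*c(indices));  % 24-by-n

% accumarray sums all contributions that share the same index
aAcc = a0 + accumarray(indices(:), contrib(:), [N 1]);

% Plain indexed assignment keeps only one contribution per repeated index
aNaive = a0;
aNaive(indices) = aNaive(indices) + contrib;

maxDiffAcc   = max(abs(aLoop - aAcc));    % rounding level
maxDiffNaive = max(abs(aLoop - aNaive));  % large: contributions were lost
```

If an index could also repeat within a single column, the loop itself would drop contributions, and the accumarray form would no longer match it.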
Markus Ess on 31 Oct 2017
Got it. At least on the CPU, the multiplication is 10 times faster than the for loop. Anyway, I now need to rewrite the code and see how that could work on a GPU.
thanks!
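For the GPU step, one possible starting point is to move the data to the device with gpuArray and keep the same indexing, matrix multiplication and accumarray calls. This is only a sketch under assumptions: it needs Parallel Computing Toolbox and a supported GPU, it uses canUseGPU (available in newer MATLAB releases) behind an existence check, it falls back to the CPU result otherwise, and the sizes and variable names are made up. Whether it is actually faster than the CPU version would have to be measured.

```matlab
% Same accumulation on the GPU, compared against the CPU result.
m = 24;  n = 1000;  N = 300;              % illustrative sizes (hypothetical)
A = rand(m);  B = rand(m);
a = rand(N,1);  b = rand(N,1);  c = rand(N,1);
indices = randi(N, m, n);                 % repeats allowed; accumarray sums them

% CPU reference
contribCpu = A*(b(indices) + B*c(indices));
aCpu = a + accumarray(indices(:), contribCpu(:), [N 1]);

% Assumption: canUseGPU exists in this release and a supported GPU is present
hasGpu = exist('canUseGPU') > 0 && canUseGPU();
if hasGpu
    % Transfer everything once, then work entirely on the device
    Ag = gpuArray(A);  Bg = gpuArray(B);
    ag = gpuArray(a);  bg = gpuArray(b);  cg = gpuArray(c);
    ig = gpuArray(indices);
    contribGpu = Ag*(bg(ig) + Bg*cg(ig)); % 24-by-n, computed on the GPU
    ag = ag + accumarray(ig(:), contribGpu(:), [N 1]);
    aGpu = gather(ag);                    % copy the result back to host memory
else
    aGpu = aCpu;                          % no GPU: reuse the CPU result
end

gpuDiff = max(abs(aGpu - aCpu));          % rounding level if the GPU path ran
```

Keeping all arrays on the device between the transfer and the final gather avoids repeated host-to-GPU copies, which would otherwise tend to dominate the run time.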


Answers (0)


Asked: 22 Oct 2017

Commented: 31 Oct 2017
