I am doing particle simulations in MATLAB, so I have very large data sets containing the positions of all particles, for example [rx ry], where rx and ry are large column vectors holding the x and y coordinates of every particle. If there are 1e6 particles, then rx and ry are both 1e6-by-1. In the simulation I have to find, for each particle, the neighboring particles within a certain range R. On the CPU I wrote something like this using a brute-force search:
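(A sketch of the loop; the function name and the self-exclusion detail are illustrative.)

```matlab
function idx_Neighbor = neighborSearchCPU(rx, ry, R)
% Brute-force neighbor search: for each particle, find all
% other particles within radius R.
    N_par = numel(rx);
    idx_Neighbor = cell(N_par, 1);
    R2 = R^2;
    for j = 1:N_par
        % squared distance from particle j to all particles
        d2 = (rx - rx(j)).^2 + (ry - ry(j)).^2;
        idx = find(d2 <= R2);
        idx(idx == j) = [];        % drop the particle itself
        idx_Neighbor{j} = idx;
    end
end
```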
This function outputs an N_par-by-1 cell array idx_Neighbor whose individual elements are the indices of the neighbor particles for every particle in the domain. This works fine on the CPU, but it is very slow, so I want to write a GPU version of the code using arrayfun. But as far as I know, GPU arrayfun requires the output array to be the same size as the input array, and it does not support cell arrays. Is there any way to get around this, or is there no simple option other than writing CUDA C code? Thanks!
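To illustrate the constraint: a per-particle scalar (e.g. the neighbor count) does fit the arrayfun model, since gpuArray arrayfun allows a nested function to index into arrays from the enclosing workspace. Something like this works (a sketch; the function names are illustrative), but it cannot return the variable-length index lists themselves:

```matlab
function counts = countNeighborsGPU(rx, ry, R)
% rx, ry: gpuArray column vectors; counts: gpuArray, same size.
    N = numel(rx);
    R2 = R^2;
    counts = arrayfun(@oneParticle, rx, ry);
    function c = oneParticle(x, y)
        % Nested function: can index rx, ry from the outer workspace.
        c = 0;
        for k = 1:N
            dx = rx(k) - x;
            dy = ry(k) - y;
            if dx*dx + dy*dy <= R2
                c = c + 1;
            end
        end
        c = c - 1;   % exclude the particle itself
    end
end
```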
P.S. I am aware of the k-d tree based rangesearch function, but it does not support gpuArray inputs. And I want to avoid transferring data between the GPU and CPU, since I implement other parts of the simulation code on the GPU as well.
A follow-up: through some experimenting with MATLAB GPU Coder, I managed to translate the following MATLAB code to a CUDA kernel (MEX):
Numpar_neighbor = zeros(size(rx), 'single'); % number of neighbor particles for each particle j
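(For context, the full counting loop around that line looks roughly like this; the function name, loop structure, and the coder.gpu.kernelfun pragma are my sketch of what GPU Coder compiled.)

```matlab
function Numpar_neighbor = countNeighborsCoder(rx, ry, R)
% Count neighbors within range R for each particle;
% compiled to a CUDA MEX with GPU Coder.
    Numpar_neighbor = zeros(size(rx), 'single');
    coder.gpu.kernelfun;          % map the loops below to a GPU kernel
    N = numel(rx);
    R2 = R^2;
    for j = 1:N
        c = single(0);
        for k = 1:N
            d2 = (rx(k) - rx(j))^2 + (ry(k) - ry(j))^2;
            if d2 <= R2
                c = c + 1;
            end
        end
        Numpar_neighbor(j) = c - 1;   % exclude the particle itself
    end
end
```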
The CUDA MEX file runs fine, but it returns an ordinary array instead of a gpuArray, even though the code executes on the GPU. So if I use the returned array Numpar_neighbor in the subsequent parts of the code, I have to transfer it back to the GPU, which adds unnecessary overhead. Why is this, and is there any way to force the MEX file to return a gpuArray?