Hi, here is my GPU information:
CUDADevice with properties:
Name: 'Quadro M1000M'
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
So I would like to implement k-nearest-neighbor search on the GPU. The scope of my problem: for each of 10,000 query points, I need to find the k nearest neighbors among 1 million reference points, with k <= 50. At first I thought this would not be difficult. However, when I checked the implementation at https://github.com/vincentfpgarcia/kNN-CUDA , it says that the number of reference points should not be more than 65536 :( . Why is that?

And what happens if the points cannot all fit into GPU memory? Would the advantage offered by GPU computing be lost, since we would have to repeatedly transfer data back and forth between host memory and the GPU?

And what about a kd-tree? I've heard that it is highly recursive and therefore not well suited to GPUs. So should I concentrate on brute force instead?

Currently, I'm using the Approximate Neighbor Search inside a parfor loop (yeah, it's just as simple as that), and I have achieved some speedup, but I'm not sure it would be as fast as I want with 1 million reference points. And I need to do this k-NN search many, many times, say 1000 times :( . Please help me, thank you very much.
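To make concrete what I mean by brute force when the reference set is too large to process at once, here is a minimal CPU sketch in NumPy (hypothetical, just to illustrate the idea, not an implementation of kNN-CUDA): process the reference points in fixed-size chunks and merge each chunk's candidates into a running top-k, so the full 10,000 x 1,000,000 distance matrix never has to exist in memory. On a GPU, each chunk would presumably correspond to one distance-matrix kernel launch plus a partial sort, which is how I imagine the 65536 limit could be worked around.

```python
import numpy as np

def knn_bruteforce_chunked(queries, refs, k, chunk=65536):
    """Brute-force k-NN over reference points processed in chunks.

    Keeps only the running top-k (distance, index) per query, so peak
    memory is O(n_queries * (k + chunk)) instead of O(n_queries * n_refs).
    Returns squared Euclidean distances and reference indices, sorted
    by distance.
    """
    nq = queries.shape[0]
    q_norm = (queries ** 2).sum(axis=1)[:, None]            # (nq, 1)
    best_d = np.full((nq, k), np.inf)                       # running top-k distances
    best_i = np.full((nq, k), -1, dtype=np.int64)           # running top-k indices
    rows = np.arange(nq)[:, None]
    for start in range(0, refs.shape[0], chunk):
        block = refs[start:start + chunk]
        r_norm = (block ** 2).sum(axis=1)[None, :]          # (1, b)
        # squared distances via |q|^2 + |r|^2 - 2 q.r (one matmul per chunk)
        d = q_norm + r_norm - 2.0 * queries @ block.T
        # merge this chunk's candidates with the current top-k
        ids = np.arange(start, start + block.shape[0])
        cand_d = np.concatenate([best_d, d], axis=1)
        cand_i = np.concatenate([best_i, np.broadcast_to(ids, (nq, ids.size))], axis=1)
        keep = np.argpartition(cand_d, k - 1, axis=1)[:, :k]
        best_d = cand_d[rows, keep]
        best_i = cand_i[rows, keep]
    # only the final k winners per query need a full sort
    order = np.argsort(best_d, axis=1)
    return best_d[rows, order], best_i[rows, order]
```

The point of the merge step is that only 2k candidates per query are ever sorted against each other, so repeating the search 1000 times would stream the same chunks without holding everything resident.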