Hi asim,
I understand that you want to decrease the runtime of your script by shifting the computation to the GPU.
The GPU code's suboptimal performance can be attributed to the communication overhead between the CPU and the GPU: every individual operation dispatched to the GPU incurs launch and transfer costs. To improve performance, it is advisable to reduce the total number of operations sent to the GPU while increasing the number of elements on which each operation is executed.
For example, refer to the below code snippet:
G(i,j) = (exp(1i*(kg*r(i,j)*nb)))/(4*pi*r(i,j));
When this line runs inside a loop over i and j, one operation per element is being given to the GPU. Instead, we can give one single vectorized operation to the GPU and achieve the same results:
G = (exp(1i*(kg*r*nb)))./(4*pi*r);
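As a sketch of the difference, assuming r is an N-by-N gpuArray and kg and nb are scalars (the names come from your snippet, but the sizes and values here are illustrative only):

```matlab
% Requires Parallel Computing Toolbox and a supported GPU.
N  = 1000;
r  = gpuArray(rand(N));    % example distance matrix (hypothetical data)
kg = 2*pi;  nb = 1.5;      % example scalar parameters (hypothetical values)

% Loop version: one small operation per element is dispatched to the GPU,
% so the launch overhead is paid N*N times.
G1 = zeros(N, N, 'gpuArray');
for i = 1:N
    for j = 1:N
        G1(i,j) = exp(1i*(kg*r(i,j)*nb))/(4*pi*r(i,j));
    end
end

% Vectorized version: a handful of large element-wise operations
% process all N*N elements at once.
G2 = exp(1i*(kg*r*nb))./(4*pi*r);
```

Both versions compute the same G; only the number of GPU dispatches differs.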
Similarly, the following code allocates the g array directly on the GPU, which again sends a large number of operations to the GPU:
g = zeros(length(X),length(Y),'gpuArray');
Instead, the g array can be computed on the CPU and then the result can be transferred to the GPU in a single copy, as shown below.
g = gpuArray(zeros(length(X),length(Y)));
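A minimal sketch of this allocation pattern (X and Y are stand-ins for your coordinate vectors; their sizes here are hypothetical):

```matlab
% Example coordinate vectors (hypothetical sizes).
X = linspace(0, 1, 2000);
Y = linspace(0, 1, 2000);

% Allocate and fill on the CPU first...
g = zeros(length(X), length(Y));

% ...then move the finished array to the GPU in one transfer.
g = gpuArray(g);
```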
After making these changes, I observed a significant speedup in the calculation of G for the tested problem size. For larger values of N, the speedup will increase even further.
If you have multiple operations that need to be applied to all elements of a given matrix, and you intend to offload them to the GPU, you can consolidate these operations into a single function and apply it element-wise with the 'arrayfun' function.
G2 = arrayfun(@some_fun, G2, r, kg, nb);

function G = some_fun(G, r, kg, nb)
    G = (exp(1i*(kg*r*nb)))./(4*pi*r);
end
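Putting this together, here is a self-contained sketch (the variable names and sizes are illustrative, not taken from your script). When the inputs are gpuArrays, arrayfun compiles the function into a single GPU kernel, so the whole computation runs in one launch:

```matlab
% Requires Parallel Computing Toolbox and a supported GPU.
N  = 1000;
r  = gpuArray(rand(N));        % example data (hypothetical)
kg = 2*pi;  nb = 1.5;          % example scalars (hypothetical)
G2 = zeros(N, N, 'gpuArray');  % placeholder input; overwritten below

% One kernel launch applies some_fun to every element.
G2 = arrayfun(@some_fun, G2, r, kg, nb);

function G = some_fun(G, r, kg, nb)
    % Inside a GPU arrayfun, all inputs are scalars.
    G = exp(1i*(kg*r*nb))/(4*pi*r);
end
```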
For more information, refer to the following resources:
- https://www.mathworks.com/help/parallel-computing/gpuarray.html
- https://www.mathworks.com/help/parallel-computing/run-matlab-functions-on-a-gpu.html
- https://www.mathworks.com/help/parallel-computing/illustrating-three-approaches-to-gpu-computing-the-mandelbrot-set.html#:~:text=naiveGPUTime%20%3D%200.2181-,Element%2Dwise%20Operation,-Noting%20that%20the
- https://www.mathworks.com/help/parallel-computing/work-with-complex-numbers-on-a-gpu.html
Hope this is helpful.