I recently ran into a problem when using a CUDA kernel: the results I get from the kernel are not always the same, even though the input data is identical on every run.
Note the difference in the third row, second column.
This is the MATLAB code of the function:
And the kernel:
Does MATLAB perhaps not work well with cuDoubleComplex when it is used inside the kernel?
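One way to check this independently of MATLAB would be a small standalone program that runs a cuDoubleComplex kernel twice on the same input and compares the two results byte for byte. The sketch below is not my actual kernel: the kernel name `scaleComplex`, the helper `runOnce`, the array size, and the arithmetic are all placeholders chosen for illustration, and error checking of the CUDA calls is left out to keep it short.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>
#include <cuComplex.h>

// Hypothetical kernel (NOT the one from this question): multiplies each
// element by a fixed complex factor. Each thread writes only its own element.
__global__ void scaleComplex(cuDoubleComplex *out, const cuDoubleComplex *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may contain threads past the end
        out[i] = cuCmul(in[i], make_cuDoubleComplex(0.5, -2.0));
    }
}

// Hypothetical helper: runs the kernel once and copies the result to the host.
static std::vector<cuDoubleComplex> runOnce(const std::vector<cuDoubleComplex> &host)
{
    int n = (int)host.size();
    size_t bytes = n * sizeof(cuDoubleComplex);
    cuDoubleComplex *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    scaleComplex<<<blocks, threads>>>(dOut, dIn, n);

    std::vector<cuDoubleComplex> result(n);
    // This copy is issued on the default stream, so it runs after the kernel.
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}

int main()
{
    // Arbitrary test input; the values themselves carry no meaning.
    std::vector<cuDoubleComplex> input(1000);
    for (int i = 0; i < 1000; ++i)
        input[i] = make_cuDoubleComplex(i * 0.25, 1.0 - i);

    std::vector<cuDoubleComplex> a = runOnce(input);
    std::vector<cuDoubleComplex> b = runOnce(input);

    // Byte-wise comparison of the two runs.
    bool same = std::memcmp(a.data(), b.data(), a.size() * sizeof(cuDoubleComplex)) == 0;
    printf("%s\n", same ? "identical" : "different");
    return same ? 0 : 1;
}
```

If a standalone test like this is stable while the call from MATLAB is not, that would suggest looking at how the kernel is launched and indexed (grid size, bounds checks, uninitialized output) rather than at cuDoubleComplex itself.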
I also tried calling wait(gpuDevice(1)) after gathering the data, thinking that the GPU might not have finished the calculation, but that did not help.
Can somebody help me?