For my research I have to run many repetitions of the same optimization (for statistics). I already found that my fitness function is much faster on the GPU, so I perform those calculations on the available GPUs. I have 3 GPUs at my disposal, so I worked out a scheme where I open a parallel pool and use parfeval to assign each GPU to a different optimization.
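For reference, a stripped-down sketch of my scheme (runOptimization is a placeholder for my actual optimization function, which runs the fitness evaluations on the selected GPU):

```matlab
pool = parpool(3);          % one worker per GPU

% Pin each worker to its own GPU once; the selection persists on the
% worker for later parfeval calls.
spmd
    gpuDevice(labindex);    % worker k selects GPU k
end

% Launch one independent optimization per worker/GPU.
% Filling the array backwards preallocates the future array.
for k = 3:-1:1
    futures(k) = parfeval(pool, @runOptimization, 1, k);
end

results = fetchOutputs(futures);   % collect the 3 results
```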
When I checked the performance of this setup, I noticed that the speed of a single GPU drops by about half when it is used in the multi-GPU setup (3 workers) compared to the single-GPU setup (1 worker).
I rechecked the implementation and saw no sign that data has to be sent from one GPU to another, so they should never need to synchronize.
Solutions I have tried:
- Make a separate fitness-function m-file for each GPU (did not work)
- Open a separate MATLAB instance for each GPU (did not work)
Any suggestions on this problem are appreciated.