Oscar Martinez on 9 Apr 2021
Commented: Walter Roberson on 12 Apr 2021
I want to run code that:
1) At each iteration loads a .mat file (containing an image)
2) Then converts the array into a gpuArray and applies some function.
Something like this:
total=100;
tic
for Iteraciones=1:total
fname = strcat('roisandparameters',num2str(Iteraciones),'.mat');
sample_folder='C:\Users\User\Dropbox\oscar\particulas\2021\tutorial_parallele_computing\Prueba Datos';
A=load(fullfile(sample_folder,fname));
B=A.roiwide;
C=B^4;
end
t_forCPU=toc
tic
for Iteraciones=1:total
fname = strcat('roisandparameters',num2str(Iteraciones),'.mat');
sample_folder='C:\Users\User\Dropbox\oscar\particulas\2021\tutorial_parallele_computing\Prueba Datos';
A=load(fullfile(sample_folder,fname));
B=A.roiwide;
B=gpuArray(B);
wait(gpuDevice)
C=B^4;
end
t_forGPU=toc
Why is it slower if I replace for with parfor?
Why does converting to gpuArray take more time? That is, t_forGPU is larger than t_forCPU.
Walter Roberson on 9 Apr 2021
wait(gpuDevice)
C=B^4;
should probably be
C=B^4;
gather(C);

Walter Roberson on 9 Apr 2021
You need to synchronize with the GPU, spend time transferring data to it, wait for it to be ready, and read the data back. All of those take time.
To gain speed, the time the GPU spends on the operation must be less than the time the CPU would spend on it, by enough to make up for the overheads.
If you do not send the GPU a big enough chunk of work, the overhead is going to be too costly.
C=B^4;
It is not clear whether B is a scalar or a square matrix. If it is a scalar, then you can be certain that the overheads are much higher than the performance gain from using the GPU. If it is a square matrix, then whether there is a gain depends on its size.
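To find where the break-even point falls on a particular machine, one option is to time the same operation over several matrix sizes. The following is only a minimal sketch, assuming Parallel Computing Toolbox and a supported GPU are available. The values in `sizes` are arbitrary illustrative choices, not taken from this thread, and `timeit`/`gputimeit` are used instead of tic/toc because they run the function several times and, for the GPU, wait for the device to finish.

```matlab
% Hypothetical size sweep: compare B^4 on the CPU and on the GPU.
sizes = [50 150 500 2000];          % illustrative sizes (assumption)
t_cpu = zeros(size(sizes));
t_gpu = zeros(size(sizes));
for k = 1:numel(sizes)
    n = sizes(k);
    B = rand(n);                    % random square test matrix
    t_cpu(k) = timeit(@() B^4);     % CPU time for the matrix power
    % GPU time including the transfer to the device and the gather back
    t_gpu(k) = gputimeit(@() gather(gpuArray(B)^4));
end
disp(table(sizes(:), t_cpu(:), t_gpu(:), ...
    'VariableNames', {'n', 't_cpu', 't_gpu'}));
```

The size at which `t_gpu` drops below `t_cpu`, if it does at all, depends on the hardware, so the numbers are only meaningful for the machine they were measured on.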
for Iteraciones=1:total
fname = strcat('roisandparameters',num2str(Iteraciones),'.mat');
sample_folder='C:\Users\User\Dropbox\oscar\particulas\2021\tutorial_parallele_computing\Prueba Datos';
Remember that you are timing the load() as well as the GPU. For a fairer comparison, first load all the data into a cell array in a loop, before starting the timer, and then time only the computational loop.
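To see how much of the measured time is file I/O, the two phases can be timed separately. The sketch below is an assumption-laden illustration: it writes a few small .mat files into a scratch folder created with `tempname` (not the real data folder), uses random 150x150 matrices as stand-ins for the images, and runs on the CPU only.

```matlab
% Hypothetical setup: write a few small .mat files to a scratch folder
total = 5;                                   % illustrative count (assumption)
sample_folder = tempname;                    % scratch folder, not the real data path
mkdir(sample_folder);
for k = 1:total
    roiwide = rand(150);                     % stand-in for the real image data
    save(fullfile(sample_folder, sprintf('roisandparameters%d.mat', k)), 'roiwide');
end

% Phase 1: time only the file loading
data = cell(total, 1);
tic
for k = 1:total
    A = load(fullfile(sample_folder, sprintf('roisandparameters%d.mat', k)));
    data{k} = A.roiwide;
end
t_load = toc;

% Phase 2: time only the computation on the preloaded data
C = cell(total, 1);
tic
for k = 1:total
    C{k} = data{k}^4;
end
t_compute = toc;

fprintf('load: %.4f s, compute: %.4f s\n', t_load, t_compute);
rmdir(sample_folder, 's');                   % remove the scratch folder
```

If `t_load` dominates, moving the computation to the GPU cannot change the total much, whatever the GPU does with the arithmetic.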

Oscar Martinez on 12 Apr 2021
Thank you very much Walter
Maybe part of the problem is that I’m using MATLAB 2017a. I have bought MATLAB 2021 with the Parallel Computing Toolbox, but because of delays in the payment from my institution I received a free trial in the meantime, and by mistake the Parallel Computing Toolbox was not included. So maybe some of the problems I encounter are fixed in the new version, which I have not received yet. To save time, I was testing the toolbox with an old version on a colleague's computer.
B here is a 150x150 matrix. When you mentioned loading the data into a cell array, do you mean this?
total=100;
CG=gpuArray(cell(1,total));
tic
for Iteraciones=1:total
fname = strcat('roisandparameters',num2str(Iteraciones),'.mat');
sample_folder='C:\Users\User\Dropbox\oscar\particulas\2021\tutorial_parallele_computing\Data';
A=load(fullfile(sample_folder,fname));
B=A.roiwide;
B=gpuArray(B);
CG{Iteraciones}=B;
%C=B^4;
end
t_forGPU=toc
I have a problem with the line that creates CG:
Error using gpuArray
GPU arrays support only fundamental numeric or logical data types.
Walter Roberson on 12 Apr 2021
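On the error itself: gpuArray accepts only numeric or logical arrays, so it cannot wrap a cell array. An ordinary cell array can, however, hold gpuArray elements. A minimal sketch, assuming a GPU is available and using random 150x150 matrices as stand-ins for the loaded images:

```matlab
total = 100;
CG = cell(1, total);                 % ordinary (CPU-side) cell array
for k = 1:total
    B = rand(150);                   % stand-in for A.roiwide (assumption)
    CG{k} = gpuArray(B);             % each element is stored on the GPU
end
isContainerCell = iscell(CG);               % the container is a plain cell
isElementOnGpu  = isa(CG{1}, 'gpuArray');   % its elements are gpuArray objects
```

Only the container lives in host memory; every element still occupies GPU memory, so a large `total` can exhaust the device.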
There are potential trade-offs about when you gather and when you create the arrays that are worth testing.
Some of the approaches below risk exceeding GPU memory, so the code that clears the entries from the GPU is included inside the timed region (that is overhead you need to take into account).
total=100;
GC = cell(total,1);
for Iteraciones = 1:total
fname = strcat('roisandparameters',num2str(Iteraciones),'.mat');
sample_folder='C:\Users\User\Dropbox\oscar\particulas\2021\tutorial_parallele_computing\Data';
A=load(fullfile(sample_folder,fname));
B=A.roiwide;
GC{Iteraciones}=B;
end
%how long by CPU?
C = cell(total,1);
tic
for Iteraciones = 1 : total
C{Iteraciones} = GC{Iteraciones}^4;
end
time_by_cpu = toc;
%common GPU processing: do operations on data and gather() the results
C = cell(total,1);
tic
for Iteraciones = 1 : total
B = gpuArray(GC{Iteraciones});
C{Iteraciones} = gather(B^4);
end
clear B
time_by_gpu1 = toc;
%potential GPU processing: postpone the gather()
B = cell(total,1);
tic
for Iteraciones = 1 : total
B{Iteraciones} = gpuArray(GC{Iteraciones})^4;
end
C = cell(total,1);
for Iteraciones = 1 : total
C{Iteraciones} = gather(B{Iteraciones});
end
clear B
time_by_gpu2 = toc;
%potential GPU processing: create all the GPU arrays first
B = cell(total,1);
tic
for Iteraciones = 1 : total
B{Iteraciones} = gpuArray(GC{Iteraciones});
end
C = cell(total,1);
for Iteraciones = 1 : total
C{Iteraciones} = B{Iteraciones}^4;
end
D = cell(total,1);
for Iteraciones = 1 : total
D{Iteraciones} = gather(C{Iteraciones});
end
clear B C
time_by_gpu3 = toc;
%potential GPU processing: gather all at the end
B = cell(total,1);
tic
for Iteraciones = 1 : total
B{Iteraciones} = gpuArray(GC{Iteraciones})^4;
end
C = cell(total,1);
[C{:}] = gather(B{:});
clear B
time_by_gpu4 = toc;