GPU training of neural network with parallel computing toolbox unreasonably slow, what am I missing?
I’m trying to speed up the training of some NARNET neural networks using the GPU support in the Parallel Computing Toolbox, but so far I haven’t gotten it to work. Or rather, it works, but it’s unreasonably slow. According to the documentation, training on a GPU instead of the CPU shouldn’t be any harder than adding 'useGPU','yes' to the training command. However, if I simply create some dummy data, for example a sine wave with about 900 values, and train a NARNET on it using the CPU like so:
%CPU training
T = num2cell(sin(1:0.01:10));
net = narnet( 1:2, 10 );
[ Xs, Xsi, Asi, Ts] = preparets( net, {}, {}, T );
rng(0)
net.trainFcn = 'trainscg';
tic
net = train(net,Xs,Ts,'showResources','yes' );
toc % 2.77 s
The training takes less than 3 seconds. But when I do the exact same thing on a CUDA-capable GTX 760 GPU:
%GPU training
T = num2cell(sin(1:0.01:10));
net = narnet( 1:2, 10 );
[ Xs, Xsi, Asi, Ts] = preparets( net, {}, {}, T );
rng(0)
net.trainFcn = 'trainscg';
tic
net = train(net,Xs,Ts,'useGPU','yes','showResources','yes' );
toc % 1247.6 s
Incredibly, the training takes over 20 minutes!
I’ve read through MathWorks’ fairly extensive documentation on parallel and GPU computing with the Neural Network Toolbox (link here) and seen that there are a few things that can or should be done when calculating on a GPU: for example, converting the input and target data to GPU arrays before training with the nndata2gpu command, and replacing any tansig activation functions with elliotsig. Doing so does speed up the training a bit:
%Improved GPU training
T = num2cell(sin(1:0.01:10));
net = narnet( 1:2, 10 );
[ Xs, Xsi, Asi, Ts ] = preparets( net, {}, {}, T );
rng(0)
net = configure(net,Xs,Ts);
Xs = nndata2gpu(Xs);
Ts = nndata2gpu(Ts);
Xsi = nndata2gpu(Xsi);
for i = 1:net.numLayers
    if strcmp(net.layers{i}.transferFcn,'tansig')
        net.layers{i}.transferFcn = 'elliotsig';
    end
end
net.trainFcn = 'trainscg';
tic
net = train(net,Xs,Ts,'showResources','yes' );
toc % 70.79 s
The training here takes only about 70 seconds, but that is still many times slower than just doing it on my CPU. I’ve tried several different data series sizes and network architectures, but I’ve never seen GPU training compete with the CPU, which is strange since, as I understand it, most professional ANN research is done on GPUs.
What am I doing wrong here? Clearly I must be missing something fundamental.
Thanks
1 Comment
Greg Heath
on 10 Jul 2015
Edited: Greg Heath
on 10 Jul 2015
You don't need an if statement to replace tansig by elliotsig. Just replace it right after you define the net.
My elliot4sig is a little faster
elliot4sig(x) = x/(0.25 + abs(x))
Greg
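A minimal sketch of the suggestion above, assuming the hidden layer of a default narnet is layer 1 and uses tansig (as the loop in the question implies). The anonymous function elliot4sig here is only an illustrative stand-in for evaluating the quoted formula; it is not registered as a toolbox transfer function, which would take additional work not shown here.

```matlab
% Swap the hidden-layer transfer function right after creating the network,
% instead of looping over all layers with an if statement.
net = narnet(1:2, 10);
net.layers{1}.transferFcn = 'elliotsig';   % layer 1 = hidden layer (assumed tansig by default)

% Illustrative stand-in for the elliot4sig formula quoted above.
% This only evaluates the formula; it is not usable as net.layers{i}.transferFcn.
elliot4sig = @(x) x ./ (0.25 + abs(x));

% Evaluate at a few points to see the saturating shape.
y = elliot4sig([-1 0 0.25 1]);
disp(y)
```

The element-wise division (./) keeps the function usable on vectors and matrices as well as scalars.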
Accepted Answer
More Answers (1)
Adam Hug
on 2 Jul 2015
0 votes
I suspect the problem size of 900 values may be too small for you to benefit from the GPU architecture, especially since 900 values easily fit into a CPU cache. The problem needs to be much larger before the cost of communication between the CPU and GPU becomes small in comparison to the computation. Try a sine wave with one million values and see if the GPU outperforms the CPU.
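The experiment suggested above could be sketched roughly as follows. The variable numPoints, the fixed epoch count, and the single-precision GPU transfer are assumptions made here to keep the run short and comparable; they are not part of the original suggestion. Whether the GPU actually comes out ahead will depend on the hardware and on how the toolbox handles time-series data, so treat the timings as something to measure rather than as an expected result.

```matlab
% Compare CPU and GPU training time on a larger sine wave.
numPoints = 1e5;                      % assumption: start smaller, raise toward 1e6 if time allows
T = num2cell(sin(linspace(1, 10, numPoints)));

net = narnet(1:2, 10);
net.trainFcn = 'trainscg';
net.trainParam.epochs = 10;           % assumption: cap the epochs so both runs do similar work
net.trainParam.showWindow = false;    % no training GUI, so it does not affect timing
[Xs, Xsi, Asi, Ts] = preparets(net, {}, {}, T);

% CPU run
rng(0)
tStart = tic;
netCPU = train(net, Xs, Ts);
tCPU = toc(tStart);

% GPU run, same data and same initial random state
rng(0)
tStart = tic;
netGPU = train(net, Xs, Ts, 'useGPU', 'yes');
tGPU = toc(tStart);

fprintf('CPU: %.2f s, GPU: %.2f s\n', tCPU, tGPU);
```

Resetting the random number generator before each run keeps the weight initialization and data division the same, so the two timings differ only in where the computation happens.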
4 Comments
Joss Knight
on 9 Jul 2015
Your GTX 760's double-precision performance doesn't look too great.
Make sure your data is all single precision, then try again, e.g.
Xs = single(Xs);
Ts = single(Ts);
Amanjit Dulai
on 14 Jul 2015
You should be able to convert the data to single precision with nndata2gpu as follows:
Xs = nndata2gpu(Xs,'single');
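Putting the single-precision advice into the setup from the question might look like the sketch below. It assumes Xs and Ts are still the cell arrays returned by preparets; in that case single has to be applied to each cell (for example with cellfun) rather than to the cell array as a whole. The names XsSingle, TsSingle, XsGPU and TsGPU are introduced here only for illustration.

```matlab
% Prepare the same dummy data as in the question.
T = num2cell(sin(1:0.01:10));
net = narnet(1:2, 10);
[Xs, Xsi, Asi, Ts] = preparets(net, {}, {}, T);
net = configure(net, Xs, Ts);

% Option 1: keep the data on the host, but convert each cell to single.
XsSingle = cellfun(@single, Xs, 'UniformOutput', false);
TsSingle = cellfun(@single, Ts, 'UniformOutput', false);

% Option 2: transfer to the GPU and convert to single in one step.
XsGPU = nndata2gpu(Xs, 'single');
TsGPU = nndata2gpu(Ts, 'single');

% Train on the single-precision GPU data, as in the "improved" version above.
net.trainFcn = 'trainscg';
tic
net = train(net, XsGPU, TsGPU, 'showResources', 'yes');
toc
```

Option 2 avoids an extra host-side copy, which is why it is the one passed to train here.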