Efficient training of an LSTM network on a GPU
Hi all,
I recently set up a computer with a GPU and am now refactoring my LSTM code to take advantage of it. However, my implementation shows no speedup; in fact, the CPU version runs faster than the GPU version. The test code below implements the basic LSTM forward pass on each device for comparison. Could anyone advise me on how to exploit the GPU for LSTM training? I tried pagefun, arrayfun and bsxfun, but none of them improved the speed.
This is the GPU version:
function LSTM_gpu2()
vis = 700; hid = 500;
T = 80; epochs = 10;
sigmoid = @(x) 1./(1+exp(-x));
x = rand(vis,1,T); h = zeros(hid,1,T+1); c = h;
W_z = rand(hid,vis,'gpuArray'); W_i = rand(hid,vis,'gpuArray');
W_f = rand(hid,vis,'gpuArray'); W_o = rand(hid,vis,'gpuArray');
R_z = rand(hid,hid,'gpuArray'); R_i = rand(hid,hid,'gpuArray');
R_f = rand(hid,hid,'gpuArray'); R_o = rand(hid,hid,'gpuArray');
P_i = diag(rand(hid,1,'gpuArray')); P_f = diag(rand(hid,1,'gpuArray'));
P_o = diag(rand(hid,1,'gpuArray'));
b_z = rand(hid,1,'gpuArray'); b_i = rand(hid,1,'gpuArray');
b_f = rand(hid,1,'gpuArray'); b_o = rand(hid,1,'gpuArray');
I = zeros(hid,T,'gpuArray'); F = zeros(hid,T,'gpuArray');
O = zeros(hid,T,'gpuArray'); G = zeros(hid,T,'gpuArray');
x = gpuArray(x); h = gpuArray(h); c = gpuArray(c);
tic;
for i=1:epochs
for t=1:T
G(:,t) = tanh(W_z*x(:,:,t) + R_z*h(:,:,t) + b_z);
I(:,t) = sigmoid(W_i*x(:,:,t) + R_i*h(:,:,t) + P_i*c(:,:,t) + b_i);
F(:,t) = sigmoid(W_f*x(:,:,t) + R_f*h(:,:,t) + P_f*c(:,:,t) + b_f);
c(:,:,t+1) = G(:,t).*I(:,t) + c(:,:,t).*F(:,t);
O(:,t) = sigmoid(W_o*x(:,:,t) + R_o*h(:,:,t) + P_o*c(:,:,t+1) + b_o);
h(:,:,t+1) = tanh(c(:,:,t+1)).*O(:,t);
end
%%backprop
%%update
end
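wait(gpuDevice); % gpuArray calls can return before the GPU finishes; block so toc measures the whole loop
toc;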
return;
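One likely cost in the GPU version is that every time step issues several small matrix-vector products, each a separate GPU call with little work to do. The input terms W_z*x(:,:,t), W_i*x(:,:,t), W_f*x(:,:,t) and W_o*x(:,:,t) do not depend on h, so they can be computed for all T time steps before the loop with a single matrix-matrix product, which suits a GPU far better than T separate products and needs neither pagefun nor arrayfun. The helper below is a minimal sketch under that assumption; the name precomputeInputTerms is hypothetical, and it accepts either ordinary arrays or gpuArray inputs.

```matlab
function [Az, Ai, Af, Ao] = precomputeInputTerms(x, W_z, W_i, W_f, W_o)
% Hypothetical helper (not from the question): computes W_*x(:,:,t) for
% every time step t at once instead of inside the time loop.
%   x               : vis-by-1-by-T input sequence, laid out as in the question
%   W_z,W_i,W_f,W_o : hid-by-vis input weights
% Works unchanged when the inputs are gpuArrays.
[vis, ~, T] = size(x);
X = reshape(x, vis, T);              % column t of X is x(:,:,t)
hid = size(W_z, 1);
A = [W_z; W_i; W_f; W_o] * X;        % one (4*hid x vis) * (vis x T) product
Az = A(1:hid, :);                    % Az(:,t) equals W_z*x(:,:,t)
Ai = A(hid+1:2*hid, :);              % Ai(:,t) equals W_i*x(:,:,t)
Af = A(2*hid+1:3*hid, :);            % Af(:,t) equals W_f*x(:,:,t)
Ao = A(3*hid+1:4*hid, :);            % Ao(:,t) equals W_o*x(:,:,t)
end
```

Inside the time loop the first gate would then read G(:,t) = tanh(Az(:,t) + R_z*h(:,:,t) + b_z);, and likewise for the other gates. Because the weights change at every update, the helper would be called once per epoch, before the forward loop; only the recurrent products R_*h(:,:,t) remain sequential.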
And this is the CPU version:
function LSTM_cpu()
vis = 700; hid = 500;
T = 80; epochs = 10;
sigmoid = @(x) 1./(1+exp(-x));
x = rand(vis,1,T); h = zeros(hid,1,T+1); c = h;
W_z = rand(hid,vis); W_i = rand(hid,vis);
W_f = rand(hid,vis); W_o = rand(hid,vis);
R_z = rand(hid,hid); R_i = rand(hid,hid);
R_f = rand(hid,hid); R_o = rand(hid,hid);
P_i = diag(rand(hid,1)); P_f = diag(rand(hid,1));
P_o = diag(rand(hid,1));
b_z = rand(hid,1); b_i = rand(hid,1);
b_f = rand(hid,1); b_o = rand(hid,1);
I = zeros(hid,T); F = zeros(hid,T);
O = zeros(hid,T); G = zeros(hid,T);
tic;
for i=1:epochs
for t=1:T
G(:,t) = tanh(W_z*x(:,:,t) + R_z*h(:,:,t) + b_z);
I(:,t) = sigmoid(W_i*x(:,:,t) + R_i*h(:,:,t) + P_i*c(:,:,t) + b_i);
F(:,t) = sigmoid(W_f*x(:,:,t) + R_f*h(:,:,t) + P_f*c(:,:,t) + b_f);
c(:,:,t+1) = G(:,t).*I(:,t) + c(:,:,t).*F(:,t);
O(:,t) = sigmoid(W_o*x(:,:,t) + R_o*h(:,:,t) + P_o*c(:,:,t+1) + b_o);
h(:,:,t+1) = tanh(c(:,:,t+1)).*O(:,t);
end
%%backprop
%%update
end
toc;
return;
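Both versions process a single sequence, so each step reduces to matrix-vector products on vectors of length 500, which is probably too little work to keep a GPU busy. A common remedy is to process a minibatch of B sequences together, so that h and c become hid-by-B matrices and each product becomes a matrix-matrix product. The sketch below shows one time step under that assumption, with the four gates' weights stacked into one matrix and the peephole weights kept as vectors (the diagonals of P_i, P_f and P_o), since multiplying by a dense diag(...) matrix costs hid^2 operations where hid suffice. The function name and argument layout are hypothetical; bsxfun is used because R2016a does not yet support implicit expansion.

```matlab
function [hNext, cNext] = lstmStepBatched(ax, hPrev, cPrev, R, b, p_i, p_f, p_o)
% Hypothetical helper (not from the question): one peephole-LSTM time step
% for a minibatch of B sequences processed together.
%   ax            : 4*hid-by-B input term [W_z; W_i; W_f; W_o] * x_t
%   hPrev, cPrev  : hid-by-B previous outputs and cell states
%   R             : 4*hid-by-hid stacked recurrent weights [R_z; R_i; R_f; R_o]
%   b             : 4*hid-by-1 stacked biases [b_z; b_i; b_f; b_o]
%   p_i, p_f, p_o : hid-by-1 peephole vectors, e.g. p_i = diag(P_i)
% Works unchanged when the inputs are gpuArrays.
hid = size(hPrev, 1);
sigmoid = @(a) 1 ./ (1 + exp(-a));
% One (4*hid x hid) * (hid x B) product replaces four matrix-vector products
% per sequence; bsxfun adds the bias column to every column of the batch.
a = bsxfun(@plus, ax + R * hPrev, b);
z  = tanh(a(1:hid, :));                                          % block input G
ig = sigmoid(a(hid+1:2*hid, :)   + bsxfun(@times, p_i, cPrev));  % input gate I
fg = sigmoid(a(2*hid+1:3*hid, :) + bsxfun(@times, p_f, cPrev));  % forget gate F
cNext = z .* ig + fg .* cPrev;                                   % new cell state
og = sigmoid(a(3*hid+1:4*hid, :) + bsxfun(@times, p_o, cNext));  % output gate O
hNext = tanh(cNext) .* og;                                       % new output
end
```

With the inputs stored as vis-by-B-by-T, a forward pass would stack W = [W_z; W_i; W_f; W_o] and R = [R_z; R_i; R_f; R_o] once, then call [h, c] = lstmStepBatched(W*x(:,:,t), h, c, R, b, p_i, p_f, p_o); for t = 1:T. Larger B gives the GPU more parallel work per call, at the cost of memory.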
OS: Windows 10
GPU: NVIDIA Quadro M5000
CPU: Intel i7-5820K
MATLAB: R2016a
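Both versions use double precision, MATLAB's default. Like other Maxwell-generation GPUs, the Quadro M5000 executes double-precision arithmetic at a small fraction of its single-precision rate, so switching to single precision may help noticeably, assuming single precision is accurate enough for this training. The hypothetical helper below casts arrays to single and optionally moves them to the GPU.

```matlab
function varargout = toSingleOnDevice(useGpu, varargin)
% Hypothetical helper (not from the question): casts each input to single
% and, if useGpu is true, transfers it to the GPU as a gpuArray.
varargout = cell(size(varargin));
for k = 1:numel(varargin)
    a = single(varargin{k});   % single precision for faster GPU arithmetic
    if useGpu
        a = gpuArray(a);       % requires Parallel Computing Toolbox and a GPU
    end
    varargout{k} = a;
end
end
```

For example, [W_z, R_z, b_z] = toSingleOnDevice(true, rand(hid,vis), rand(hid,hid), rand(hid,1)); gives single-precision gpuArrays. Preallocated buffers should match, e.g. G = zeros(hid, T, 'like', W_z);, because assigning single values into a double array converts them back to double.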
Thank you,
Yuto Ozaki
1 comment
Yuto Ozaki
on 10 Apr 2016
Edited: Yuto Ozaki
on 10 Apr 2016