GPU backslash performance much slower than CPU
I am doing numerical power flow calculations by modifying functions of MATPOWER, an open-source toolbox. By modifying its function newtonpf.m, the computation can run on the GPU. However, I found that the GPU is much slower than the CPU. For MATPOWER's built-in case3012wp, the linear system in newtonpf.m is:
A: 5725 * 5725 sparse double, b: 5725 * 1 double.
Solving A \ b in the first iteration of newtonpf() generally takes around 0.01 s on my MSI GL65 (i7-10750H + RTX 2070 Super).
But if A and b are converted to gpuArrays, A \ b takes the following time, depending on the type of A:
full double, 0.8 sec
sparse double, 4 sec
full single, 0.1 sec
(sparse single is not supported)
So why is there such a difference in performance? I thought the GPU could do this much faster than the CPU.
The files are attached. Atest is sparse and Agpu is a sparse gpuArray; all are double.
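For anyone reproducing this comparison, a timing harness along the following lines may help. It is a sketch under assumptions: a random, diagonally shifted sparse matrix of the same size stands in for the Jacobian from newtonpf.m, and the GPU part only runs if Parallel Computing Toolbox and a supported GPU are present. gputimeit is used for the GPU timings because GPU operations run asynchronously, so a plain tic/toc can stop before the GPU work has finished. Note that the sparse GPU solve may take a long time.

```matlab
% Sketch: time A\b on the CPU and, if available, on the GPU.
% The random test matrix is an assumption standing in for the power-flow Jacobian.
N = 5725;
rng(0);                                   % reproducible test data
A = sprand(N, N, 0.001) + 20*speye(N);    % nonsymmetric; diagonal shift keeps it well conditioned
b = rand(N, 1);

x = A \ b;                                % CPU sparse direct solve
relres = norm(A*x - b) / norm(b);
tCpu = timeit(@() A \ b);
fprintf('CPU sparse A\\b: %.4f s (relative residual %.2e)\n', tCpu, relres);

% GPU timings only if Parallel Computing Toolbox and a GPU are available
useGPU = false;
try
    useGPU = gpuDeviceCount > 0;
catch
    % toolbox not installed: skip the GPU part
end
if useGPU
    bg  = gpuArray(b);
    bgs = single(bg);                     % single-precision right-hand side
    Ag  = gpuArray(A);                    % sparse double on the GPU
    Afg = gpuArray(full(A));              % full double on the GPU
    Afs = gpuArray(single(full(A)));      % full single on the GPU
    fprintf('GPU sparse double: %.4f s\n', gputimeit(@() Ag \ bg));
    fprintf('GPU full double:   %.4f s\n', gputimeit(@() Afg \ bg));
    fprintf('GPU full single:   %.4f s\n', gputimeit(@() Afs \ bgs));
end
```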
9 Comments
Walter Roberson
on 27 Dec 2020
A\b or A/b?
Meme Young
on 27 Dec 2020
What graphics card are you using? The easiest way is to show us the output of
>>gpuDevice
Also, I recommend attaching a .mat file containing A and b.
Meme Young
on 27 Dec 2020
Matt J
on 27 Dec 2020
I recommend attaching a .mat file containing A and b.
Meme Young
on 27 Dec 2020
Matt J
on 27 Dec 2020
Please attach all the variables in a single .mat file, to make the download more convenient.
kant
on 26 May 2022
I have the same problem with my MATLAB code. Has it been solved?
Answers (1)
Matt J
on 27 Dec 2020
0 votes
This thread looks relevant. It appears that sparse mldivide on the GPU is not expected to be faster.
13 Comments
Meme Young
on 28 Dec 2020
Matt J
on 28 Dec 2020
"What surprised me is that CPU sparse arrays are a lot faster than GPU full arrays."
There's no reason to think that GPU full arrays will be faster than CPU sparse arrays.
The condition number for your Atest matrix is quite poor:
>> cond(full(Atest))
ans =
2.1049e+06
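Forming full(Atest) just to get its condition number requires a dense factorization. For sparse matrices, condest gives a cheap estimate (a lower bound) of the 1-norm condition number without densifying; note that cond(full(A)) reports the 2-norm value, so the two numbers differ somewhat. The sketch below compares them on a small random sparse matrix, which is an assumption and not the attached Atest.

```matlab
% Sketch: estimate the condition number of a sparse matrix without full().
rng(1);
n = 500;
S = sprand(n, n, 0.01) + 5*speye(n);   % small illustrative sparse matrix (assumption)

cEst = condest(S);                     % 1-norm condition estimate, accepts sparse input
c1   = cond(full(S), 1);               % exact 1-norm condition number (dense, for comparison)
c2   = cond(full(S));                  % 2-norm value, as in cond(full(Atest))
fprintf('condest: %.4e   cond(.,1): %.4e   cond(.,2): %.4e\n', cEst, c1, c2);
```

For the 5725-by-5725 matrices in this thread, only the condest line would be needed.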
When the condition number is better, I find that the advice provided at the link I gave you works quite favorably:
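N=5725;
A=sprand(N,N,0.005);      % random sparse matrix, 0.5% density
A=A.'*A+speye(N);         % symmetric positive definite, as pcg requires
b=rand(N,1);
Ag=gpuArray(A);           % copy matrix and right-hand side to the GPU
bg=gpuArray(b);
timeit(@()A\b) % CPU direct solve: 0.8228 seconds
timeit(@()pcg(A,b,1e-6,1e3)) % CPU pcg, tol 1e-6, max 1000 iterations: 0.2709 seconds
gputimeit(@()pcg(Ag,bg,1e-6,1e3)) % GPU pcg, same settings: 0.0538 seconds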
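The pcg calls above run without a preconditioner. Since preconditioning is what the later replies stress, here is a sketch that adds a zero-fill incomplete Cholesky factor from ichol. It assumes a symmetric positive definite matrix; the 2-D Laplacian from numgrid/delsq (75^2 = 5625 unknowns, close to the size in the question) is a stand-in chosen because zero-fill ichol is known to work on it. pcg itself requires SPD input, so it does not apply directly to a nonsymmetric Newton power-flow Jacobian. This runs on the CPU; check the GPU Arrays notes in the pcg documentation before moving a preconditioned call to the GPU.

```matlab
% Sketch: pcg with and without an incomplete Cholesky preconditioner.
% The 2-D Laplacian test matrix is an assumption standing in for a real SPD system.
G = numgrid('S', 77);           % square grid with 75-by-75 interior points
A = delsq(G);                   % 5625-by-5625 sparse SPD matrix
b = ones(size(A, 1), 1);
tol = 1e-6;
maxit = 1000;

[~, flag0, ~, iter0] = pcg(A, b, tol, maxit);                   % no preconditioner
L = ichol(A);                                                   % zero-fill incomplete Cholesky, A ~ L*L'
[~, flag1, relres1, iter1] = pcg(A, b, tol, maxit, L, L');      % preconditioned
fprintf('plain pcg: %d iterations, ichol-pcg: %d iterations\n', iter0, iter1);
```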
Walter Roberson
on 28 Dec 2020
Whether a GPU solve on full(A) is faster than a CPU solve on the sparse array depends on the sparsity.
At one end, an empty sparse array is easily detected and the CPU could finish it quickly. At the other end, a sparse array that is mostly filled in (sparse in name only) would be processed faster as full() on the GPU.
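For a direct solver, the relevant measure of "how sparse" is not only the density of A but also the fill-in of its factors, because the cost of A\b grows with the number of nonzeros in the factors. A sketch, assuming a smaller random SPD matrix built the same way as the examples in this thread, compares the density of A with the Cholesky fill in natural order and after the fill-reducing ordering symamd (sparse direct solvers, including MATLAB's sparse A\b, apply such reorderings automatically).

```matlab
% Sketch: density of A versus fill-in of its Cholesky factor.
rng(2);
N = 2000;                            % smaller than 5725 to keep the natural-order factor cheap
A = sprand(N, N, 0.003);
A = A.'*A + speye(N);                % SPD test matrix, built as in this thread
density = nnz(A) / numel(A);

R0 = chol(A);                        % factor in natural ordering
p  = symamd(A);                      % fill-reducing symmetric ordering
R1 = chol(A(p, p));                  % factor after reordering
fprintf('density of A: %.4f\n', density);
fprintf('nnz(A) = %d, nnz(R) natural = %d, with symamd = %d\n', nnz(A), nnz(R0), nnz(R1));
```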
Yes, but why is the GPU so counterproductive when the matrix truly is sparse?
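N=5725;
A=sprand(N,N,0.001);      % sparser than before: 0.1% density
A=A.'*A+speye(N);         % symmetric positive definite
b=rand(N,1);
Ag=gpuArray(A);
bg=gpuArray(b);
timeit(@()A\b) % CPU sparse backslash: 0.3567 seconds
gputimeit(@()Ag\bg) % GPU sparse backslash: 38 seconds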
Meme Young
on 28 Dec 2020
Meme Young
on 28 Dec 2020
Meme Young
on 28 Dec 2020
Joss Knight
on 29 Dec 2020
We recommend the sparse solver algorithms with preconditioning for solving sparse systems on the GPU (and on the CPU in most cases). Direct solves using the backslash operator are generally inefficient to compute.
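The Newton power-flow Jacobian is in general not symmetric, so pcg does not apply to it directly; gmres or bicgstab with an incomplete LU preconditioner are the usual candidates. A sketch, assuming a random nonsymmetric, diagonally shifted matrix in place of the real Jacobian; it only shows the calling pattern, and whether it beats A\b on a given case has to be measured.

```matlab
% Sketch: restarted GMRES with a zero-fill incomplete LU preconditioner.
rng(3);
N = 5725;
A = sprand(N, N, 0.001) + 20*speye(N);       % nonsymmetric stand-in for the Jacobian (assumption)
b = rand(N, 1);

[L, U] = ilu(A, struct('type', 'nofill'));   % A ~ L*U, keeping the sparsity pattern of A
tol = 1e-8;
restart = 20;                                % inner iterations per restart cycle
maxit = 50;                                  % maximum number of outer cycles
[x, flag, relres, iter] = gmres(A, b, restart, tol, maxit, L, U);
trueRes = norm(b - A*x) / norm(b);           % unpreconditioned relative residual
fprintf('gmres flag %d, relative residual %.2e, iterations [%d %d]\n', ...
        flag, trueRes, iter(1), iter(2));
```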
Meme Young
on 30 Dec 2020
Joss Knight
on 10 Jan 2021
Edited: Joss Knight
on 10 Jan 2021
Yes: PCG, GMRES, CGS, LSQR, QMR, TFQMR, BICG, BICGSTAB. Try them all and experiment with the tolerance, iteration count, and preconditioning; something is likely to work. I'm not an expert in this field, but this is what the sparse community tends to do.
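Trying several of these solvers can be scripted. The loop below is a sketch that runs the solvers sharing the (A, b, tol, maxit, M1, M2) calling form with one incomplete LU preconditioner; gmres is left out because its third argument is the restart length, and pcg is left out because it needs a symmetric positive definite matrix. The test matrix is a hypothetical nonsymmetric stand-in, and the true residual is recomputed for each solver so the results are comparable.

```matlab
% Sketch: compare several Krylov solvers on one nonsymmetric sparse system.
rng(4);
N = 5725;
A = sprand(N, N, 0.001) + 20*speye(N);       % hypothetical nonsymmetric test matrix
b = rand(N, 1);
[L, U] = ilu(A, struct('type', 'nofill'));   % shared zero-fill incomplete LU preconditioner

solvers = {@bicgstab, @cgs, @bicg, @qmr};    % same (A,b,tol,maxit,M1,M2) signature
tol = 1e-8;
maxit = 200;
results = zeros(numel(solvers), 2);          % columns: flag, true relative residual
for k = 1:numel(solvers)
    [x, flag] = solvers{k}(A, b, tol, maxit, L, U);
    results(k, :) = [flag, norm(b - A*x) / norm(b)];
    fprintf('%-9s flag %d, relative residual %.2e\n', func2str(solvers{k}), flag, results(k, 2));
end
```

Wrapping each call in timeit (or gputimeit with gpuArray inputs) would turn this into the kind of comparison discussed above.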