# How to speed up Matrix Vector Multipliacation

31 views (last 30 days)
Wei HU on 25 Mar 2018
Commented: Jan on 26 Mar 2018
I found that for large matrix with matrix multiplication, Matlab will automatically parallel calculate it. ( CPU usage 100%) However, when doing Matrix multiply vector, it seems only one kernel is in use.
Can this operation be done parallelly? For example, I think we could simply divide the large matrix into many pieces according to row. I tried using parfor in Parallel Toolbox, which result in very low performance.
I also tried writing C function using CBLAS, however, as stated in, https://www.mathworks.com/matlabcentral/answers/390546-wrong-result-when-calling-cblas-dgemv-function-in-a-mex-file There are some problems in this implementation.
I wonder if there are other ways of accelerating this? Thanks!
I tried the following parfor code:
N = % num of rows
M = % num of cols
A = rand(N,M);
b = rand(M,1);
c = zeros(N,1);
p = 6;
batch = floor(N / 6);
parfor i = 1:p
istart = (i-1)*batch+1;
iend = i*batch;
if (i==p)
iend=N;
end
A_sliced = A(istart:iend,:);
c(istart:iend) = A_sliced * b;
end
John D'Errico on 25 Mar 2018
Edited: John D'Errico on 25 Mar 2018
First, don't bother with tic and toc. They are poor ways to time anything.
Next, you need to put it in a loop. If the multiply goes so fast that you cannot see the CPUS all coming alive, then you will never know it used them!
I could not even see the CPUs coming active until I wrapped a loop around it.
And as I said in my answer, hyperthreading does not count. You gain nothing of significance from splitting one CPU into two. If a CPU is fully active, splitting it so that you have two CPUS that are both fully active, but only half as capable is a waste of time. So you have 6 cores.
Hyper-threading is great for some applications. But not here.
Put it in a loop. If one multiply took .1 seconds, then do 100 multiplies. Then carefully watch a CPU monitor, as did I.

John D'Errico on 25 Mar 2018
Edited: John D'Errico on 25 Mar 2018
MATLAB automatically multi-threads computations where there will be a clear gain. For a matrix-vector multiply, it apparently does not see a gain, unless the problem is large enough. Remember that parallel computations are not always a speedup, because there is extra overhead. If you can farm it to the GPU directly, you might get a gain though. Since I lack that TB, I cannot help you there.
As a test though, I checked to see if MATLAB will multi-thread a matrix*vector operation.
A = rand(15000);B = rand(15000,1);
I stopped here, waiting for a few seconds until MATLAB went quiescent. To ensure that I was not seeing multithreading happen on the call to rand.
for i = 1:100
C = A*B;
end
The multiply was so fast to do, that I had to add a loop to allow me to see my CPU usage go up to the full 400% possible. So indeed MATLAB does multi-thread a matrix*vector multiply, if it sees a gain.
Make sure you have maxNumCompThreads set to the correct value for your CPU. For me:
ans =
4
Jan on 26 Mar 2018
+1. Exactly. A*b is multi-threaded already and you can see it, if you run it for a while by a loop. The task manager is not fast enough to see this for a single call only.