reduce memory for matrix multiplication

Hi dudes, I have 2 vectors multiplied by a 1e4*1e4 matrix as below example:
A = 1:1:10000;
B = rand(10000);
C = (1:1:10000)';
an I have a result matrix like D, each array of which are obtained by the multiplication of these matrices as below:
D(each array) = A*(B*C);
the size of D matrix is 11*11 or larger, meaning that I should do this multiplication 121 times or more consuming large memory and time. Any Idea how to optimize the multiplication process?

19 Kommentare

KALYAN ACHARJYA
KALYAN ACHARJYA am 22 Okt. 2018
Read here
Jan
Jan am 22 Okt. 2018
I do not understand the question. The multiplication (1x1e4)*(1e4x1e4)*(1e4x1) returns a scalar. So what does "each array" mean? Where does "11x11" come from? Why do 121 of these not really huge matrix multiplications consume too much memory or time? Please post the relevant part of the code and explain, how much time it needs and your requirements. Currently I guess, it is another part of your code, which slows down the processing.
Matt J
Matt J am 22 Okt. 2018
I have 2 vectors multiplied by a 4*4 matrix
4x4? Looks like 1e4 x 1e4 to me.
behzad
behzad am 22 Okt. 2018
Dear Jan, I should mention that the 3 matrices A,B, and C are variant and do not have constant values, but the sizes are constant. The values assigned to them were an example. You're right. The final answer of the 3 matrix multiplication results in a scalar. The given matrix 11*11 D matrix contains 121 elements. Each element is obtained by multiplication of the three matrices of A,B,and C which have different values in each multiplication process. It means 121 multiplication to calculate D. I am looking for a way to reduce the runtime.
behzad
behzad am 22 Okt. 2018
Dear Matt, I edited the original question. Thank you.
Matt J
Matt J am 22 Okt. 2018
Bearbeitet: Matt J am 22 Okt. 2018
So, if the elements of D are D(i,j), then there is a different B for every (i,j)? You have 121 different B's?
I used the profile command telling me which part of the code consumes more runtime and I realized that the multiplication process takes the most of the runtime. The code is part of a larger code, as you said, and it is written as a function, so it is hard to write it down here as an independent code. Therefore, I give an example looking more like my real code. I hope it helps. Thank you.
A = zeros(11,11); D = A;
for q = 1:200
for p = 1:1:121
A = rand(1,10000);
B = rand(10000);
C = rand(10000,1);
D(p) = A*(B*C);
end
A = D + A;
end
behzad
behzad am 22 Okt. 2018
Exactly, I have 121 different A, B and C. In other words, the values of A,B and C are different for each element of D (i.e. D(i,j)).
Bruno Luong
Bruno Luong am 22 Okt. 2018
Bearbeitet: Bruno Luong am 22 Okt. 2018
Without further assumptiions, there is nothing that can be reduced here obviously, the big part of memory is taken by the storage of B, and the time is taken by matrix x vector product. Those are "basic" arithmetic operations that MATLAB is good at.
James Tursa
James Tursa am 22 Okt. 2018
Bearbeitet: James Tursa am 22 Okt. 2018
Are all of your "different" A, B, and C in memory at the same time? Or are they generated within the loop?
I'm with Bruno ... MATLAB probably already does the A*(B*C) calculation itself in the best way possible. But multi-threading the loop (e.g. via a mex routine) might speed things up overall.
behzad
behzad am 22 Okt. 2018
Bearbeitet: behzad am 22 Okt. 2018
A & C are generated within the loop. But B is part of a bigger matrix. In each loop, part of the bigger matrix is introduced as B.
behzad
behzad am 22 Okt. 2018
Bearbeitet: behzad am 22 Okt. 2018
I have used mex already. But it still takes too much time. Due to what you said, I think I have optimized the code as much as possible.
James Tursa
James Tursa am 22 Okt. 2018
Bearbeitet: James Tursa am 22 Okt. 2018
"... B is part of a bigger matrix ..."
Are you extracting B from this bigger matrix each iteration within the loop? This requires a data copy at the m-code level but could potentially be avoided in a mex routine. Is it the same extraction each time? What are the details of this? (i.e., what is the size of this bigger matrix and which exact part are you extracting each iteration?)
behzad
behzad am 22 Okt. 2018
Bearbeitet: behzad am 22 Okt. 2018
Consider F as the bigger matrix. In each iteration I replace B by F(k1+1:k1+10000,k2+1:k2+10000). k1 and k2 are scalars defined before the loop starts and their values are as below:
0<=k1<=size(F,1)-10000,
0<=k2<=size(F,2)-10000
Bruno Luong
Bruno Luong am 22 Okt. 2018
This is a most pertinent information and MEX might be able to help you to not create B internal data at all and perform matrix vector product directly from F.
Matt J
Matt J am 22 Okt. 2018
Bearbeitet: Matt J am 22 Okt. 2018
Is it the same for A and C? Are they also extracted from some larger vectors Va,Vc according to
Va(k1+1:k1+10000);
Vc(k2+1:k2+10000);
behzad
behzad am 22 Okt. 2018
Unlike B, A and C are constructed within the loop and are not part of a bigger matrix.
behzad
behzad am 22 Okt. 2018
"... This is a most pertinent..."
Unfortunately I am not familiar with c programming enough. But I try to find out what is going on in the mex file.
James Tursa
James Tursa am 22 Okt. 2018
Bearbeitet: James Tursa am 22 Okt. 2018
So, would you way that your pseudo-code is really something like this:
A = zeros(11,11); D = A;
F = some large matrix (fixed value for the loop)
k1 = some integer (fixed value for the loop)
k2 = some integer (fixed value for the loop)
for q = 1:200
for p = 1:1:121
A = rand(1,10000); % some calculation that changes each time
B = F(k1+1:k1+10000,k2+1:k2+10000);
C = rand(10000,1); % some calculation that changes each time
D(p) = A*(B*C);
end
A = D + A;
end
I.e., B can obviously be pulled out of the loop in the above code and the data copy only done once. It is critical to know which things change during the loop and which things don't.
We can help you with the C mex stuff, but only if it really makes sense, and we won't know that until we know exactly how the pseudo code looks.

Melden Sie sich an, um zu kommentieren.

Antworten (2)

Matt J
Matt J am 22 Okt. 2018
Bearbeitet: Matt J am 22 Okt. 2018

0 Stimmen

One possibility might be to zero-pad A and C, embedding them in larger sparse vectors to be multiplied with F. This way you don't have to pull B out of F and allocate separate memory for it.
Za=spalloc(1,size(F,1), 1e4);
Zc=spalloc(size(F,2),1,1e4);
for p = 1:121
A = ...
Apad=Za;
Apad(k1+1:k1+1e4)=A;
C=...
Cpad=Zc;
Cpad(k2+1:k2+1e4)=C;
D(p) = Apad*(F*Cpad);
end

9 Kommentare

Bruno Luong
Bruno Luong am 22 Okt. 2018
Bearbeitet: Bruno Luong am 22 Okt. 2018
I like it, sound clever. Though I'm not sure about what is the advantage of using sparse vector.
Matt J
Matt J am 22 Okt. 2018
Bearbeitet: Matt J am 22 Okt. 2018
The advantage is that rows/columns not meant to be in B will not participate in the matrix/vector multiplications. However, I acknowledge that unless F is significantly larger than B, it may not gain much.
Bruno Luong
Bruno Luong am 22 Okt. 2018
Bearbeitet: Bruno Luong am 22 Okt. 2018
And it might be worse because the BLAS for full matrix is replaced by a less-efficient sparse algorithm. But it just a guess.
EDIT: I tested both with tic/toc, for the same size sparse is twice slower than full.
Matt J
Matt J am 22 Okt. 2018
With how much padding?
Bruno Luong
Bruno Luong am 22 Okt. 2018
zero.
Matt J
Matt J am 23 Okt. 2018
Bearbeitet: Matt J am 23 Okt. 2018
I think it depends a lot on F.
N=3e4+1;
F=eye(N);
x=ones(1e4,1); x(N)=0;
xs=sparse(x);
tic; %full
ans=x.'*(F*x)+7;
toc;
%Elapsed time is 2.279681 seconds.
tic; %sparse
ans=xs.'*(F*xs)+7;
toc;
%Elapsed time is 0.274425 seconds.
Bruno Luong
Bruno Luong am 23 Okt. 2018
Bearbeitet: Bruno Luong am 23 Okt. 2018
Mainly because I guess you set F is EYE, not realistic. Here is my test results
size(F)=10011
FULL : 0.01708 seconds
SPARSE : 0.07347 seconds
INPLACE: 0.04690 seconds
size(F)=30000
FULL : 0.15706 seconds
SPARSE : 0.26915 seconds
INPLACE: 0.04871 seconds
Code:
N=3e4;
F=rand(N);
A = rand(1,1e4);
C = rand(1e4,1);
[m,n] = size(F);
Ae = A;
Ae(m) = 0;
Aes=sparse(Ae);
Ce = C;
Ce(n) = 0;
Ces=sparse(Ce);
tic; %full
D=Ae*(F*Ce);
t1=toc;
tic; %sparse
D = Aes*(F*Ces);
t2=toc;
tic % inplace
k1 = 1;
k2 = 1;
AFk = zeros(1,size(C,1));
for k=1:length(AFk)
Fk = mxCreateSharedMatrix2018(F,k1+(k2+k-1)*m,size(A,2),1);
AFk(k) = A*Fk;
end
mxUnshareMatrix2018(Fk,[],1);
clear Fk
D = AFk*C; % 0.048428 seconds.
t3=toc;
fprintf('size(F)=%d\n', N);
fprintf('\tFULL : %1.5f seconds\n', t1);
fprintf('\tSPARSE : %1.5f seconds\n', t2);
fprintf('\tINPLACE: %1.5f seconds\n', t3);
OK, how about this:
N=4e4+1;
F=ones(N);
x=ones(1e4,1); x(N)=0;
xs=sparse(x);
tic; %full
ans=x.'*(F*x)+7;
toc;
%Elapsed time is 0.428139 seconds.
tic; %sparse
ans=xs.'*(F*xs)+7;
toc;
%Elapsed time is 0.374981 seconds.
The result with my PC on your code
FULL: Elapsed time is 0.278291 seconds.
SPARSE: Elapsed time is 0.360029 seconds.
Of course you can try to increase N where at the limit become favorable more and more to sparse.

Melden Sie sich an, um zu kommentieren.

Bruno Luong
Bruno Luong am 22 Okt. 2018
Bearbeitet: Bruno Luong am 24 Okt. 2018

0 Stimmen

If you are using R2018a/b you can use the MEX file attached here to avoid forming B. The matrix-vector product is replaced by a loop on vector x vector, it can be slower than MATLAB (twice according to my test).
[m,n] = size(F);
AFk = zeros(1,size(C,1));
for k=1:length(AFk)
Fk = mxCreateSharedMatrix2018(F,k1+(k2+k-1)*m,size(A,2),1);
AFk(k) = A*Fk;
end
mxUnshareMatrix2018(Fk,[],1);
clear Fk
D = AFk*C; % or set D(k1,k2)?

2 Kommentare

Jan
Jan am 24 Okt. 2018
@Bruno: The link is dead.
Bruno Luong
Bruno Luong am 24 Okt. 2018
OK Jan, I attached the code directly

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Loops and Conditional Statements finden Sie in Hilfe-Center und File Exchange

Gefragt:

am 22 Okt. 2018

Kommentiert:

am 24 Okt. 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by