Why is the timing for decomposition and backsolve for M and M' so different?

Question: Why is the timing for doing LU decomposition and backsolve so different for M versus M'?
Background: I have a non-symmetric 377544 x 377544 sparse real valued matrix with the sparsity pattern shown in the spy plot below.
FM = decomposition(M);
Takes about 36 seconds, but
FadjM = decomposition(M');
Takes about 42 minutes!
On the other hand for F a 377544 x 12 real dense matrix,
test = FM \ F;
Takes about 23 seconds, but
test = FM' \ F;
Takes only 8.5 seconds.
If I try to decompose the transpose of M and then test the backsolves I find that
test = FadjM \ F;
takes about 300 seconds, while
test = FadjM' \ F;
also takes about 300 seconds.
Does anyone have any insights into what is going on? If anyone is interested in investigating more deeply, I'm happy to upload M.
Thank you,
Francois

12 comments

Is there a particular reason why you are using the complex conjugate transpose of a real-valued matrix?

This particular matrix is real, but I have a sequence of them from a Fourier decomposition in the time domain and the other matrices are complex. All the matrices, whether I’m dealing with the DC component or not have the same sparsity pattern and the same strange timings with the conjugate transpose.

Also, the timing for the complex valued matrices is about twice that of the real valued M, but otherwise the relative difference with conjugate transpose or not is the same.

If anyone is interested in investigating more deeply, I'm happy to upload M.
I think nobody will be able to give a useful answer to your question without the matrix M. So if the size of the matrix is not too large for "Answers", you should include it here as a .mat - file.
Unfortunately, the file is too big for me to upload.
However, if I use
tic
[L, U, P, Q, R] = lu(M);
toc
Elapsed time is 49.411794 seconds
and
tic
[L, U, P, Q, R] = lu(M');
toc
Elapsed time is 121.903849 seconds.
The timing difference is much smaller (only a factor of two), and the subsequent backsolves take ~13 seconds regardless of whether I used lu(M) or lu(M') and whether I applied
result = Q * (U \ (L \ (P * (R \ F))));
or
result = R' \ (P' * (L' \ (U' \ (Q' * F))));
So I suspect the problem is with the default choice of LUPivotTolerance used by decomposition. I'm going to try changing it via lu(M, thresh) to see if I can reproduce the effect I get with decomposition.
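As a sketch of that experiment (the threshold values below are illustrative, not a recommendation; M is assumed to already be in the workspace):

```matlab
% Sketch: scan a few pivot tolerances and see their effect on lu timing.
% lu accepts a two-element thresh: [pivot tolerance, diagonal pivot tolerance].
for pivTol = [0.1 0.01 0.001]
    tic;
    [L, U, P, Q, R] = lu(M, [pivTol, 0.001]);
    t = toc;
    fprintf('pivot tol = %g: factorization took %.2f s, nnz(L)+nnz(U) = %d\n', ...
        pivTol, t, nnz(L) + nnz(U));
end
```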
For the curious, here are the spy plots for L, U, Q, and P for the case of lu(M'). They look very similar for lu(M).
Francois
Francois on 9 Nov 2025
Edited: Francois on 9 Nov 2025
The MATLAB documentation seems to suggest that the default thresh values, [0.1 0.001], are the same for lu(M) and decomposition(M). Moreover, changing those parameters affects the timing only modestly. So the issue appears to be with decomposition and not with MATLAB's lu function.
I guess I'm going to switch back to using lu, since the overall cost of the forward and adjoint backsolves is roughly 20% lower with lu than with decomposition. I'm still curious what's happening under the hood in decomposition, though.
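For reference, here is a sketch of how both the forward and adjoint backsolves can share a single factorization (using the same permutation identities as in my comment above; M and F are assumed to be in the workspace):

```matlab
% Sketch: reuse one lu(M) factorization for both M \ F and M' \ F.
% Five-output form satisfies P*(R\M)*Q = L*U, i.e. M = R*P'*L*U*Q'.
[L, U, P, Q, R] = lu(M);
solveM    = @(F) Q * (U \ (L \ (P * (R \ F))));        % x = M  \ F
solveMadj = @(F) R' \ (P' * (L' \ (U' \ (Q' * F))));   % x = M' \ F
```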
Can you generate an M and F with reduced size but the same structure that still illustrates the problem but can be uploaded?
Francois
Francois on 9 Nov 2025
Edited: Francois on 9 Nov 2025
I managed to create a smaller version (attached above) that exhibits much of the same behavior if not as dramatic as with the original matrix.
1) FM = decomposition(M) is orders of magnitude faster than FadjM = decomposition(M')
2) backsolve with FM takes almost three times longer than backsolve with FM'
3) backsolves with FadjM and FadjM' take roughly the same amount of time, and take about 10 times longer than backsolves with FM'.
tic; FM = decomposition(M); toc
Elapsed time is 0.734764 seconds
tic; FadjM = decomposition(M'); toc
Elapsed time is 12.072036 seconds
tic; test = FM \ F; toc
Elapsed time is 0.830504 seconds
tic; test = FM' \ F; toc
Elapsed time is 0.308504 seconds
tic; test = FadjM \ F; toc
Elapsed time is 2.920533 seconds.
tic; test = FadjM' \ F; toc
Elapsed time is 3.483247 seconds.
Running here to verify similar behavior on Answers and that both decompositions are of the same type.
load MandF
tic; FM = decomposition(M); toc, FM.Type
Elapsed time is 0.685958 seconds.
ans = 'lu'
tic; FadjM = decomposition(M'); toc, FadjM.Type
Elapsed time is 12.832422 seconds.
ans = 'lu'
tic; issparse(M'); toc % just to make sure nothing nefarious happens when transposing
Elapsed time is 0.004498 seconds.
tic; test = FM \ F; toc
Elapsed time is 0.093494 seconds.
tic; test = FM' \ F; toc
Elapsed time is 0.031214 seconds.
tic; test = FadjM \ F; toc
Elapsed time is 0.587851 seconds.
tic; test = FadjM' \ F; toc
Elapsed time is 0.558215 seconds.
Even though FadjM also uses LU, maybe the structure of M' makes the algorithm that selects the decomposition type take longer to get there.
Forcing lu:
tic; Foo = decomposition(M', 'lu'); toc
Elapsed time is 13.089970 seconds.
So that's not it.
Not an answer to why decomposing A' takes so much longer, but using the fact that if A = L*U, then A' = U'*L', you can reuse the LU factorization of A to get one for A':
A = rand(3)+1i*rand(3);
[L1,U1] = lu(A);
A'- U1'*L1'
ans =
   1.0e-15 *
   0.0000 + 0.0000i   0.0000 + 0.0555i   0.0000 + 0.0000i
   0.0000 + 0.1110i  -0.0555 - 0.0555i   0.0000 + 0.0000i
   0.0000 + 0.1110i   0.0000 + 0.0000i   0.0000 + 0.0000i
A - L1*U1
ans =
   1.0e-15 *
   0.0000 + 0.0000i   0.0000 - 0.1110i   0.0000 - 0.1110i
   0.0000 - 0.0555i  -0.0555 + 0.0555i   0.0000 + 0.0000i
   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i
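The same idea carries over to the permuted five-output sparse factorization: with [L, U, P, Q, R] = lu(M) we have P*(R\M)*Q = L*U, hence M' = Q*U'*L'*P*R', and a solve with M' can reuse the factors of M. A small self-contained sketch (toy random matrix, assumed nonsingular, just to check the identity):

```matlab
% Sketch: verify that the factors of M can solve M' * x = b.
rng(0);
M = sprand(100, 100, 0.05) + speye(100);   % toy sparse matrix
b = rand(100, 1);
[L, U, P, Q, R] = lu(M);                   % P*(R\M)*Q = L*U
x = R' \ (P' * (L' \ (U' \ (Q' * b))));    % solve M' * x = b via factors of M
norm(M' * x - b)                           % residual; should be near eps
```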

Thank you for the response. As I mentioned above, if I use [L, U, P, Q, R] = lu(M) or lu(M'), the difference in timing on my original matrix is less than a factor of 3, but decomposition(M') takes about 70 times longer than decomposition(M). So my question is about the particular algorithm or implementation in MATLAB's decomposition function.

Torsten
Torsten on 10 Nov 2025
Edited: Torsten on 10 Nov 2025
Maybe it's best to address this question directly to the MATLAB developers:
We are only MATLAB users who don't have insight into implementation details.
Maybe @Christine Tobler can help.


Accepted Answer

Thank you for including the smaller matrices to reproduce the issue!
It seems you're running into an edge case where we detect likely ill-conditioning and redo the decomposition in a safer but more expensive way.
We can see some of this by turning on diagnostic messages for the sparse solver using the spparms function. I will use backslash directly, since this does the same as decomposition but has more helpful diagnostic messages.
load MandF.mat
spparms("spumoni", 1);
M \ F;
sp\: bandwidth = 48875+1+48875.
sp\: is A diagonal? no.
sp\: is band density (0.000276) > bandden (0.500000) to try banded solver, or is the matrix tridiagonal? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive or negative diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with automatic reordering.
sp\: UMFPACK's factorization was successful.
sp\: UMFPACK's solve was successful.
Mt = M'; % explicitly transpose M, since this is what happens in decomposition(M').
% (backslash would detect the ctranspose and compute the equivalent of
% decomposition(M)'\F).
Mt \ F;
sp\: bandwidth = 48875+1+48875.
sp\: is A diagonal? no.
sp\: is band density (0.000276) > bandden (0.500000) to try banded solver, or is the matrix tridiagonal? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive or negative diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with automatic reordering.
sp\: UMFPACK's factorization was successful.
sp\: However, UMFPACK may have failed to produce a correct factorization possibly due to aggressive pivot tolerances or because the problem is ill-conditioned.
sp\: These tolerances were: Pivot Tolerance: 0.100000, Diagonal Pivot Tolerance: 0.001000.
We will factor the matrix again but use standard partial pivoting.
sp\: UMFPACK's solve was successful.
The messages for the second case indicate that backslash detects what it believes to be a too ill-conditioned factorization and wants to ensure no precision was lost. It reacts by increasing the pivot tolerance to 1, which matches what would be done in the dense case: safer, but more expensive.
Now why would the condition estimate be so different? The condition number of A and A' is of course the same, but this is a cheap estimate based on the computed factors; specifically, the magnitudes of the diagonal elements of the factor U are used.
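One way to test this explanation directly (a sketch; it assumes the slowdown is dominated by the refactorization at the higher tolerance, with M in the workspace) is to force the "safe" pivot tolerance of 1 on M itself and compare against the default:

```matlab
% Sketch: compare default pivot tolerances against full partial pivoting,
% which is what the fallback path effectively switches to.
tic; [L, U, P, Q, R] = lu(M);      tDefault = toc;  % default thresh [0.1 0.001]
tic; [L, U, P, Q, R] = lu(M, 1.0); tSafe    = toc;  % standard partial pivoting
fprintf('default: %.2f s, thresh = 1: %.2f s\n', tDefault, tSafe);
```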
tic;
[L, U, P, Q, D] = lu(M);
toc
Elapsed time is 0.693204 seconds.
tic;
[Lt, Ut, Pt, Qt, Dt] = lu(Mt);
toc
Elapsed time is 1.872775 seconds.
format shortg
full([min(abs(diag(U))), max(abs(diag(U)))])
ans = 1×2
8.5362e-05 583.02
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
full([min(abs(diag(Ut))), max(abs(diag(Ut)))])
ans = 1×2
3.6294e-10 530.92
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
Here's the reason for the discrepancy: the smallest diagonal value of U in absolute value is about 1e-10 for M' but 1e-5 for M. Note that looking at the diagonal values of U doesn't actually give a good estimate of the condition number; it's just the cheapest thing that gives anything approaching such an estimate. The alternative of iteratively solving linear systems would slow down backslash considerably.
I think for the moment you are best off using the lu command directly, as you mentioned above.
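To see how rough this cheap estimate is, one can compare it against MATLAB's condest (a sketch; condest is itself an estimate, but a far more reliable and more expensive one; M is assumed to be in the workspace):

```matlab
% Sketch: cheap diag(U)-based condition indicator vs. condest.
[L, U, P, Q, R] = lu(M);
diagRatio = max(abs(diag(U))) / min(abs(diag(U)));  % cheap indicator
fprintf('diag(U) ratio: %g, condest: %g\n', diagRatio, condest(M));
```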
I will pass this case on to my colleagues to discuss it further.

More Answers (0)


Version: R2023b
