Inconsistency in behavior of SVD

49 views (last 30 days)
Thomas Humphries
Thomas Humphries on 20 Dec 2021
Edited: John D'Errico on 23 Dec 2021
This came out of an example I gave my class on computing the SVD of low-rank flag images (based on an exercise from Gilbert Strang's book). I had students compute low-rank approximations to different flags and then plot the flags' singular values on a semilog scale. Some students' plots ended up looking different from others', and I realized the difference depended on whether they called svd with one or three output arguments. Here is a minimal working example:
F = zeros(1000,1800); F(:,1:600) = 37; F(:,601:1200) = 255; F(:,1201:end) = 102; % a tricolor flag, e.g. France or Italy
[u,s,v] = svd(F); s(100,100) % s is a matrix
ans =
2.143808870680404e-11
s = svd(F); s(100) % s is a vector
ans =
7.312718101034703e-79
Now, this flag has rank 1, so obviously the 100th singular value is mathematically zero. But what accounts for the nearly 70 orders of magnitude difference here? (which obviously looks very different on a semilog plot!) My assumption was that the svd calculation should be the same whether one is requesting the entire SVD or just the singular values, but obviously something different is happening behind the scenes...

Accepted Answer

Christine Tobler
Christine Tobler on 23 Dec 2021
Firstly, I agree completely with John's great explanations above: Any singular value below eps*first singular value should be treated as zero.
The behavior here isn't wrong, since in both cases the outputs are accurate up to expected round-off. However, it is clearly puzzling, and we would like to make the two calls' results match more closely (so that they look similar when plotted). That said, we don't aim to match results down to the last decimal between different calls like this; we prefer to get the best possible performance for each individual call.
Some background: Computing the SVD of a matrix A is done in two stages:
  1. A = X*B*Y', where X and Y are orthogonal matrices and B is a bidiagonal matrix. This incurs a round-off error of eps*norm(A).
  2. B = Q*S*P' where Q and P are orthogonal matrices and S is a diagonal matrix of singular values.
For step 2, there is an algorithm specialized for cases where the scaling changes a lot along the diagonals of B; in those cases it can compute singular values that differ by much more than a factor of eps from largest to smallest. However, this algorithm only computes the singular values, not the singular vectors.
That algorithm has been added to the LAPACK library (a standard linear algebra library, which we call for SVD and other computations in MATLAB), to be used when no singular vectors are requested. This is what we call in MATLAB, and it is why you are seeing the difference in behavior. We can't practically change this from MATLAB without always computing the singular vectors even when they aren't requested, which would be a huge decrease in performance.
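As a rough illustration of that performance gap (a sketch, not an official benchmark; timings vary by machine and release), timeit's second argument controls how many outputs are requested:
A = randn(2000);
f = @() svd(A);
tValuesOnly = timeit(f, 1)   % time s = svd(A)
tWithVectors = timeit(f, 3)  % time [U,S,V] = svd(A); typically several times slower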
Some more comments:
  • These differences in behavior largely affect matrices with large differences in scale (either the norms of each row or the norms of each column have large differences between them). Other matrices are usually not noticeably affected.
  • While the singular values computed in the one-output case are accurate for the matrix B above, this only means they are accurate for the original matrix A if the round-off in step 1 was much smaller than usual (for example, if A was already bidiagonal to start with). Otherwise, we first introduce round-off in step 1, and then get a very accurate picture of the singular values of a matrix that already doesn't match A to the degree of accuracy that step 2 provides.
So in short, the reason for the difference is that a more accurate algorithm is used in the one-output case that doesn't exist for the three-output case. However, this algorithm is only more accurate if the input matrix has some very special preconditions, otherwise it just represents the round-off error slightly differently, with no effect on accuracy. Singular values below eps*norm(A) should not be trusted, except when you have some very specific knowledge about your input matrix.
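For the original flag example, here is a minimal sketch of that advice: clamp everything below eps times the largest singular value to the noise floor before plotting (clamping rather than zeroing only because exact zeros cannot be drawn on a log axis), and the two calls give visually matching plots:
F = zeros(1000,1800); F(:,1:600) = 37; F(:,601:1200) = 255; F(:,1201:end) = 102;
[~, S, ~] = svd(F);    % three-output call
s3 = diag(S);
s1 = svd(F);           % one-output call
tol = eps*max(s1);     % noise floor: eps times the largest singular value
semilogy(max(s3, tol), 'b:s')
hold on
semilogy(max(s1, tol), 'g-o')
hold off
legend('three outputs (clamped)', 'one output (clamped)')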
  4 comments
John D'Errico
John D'Errico on 23 Dec 2021
Edited: John D'Errico on 23 Dec 2021
We are truly grateful that Christine is at TMW. Certainly I am. I have never seen a case where her input on a question was not insightful and well thought out. Thank you again.
I'll try out a random thought here, just for kicks. Suppose we made the decision today that in future releases, SVD will always look at the singular values, compare them to the maximum singular value, and ZERO out any that fall below eps times that maximum. Essentially, we would just be conceding that our best estimate down there is zero anyway. Returning a garbage value seems a bit silly. So just round down to zero, as that is our best estimate of the value at that point.
Would that break anything? Well, it might, at least in the sense that it would make SVD not completely backwards compatible. But I could argue that if anyone has truly been drawing conclusions based on those small singular values, those conclusions were essentially random fluff.
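For concreteness, here is a minimal sketch of what such a wrapper might look like (svdZeroed is a hypothetical name, not an existing function):
function varargout = svdZeroed(A, varargin)
% Hypothetical wrapper: behave exactly like SVD, but zero out any singular
% value that falls below eps times the largest one. Sketch only.
[varargout{1:max(nargout,1)}] = svd(A, varargin{:});
k = max(min(nargout, 2), 1);     % singular values: output 1 of 1, or output 2 of 2+
S = varargout{k};                % a vector (one output) or a diagonal matrix
S(S < eps*max(S(:))) = 0;        % off-diagonal entries are already zero
varargout{k} = S;
end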
Does SVD produce correct estimates of the singular values for a matrix that is known to be non-singular, but is numerically singular? For example, consider hilb(25). It appears to have a numerical rank of 14 here.
N = 25;
A = hilb(N);
rankA = rank(A)
rankA = 14
But in fact, we know it is not truly singular, just a difficult problem. Let's look at the singular values there.
[U,S,V] = svd(A);                                % three-output call
s = svd(A);                                      % one-output call
semilogy(1:N,s,'g-o')                            % one-output singular values
hold on
semilogy(1:N,diag(S),'b:s')                      % three-output singular values
semilogy(1:N,double(svd(sym(hilb(N)))),'m--+')   % symbolic (reference) values
xline(rankA)
yline(eps*max(s),'r')                            % the eps*norm(A) noise floor
hold off
Interestingly, even though the last singular value that rank trusted was the 14th, two more singular values seem to have been accurately produced, since they closely follow the symbolic result. I also see that the one- and three-output versions of SVD have produced the same results for the smaller singular values here. That is very different from the flag call from @Thomas Humphries.
All of that suggests we would not have wanted to broadly replace all singular values below
eps*max(s)
ans = 4.3338e-16
with zero.
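A quick sketch quantifying the earlier observation that a couple of values past the numerical rank still track the symbolic results (the 1e-2 cutoff is just a rough notion of "accurately produced"):
ssym = double(svd(sym(hilb(N))));   % reference singular values, N = 25 here
relerr = abs(s - ssym)./ssym;       % relative error of the double-precision values
find(relerr < 1e-2, 1, 'last')      % index of the last roughly-accurate value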
I would also note that both versions of SVD produce the same result for the small singular values, at least for hilb(25). However, suppose I tried it for hilb(26)? :-) In fact, for any value of N greater than 25, we get a different result. I'll try N=26 here as an example.
N = 26;
A = hilb(N);
rankA = rank(A)
rankA = 14
[U,S,V] = svd(A);
s = svd(A);
semilogy(1:N,s,'g-o')
hold on
semilogy(1:N,diag(S),'b:s')
xline(rankA)
yline(eps*max(s),'r')
Interesting behavior from svd here: beyond N=25, the two versions of SVD diverge in their behavior for the small singular values. Regardless, it does appear that on at least some matrices, SOME of the singular values below eps*max(s) contain useful information.
Thomas Humphries
Thomas Humphries on 23 Dec 2021
Thanks for this great answer, Christine!


More Answers (2)

John D'Errico
John D'Errico on 21 Dec 2021
Edited: John D'Errico on 21 Dec 2021
F = zeros(1000,1800); F(:,1:600) = 37; F(:,601:1200) = 255; F(:,1201:end) = 102; % a tricolor flag, e.g. France or Italy
[U,S,V] = svd(F);
s = svd(F);
s(1)
ans = 2.1466e+05
eps(s(1))
ans = 2.9104e-11
S(100,100)
ans = 2.1438e-11
s(100)
ans = 2.1902e-113
So that 100th singular value is roughly no larger than eps*s(1). In fact, even the second singular value is roughly that small.
The first rule of computing, and especially of linear algebra, is to NEVER trust anything in the least significant bits, especially when subtly different computations are involved on different algorithmic paths.
rank(F)
ans = 1
semilogy(s)        % one-output singular values
semilogy(diag(S))  % three-output singular values
It looks like different code is used IF you just want the singular values, and that makes some sense. Since it is all trash down at that level anyway, see rule number 1.
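One way to see that the two computations agree to round-off relative to the norm of the matrix:
abs(S(100,100) - s(100))/s(1)   % below eps (~2.2e-16): the same answer, to round-off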
  4 comments
Paul
Paul on 22 Dec 2021
I'm not as concerned about how to interpret floating point trash outputs as I am about using the results that are not floating point trash. For example
rng(100);
M = rand(3);
S1 = svd(M)
S1 = 3×1
    1.4243
    0.5670
    0.2862
y1 = 2*S1(1);
[U,S2,V] = svd(M,"vector");
S2
S2 = 3×1
    1.4243
    0.5670
    0.2862
y2 = 2*S2(1);
isequal(y1,y2)
ans = logical
0
Suppose I have some code that only needs S, and then a few months later I modify that code to also use U and V but do not change any computations using S. I could be in for a surprise when my S-based results change.
John D'Errico
John D'Errico on 22 Dec 2021
Again, the results are identical to within floating point trash! NEVER TRUST THE LEAST SIGNIFICANT BITS OF TWO COMPUTATIONS THAT ARE NOT DONE IN IDENTICALLY THE SAME SEQUENCE.
rng(100);
M = rand(3);
[U,S1,V] = svd(M,"vector");
format long g
S1
S1 = 3×1
          1.42429794861722
         0.567016326172269
         0.286243430498377
S2 = svd(M)
S2 = 3×1
          1.42429794861722
         0.567016326172269
         0.286243430498377
S1 == S2
ans = 3×1 logical array
   0
   0
   0
norm(S1 - S2)
ans =
2.98936698014091e-16
So even though the two vectors look the same, the least significant bits vary just slightly. This is expected. It is true for almost any algebraic computation: when two operations are performed that are mathematically expected to give the same result, you should NEVER trust that they will produce the same result down to the least significant bits.
A = rand(10);
B = rand(10);
C = rand(10);
(A + B - C) == (-C + B + A)
ans = 10×10 logical array
   1   1   1   1   1   1   1   1   1   1
   1   1   1   1   0   1   1   1   1   0
   1   1   1   1   1   0   1   1   1   0
   0   1   1   1   1   1   1   1   0   1
   1   1   1   1   1   0   0   0   0   0
   1   1   1   1   1   1   1   1   1   0
   0   1   0   1   1   1   1   0   0   1
   1   1   1   0   1   1   1   0   1   1
   1   1   0   1   1   1   1   1   1   1
   1   0   1   1   1   0   1   1   1   0
You should not be surprised at finding tiny differences there, any more than at the SVD producing differences on the order of machine epsilon when a different path is likely chosen in the code. Should MathWorks document that the code differs between the two paths inside SVD? As I said in my answer, that just begs for more problems explaining why you should not trust the LSBs of those numbers. And worse, suppose next year they change the SVD code in a way that does use the same path? Then they would need to change the documentation again and re-educate people.
The simple rule is best: NEVER trust the LSBs of your numbers in floating point computations.
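In practice that rule means comparing such results with a tolerance scaled to the data, never with == or isequal. A minimal sketch, using Paul's example (the factor of 10 is a judgment call):
rng(100); M = rand(3);
S1 = svd(M);
[~, S2, ~] = svd(M, "vector");
y1 = 2*S1(1); y2 = 2*S2(1);
tol = 10*eps*S1(1);             % tolerance scaled to the size of the data
abs(y1 - y2) <= tol             % robust, unlike isequal(y1, y2)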



Chunru
Chunru on 21 Dec 2021
F = zeros(1000,1800);
F(:,1:600) = 37; F(:,601:1200) = 255; F(:,1201:end) = 102; % a tricolor flag, e.g. France or Italy
This is a rank-1 matrix, as shown by:
rank(F)
ans = 1
For a rank-1 matrix, there is only one non-zero singular value. MATLAB arranges the singular values in descending order, so the first value is the desired singular value. All the rest should ideally be zero, but are subject to computational accuracy.
So s(100,100) for the matrix s, or s1(100) for the vector s1, both correspond to a true singular value of 0.
[u,s,v] = svd(F);
s(100,100) % s is a matrix
ans = 2.1438e-11
s1 = svd(F);
s1(100) % s1 is a vector
ans = 2.1902e-113
We are not sure of the exact internal implementation of svd inside MATLAB, but the results with different numbers of outputs are each more or less accurate up to the computing accuracy of double precision.
format longg
[diag(s) s1]
ans = 1000×2
      214659.730736811      214659.730736811
  2.97801336572204e-10  2.97104202825509e-10
   2.1438088706804e-11  1.03894314937022e-12
   2.1438088706804e-11  1.06656371608964e-23
   2.1438088706804e-11  7.52960822699182e-24
   2.1438088706804e-11  7.33435968746888e-24
   2.1438088706804e-11  5.73717383511201e-24
   2.1438088706804e-11  4.92384115399226e-24
   2.1438088706804e-11  4.28392707240299e-24
   2.1438088706804e-11  3.40957046848442e-24
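As a final check (a sketch using the outputs above): since F has rank 1, the leading singular triple alone reconstructs it to round-off, which is another way to see that everything past the first singular value is noise.
relerr = norm(F - s1(1)*u(:,1)*v(:,1)')/norm(F)   % a small multiple of eps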

Version

R2020a