Accuracy of eig in support of complex step differentiation: Derivative estimate accuracy degrades when step gets too small
Consider the use of complex step differentiation to estimate the derivative of an eigenvalue of a real non-symmetric matrix, using MATLAB's eig in double precision for the calculation. The traditional recommendation is to use step = 1e-20 * 1i. But perhaps extended precision calculation of eig is needed to support this?
I noticed that for steps smaller than perhaps about 1e-14*1i (or maybe even less), the accuracy of the complex step differentiation estimate seems to degrade and become unstable. Is this due to an accuracy limitation of eig in MATLAB double precision? The matrix in question has various norms in the ballpark of 1. Furthermore, the same calculation showed greater degradation in MATLAB R2013a WIN32 than in R2014a WIN64. Is there any reason to think the accuracy of eig should differ between these configurations?
The calculation is carried out with d equal to the step length (say 1e-14 or 1e-20). For simplicity, I'll show this for the maximum eigenvalue. Consider a real matrix A. Then to compute the (i,j) component of the gradient of the maximum eigenvalue with respect to A:
B = A;
B(i,j) = A(i,j) + d*1i;   % purely imaginary perturbation of the (i,j) entry
derivative_estimate = imag(max(eig(B)))/d
Complex step differentiation is not "supposed to" have a problem with small step size; in fact, that is supposed to be its advantage over forward or central differences, which have an optimal step size below which subtractive cancellation degrades accuracy.
Thanks.
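For anyone who wants to reproduce the effect, here is a sketch of the same experiment in Python/NumPy, used as a stand-in for the MATLAB code above (numpy.linalg.eigvals plays the role of eig; the 2-by-2 test matrix and the exact derivative value 1/2 are my own illustrative choices, not from the question):

```python
import numpy as np

def lambda_max(A):
    # eigenvalue with the largest real part (assumed simple)
    ev = np.linalg.eigvals(A)
    return ev[np.argmax(ev.real)]

def complex_step_deriv(A, i, j, h):
    # d(lambda_max)/dA(i,j) via a purely imaginary step, as in the
    # MATLAB snippet in the question
    B = A.astype(complex)
    B[i, j] += 1j * h
    return lambda_max(B).imag / h

# 2x2 test matrix with simple eigenvalues +/- sqrt(2); in exact arithmetic
# the derivative of lambda_max w.r.t. the (1,1) entry is exactly 1/2.
A = np.array([[0.0, 2.0],
              [1.0, 0.0]])

for h in (1e-8, 1e-14, 1e-20):
    est = complex_step_deriv(A, 0, 0, h)
    print(f"h = {h:.0e}:  estimate = {est:.16f},  error = {abs(est - 0.5):.2e}")
```

At moderate steps the estimate agrees with 1/2 to many digits; at the smallest steps the result is at the mercy of rounding inside the eigenvalue solver, which is the behavior being asked about.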
3 Comments
Mark Stone
on 18 May 2015
The traditional recommendation is to use step = 1e-20 * 1i. But perhaps extended precision calculation of eig is needed to support this?
Or extended precision in the addition operation,
B(i,j) = A(i,j) + d*1i;
It seems to me that, at the very least, the step magnitude needs to be chosen adaptively, based on eps(A(i,j)). If d is less than that, for example, the operation above will merely give B = A due to floating point precision thresholds.
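One point worth checking here: because the real and imaginary parts of a complex double are stored as separate 64-bit floats, a purely imaginary step added to a real entry survives the addition exactly, even far below eps(A(i,j)); it is a real step of that size that would be absorbed. A quick check in Python's IEEE double arithmetic (standing in for MATLAB's):

```python
# A purely imaginary step d*1i added to a real value keeps d exactly,
# because the imaginary part occupies its own double.
d = 1e-20
z = 1.0 + d * 1j
print(z.imag)          # prints 1e-20: the step survives
print(1.0 + d == 1.0)  # prints True: a *real* step this small is lost
```

So any loss at tiny d would have to happen inside eig itself, not in forming B.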
Mark Stone
on 19 May 2015
Edited: Mark Stone
on 19 May 2015
Accepted Answer
More Answers (3)
Cleve Moler
on 19 May 2015
2 votes
But you should not have to resort to complex step differentiation because an analytic expression for the derivative of an eigenvalue is known. If y and x are the left and right eigenvectors of a matrix A corresponding to a particular eigenvalue lambda, then d(lambda) = (y'*d(A)*x)/(y'*x)
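A sketch of this formula in Python/NumPy (the left eigenvector is obtained as a right eigenvector of A.T; a simple real eigenvalue is assumed so conjugation can be ignored; the test matrix and the value 1/2 are illustrative choices, not from the thread):

```python
import numpy as np

def eig_gradient(A):
    # Gradient of the eigenvalue with largest real part, via the formula
    # d(lambda) = (y'*dA*x)/(y'*x): the (i,j) entry is y(i)*x(j)/(y'*x).
    # A simple real eigenvalue is assumed here.
    w, V = np.linalg.eig(A)
    k = np.argmax(w.real)
    x = V[:, k]                       # right eigenvector
    wt, U = np.linalg.eig(A.T)        # left eigenvectors of A are
    m = np.argmin(np.abs(wt - w[k]))  # right eigenvectors of A.T
    y = U[:, m]
    return np.outer(y, x) / (y @ x)

# 2x2 test matrix with simple eigenvalues +/- sqrt(2); for lambda = sqrt(2)
# the derivative w.r.t. the (1,1) entry (G[0,0] in zero-based indexing)
# works out to exactly 1/2.
A = np.array([[0.0, 2.0],
              [1.0, 0.0]])
G = eig_gradient(A)
print(np.real(G))
```

Note that the result is invariant to how eig scales the eigenvectors, since x and y each appear once in the numerator and once in the denominator.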
3 Comments
Mark Stone
on 20 May 2015
Cleve Moler
on 20 May 2015
I pretty much agree with your assessment of the left eigenvector situation and don't have much more to offer. -- Cleve
Mark Stone
on 21 May 2015
Edited: Mark Stone
on 21 May 2015
Cleve Moler
on 20 May 2015
2 votes
I had forgotten about Nick Higham's comment after my blog post referencing his paper, "The complex step approximation to the Fréchet derivative of a matrix function", http://link.springer.com/article/10.1007/s11075-009-9323-y. Unfortunately, it's behind a Springer pay wall. But I think it says that Mark is on the right track.
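For readers without access to the paper, its central idea can be illustrated on a polynomial matrix function, where evaluation does not mix real and imaginary parts destructively the way an eigenvalue solver can. This is my own hedged sketch in Python/NumPy, not code from the paper; f(X) = X @ X is chosen only because its Fréchet derivative L(A,E) = A@E + E@A is known exactly:

```python
import numpy as np

def frechet_cs(A, E, h):
    # complex-step approximation to the Frechet derivative of
    # f(X) = X @ X in direction E:  imag(f(A + i*h*E)) / h
    F = (A + 1j * h * E) @ (A + 1j * h * E)
    return F.imag / h

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
E = rng.standard_normal((4, 4))      # direction of the perturbation
L_exact = A @ E + E @ A              # exact Frechet derivative of X @ X

for h in (1e-8, 1e-20):
    err = np.max(np.abs(frechet_cs(A, E, h) - L_exact))
    print(f"h = {h:.0e}:  max error = {err:.2e}")
```

For this polynomial function the h**2 term lands entirely in the real part, so the imaginary part carries the derivative cleanly; the eig-based estimate in the question lacks that protection.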
1 Comment
Mark Stone
on 20 May 2015
Matt J
on 15 May 2015
0 votes
I don't think eigenvalues are, in general, differentiable as a function of the matrix elements. However, even where they are, shouldn't your derivative estimate involve the difference eig(B) - eig(A)?
3 Comments
Mark Stone
on 15 May 2015
Edited: Mark Stone
on 15 May 2015
I should have stated that I was performing these calculations only for non-repeated eigenvalues.
And you're sure that the eigenvalues are non-repeated in modulus as well? Even if eig() is differentiable at non-repeated eigenvalues, the combined operation max(eig()) may not be. When the maximum modulus is attained by multiple (even if distinct) eigenvalues, the max() function will face an ambiguity on how to sort them.
In any case, it will probably be worth posting your tests so that we can reproduce what you are seeing. It defies my intuition a bit that any differentiation method can allow arbitrarily small step sizes, and certainly that the step needn't depend on the matrix A. The Taylor remainder term in the derivation of the method at the link you posted should depend on A, it seems to me.
Mark Stone
on 20 May 2015