Principal Component Analysis: difference between V matrix from SVD decomposition and coef from pca?

I am learning how to do principal component analysis with the iris data set, but I am struggling to understand why the output from an SVD differs from the output of MATLAB's built-in pca function. To my understanding, the V matrix from the SVD is a matrix of eigenvectors, so multiplying the original (demeaned) matrix by V should give the principal component scores. However, when I compare the result to the scores returned by pca, the signs are reversed for columns two and three (otherwise the output is identical). Can someone explain why? Am I misunderstanding what V is? Upon inspection, V is identical to the coeff matrix from pca, except that the signs are reversed for columns 2 and 3.
load fisheriris   % Fisher iris data from Statistics and Machine Learning Toolbox; provides meas
%construct x matrix
Xmat=meas(:,1:4);
Xavg=mean(Xmat);
%demean data
B=Xmat-Xavg;
%SVD decomposition
[U,S,V]=svd(B, "econ");
%construct PCA scores using first 3 components
z3=B*V(:,1:3);
%compare with result from pca
[coeff,score]=pca(B);

Answers (1)

David Goodmanson on 9 Oct 2021
Edited: David Goodmanson on 9 Oct 2021
Hello Jessica,
The column vectors that make up V in svd, or coeff in pca, are each determined only up to a factor of ±1, so a vector from the first method can point in the opposite direction from the corresponding vector in the second method. It is the same idea as an eigenvector multiplied by a constant still being an eigenvector: when you describe something as a sum of eigenvectors, the coefficients of the eigenvectors change accordingly. Another example:
Xmat = ...
[0.4173 0.3377 0.2417
0.0497 0.9001 0.4039
0.9027 0.3692 0.0965
0.9448 0.1112 0.1320
0.4909 0.7803 0.9421
0.4893 0.3897 0.9561]
Xavg=mean(Xmat);
B=Xmat-Xavg;
%SVD decomposition
[U,S,V]=svd(B, "econ");
%construct PCA scores using first 3 components
z3=B*V(:,1:3)
%compare with result from pca
[coeff,score]=pca(B)
coeff
V
score
z3
coeff =
-0.5630 0.5367 0.6285
0.5136 -0.3685 0.7748
0.6475 0.7591 -0.0681
V =
0.5630 -0.5367 0.6285
-0.5136 0.3685 0.7748
-0.6475 -0.7591 -0.0681
score =
-0.1422 -0.1851 -0.1791
0.4586 -0.4665 0.0145
-0.4934 -0.0464 0.1602
-0.6266 0.0982 -0.0156
0.4971 0.2230 0.1623
0.3065 0.3767 -0.1423
z3 =
0.1422 0.1851 -0.1791
-0.4586 0.4665 0.0145
0.4934 0.0464 0.1602
0.6266 -0.0982 -0.0156
-0.4971 -0.2230 0.1623
-0.3065 -0.3767 -0.1423
As you pointed out, coeff and V disagree by a factor of -1 in some columns, but score and z3 also disagree by a factor of -1 in the same columns. That means when you do
B = score*coeff'
B = z3*V'
you get back the correct B in both cases, which is what counts.
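If you want the two results to match column for column, one option is to flip the SVD columns so they point the same way as the pca columns. The following is only a sketch using the same 6-by-3 example. The names s, Valigned and z3aligned are made up for illustration, and it assumes a MATLAB release with implicit expansion (R2016b or later) plus the Statistics and Machine Learning Toolbox for pca.

```matlab
% Same example data as above
Xmat = [0.4173 0.3377 0.2417
        0.0497 0.9001 0.4039
        0.9027 0.3692 0.0965
        0.9448 0.1112 0.1320
        0.4909 0.7803 0.9421
        0.4893 0.3897 0.9561];
B = Xmat - mean(Xmat);           % center each column

[U,S,V] = svd(B,"econ");
[coeff,score] = pca(B);

% s(k) is +1 if V(:,k) and coeff(:,k) point the same way, -1 if opposite
% (hypothetical helper variable, not part of svd or pca)
s = sign(sum(V .* coeff, 1));

Valigned  = V .* s;              % flip the mismatched columns of V
z3aligned = (B*V) .* s;          % flip the same columns of the scores

% Each of these differences is expected to be at round-off level
disp(norm(Valigned  - coeff))
disp(norm(z3aligned - score))
disp(norm(score*coeff' - B))
disp(norm((B*V)*V' - B))
```

The last two lines check the point made above: the reconstruction of B does not depend on which sign each column carries, because the flip appears once in the scores and once in the coefficients and cancels.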
