Problem understanding PCA and eigenvectors of covariance matrix

Question

Jaime De La Mota Sanchis on 19 Jul 2021

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/881862-problem-understanding-pca-and-eigenvectors-of-covariance-matrix

Edited: David Goodmanson on 1 Sep 2023

Hello everyone. I am currently working with the function pca (principal component analysis) and also with the Karhunen-Loève expansion.

As far as I understand, in pca, the scores are equivalent to the eigenvectors of the covariance matrix.

I am working with a matrix called realizations, which is 144*5, meaning there are 144 observations of 5 random variables.

If I do the pca as

[coeff,score,latent,tsquared,explained,mu] = pca(realizations);

I obtain as score a matrix of size 144*5, as expected. However, if I write

covMat=cov(realizations)

The resulting matrix is 5*5. Therefore, the eigenvectors of covMat (calculated using eig) form a 5*5 matrix too, instead of a 144*5 one, as in the case of the scores obtained using pca. As far as I know I am doing this right, since the documentation of cov says that rows are observations and columns are random variables.

Can someone please tell me the difference between both methodologies?

How can I get an eigenvector of length 144?

Best regards.

Jaime.

Answer 1

David Goodmanson on 19 Jul 2021

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/881862-problem-understanding-pca-and-eigenvectors-of-covariance-matrix#answer_750093

Edited: David Goodmanson on 1 Sep 2023

Hello Jaime,

Scores are not the eigenvectors of the covariance matrix. Rather, the columns of 'coeff' have that role, and 'latent' holds the corresponding eigenvalues. Scores are projections of each observation onto the principal vectors. The code below is an example using eig (in the interest of numerical accuracy, pca actually uses the singular value decomposition, svd, by default).

In your case the eigendecomposition is always that of a 5x5 covariance matrix, so the eigenvectors have length 5.

x = rand(10,3);
n = size(x,1);
mu = mean(x);
xcm = x - mu;               % cm values for each column
covx = (xcm'*xcm)/(n-1);    % covariance matrix
[v, lambda] = eig(covx);    % columns of v are eigenvectors, diag(lambda) the eigenvalues
% for consistency with pca, sort the eigenvalues and permute
% the columns of the eigenvector matrix to match.
[lambda, ind] = sort(diag(lambda),'descend');
v = v(:,ind);
coeff = v           % each column of coeff describes a principal vector
score = xcm*v       % scores are projections of each observation onto the principal vectors
latent = lambda     % variance along each principal vector (eigenvalue of covx)
% compare with pca
[coeff1, score1, latent1] = pca(x)
% the principal vectors can differ by a factor of -1 between methods, so
% the coeff ratio below may have either +1 or -1 down columns.
% However, the score ratio below will have a matching -1 down the same columns, so the description
% of observations in terms of principal vectors is unchanged.
% latent values, being eigenvalues, always match.
coeff./coeff1
score./score1
latent./latent1
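
For the 144-by-5 case in the question, the same quantities can also be obtained from the singular value decomposition of the centered data. The following is only a minimal sketch, not pca's actual implementation: the data are random stand-ins for realizations, and the names xc, U, S, V, G, coeffSvd, scoreSvd and latentSvd are chosen here for illustration. It also shows one place where vectors of length 144 appear: the columns of U are eigenvectors of the 144x144 matrix xc*xc', and each score column is the matching column of U scaled by its singular value.

```matlab
rng(0);                                   % fixed seed so the run is repeatable
realizations = randn(144,5);              % stand-in data: 144 observations, 5 variables
n  = size(realizations,1);
xc = realizations - mean(realizations);   % center each column

[U, S, V] = svd(xc, 'econ');              % xc = U*S*V'; U is 144x5, S and V are 5x5
coeffSvd  = V;                            % principal vectors (length 5) as columns
scoreSvd  = U*S;                          % projections of the observations, 144x5
latentSvd = diag(S).^2/(n-1);             % eigenvalues of cov(realizations)

[coeff, score, latent] = pca(realizations);

disp(size(coeff))                         % 5 5
disp(size(score))                         % 144 5

% individual columns may differ in sign between methods, so compare magnitudes
disp(max(abs(abs(coeffSvd(:)) - abs(coeff(:)))))   % round-off level
disp(max(abs(abs(scoreSvd(:)) - abs(score(:)))))   % round-off level
disp(max(abs(latentSvd - latent)))                 % round-off level

% columns of U are length-144 eigenvectors of the 144x144 matrix xc*xc'
G = xc*xc';
disp(norm(G*U - U*S.^2))                  % small compared with norm(G)
```

The eigenvectors of the covariance matrix stay at length 5 whichever route is used. The length-144 objects are the score columns, or the columns of U if unit-norm vectors are wanted.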

Comments

Matthew Hooper on 28 Aug 2023

Edited: Matthew Hooper on 28 Aug 2023

Thanks Jaime - this makes sense. I have come across from Python and I am still learning MATLAB. The sklearn model sets the number of pca components equal to the minimum of the number of samples and the number of features. For example, in your post above it would use 3, as it is less than 10, so the result from Python can be seen to be exactly the same as the example above.

I have run into some old code that I am having difficulty reconciling, where the dimensions are reversed relative to the above code, i.e. x = randn(3,10). When I run this code I can see that coeff1 is a 10x2 double, score1 is a 3x2 double and latent1 is a 2x1 double. Does this result represent 10 different samples with 3 observations each?

David Goodmanson on 28 Aug 2023

Hi Matthew,

x = randn(3,10) means 3 samples with 10 values each; in pca's terms, 3 observations (rows) of 10 variables (columns). Ordinarily you could have 10 linearly independent basis vectors of length 10 to describe the observations, but with 3 samples there are only 3 vectors available. pca also subtracts the mean of those 3 vectors from each vector, which reduces the number of linearly independent ones to 2. So you see 2 linearly independent length-10 vectors in 'coeff' and two values in 'latent'.
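
The rank argument can be checked directly. This is a small sketch with random data; the variable xc is introduced here for illustration, and it assumes pca is called with its default settings, as in the comment above.

```matlab
rng(1);                      % fixed seed so the run is repeatable
x  = randn(3,10);            % 3 observations (rows), 10 variables (columns)
xc = x - mean(x);            % subtract the column means, as pca does by default

disp(rank(x))                % 3: the three rows are linearly independent
disp(rank(xc))               % 2: centering removes one degree of freedom

% the centered rows sum to the zero vector, which is the linear dependence
disp(norm(sum(xc,1)))        % round-off level

[coeff1, score1, latent1] = pca(x);
disp(size(coeff1))           % 10 2
disp(size(score1))           % 3 2
disp(size(latent1))          % 2 1
```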
