How to apply PCA correctly?
Hello
I'm currently struggling with PCA and Matlab. Let's say we have a data matrix X and a response y (classification task). X consists of 12 rows and 4 columns. The rows are the data points, the columns are the predictors (features).
Now, I can do PCA with the following command:
[coeff, score] = pca(X);
As I understand from the MATLAB documentation, coeff contains the loadings and score contains the principal components in its columns. That means the first column of score contains the first principal component (associated with the highest variance), and the first column of coeff contains the loadings for the first principal component.
Is this correct?
But if this is correct, why is X * coeff then not equal to score?
1 Comment
Sepp @Sepp
Your doubt can be clarified by this tutorial (even though it is in the context of another program), especially after the 5-minute mark: https://www.youtube.com/watch?v=eJ08Gdl5LH0
the cyclist
fabulous and generous explanation
Accepted Answer
More Answers (2)
Yaser Khojah
on 17 Apr 2019
2 votes
Dear the cyclist, thanks for showing this example. I have a question regarding the order of the columns of COEFF, since they are different from V. Is there any way to see the order of these columns? In other words, which variables does each column correspond to?
8 Comments
the cyclist
on 17 Apr 2019
Edited: the cyclist
on 17 Apr 2019
Quoting from the first section of the documentation for the pca function.
"Each column of coeff contains coefficients for one principal component, and the columns are in descending order of component variance."
You can see that
var(dataInPrincipalComponentSpace)
has values in descending order.
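For anyone who wants to check this without the original example at hand, here is a minimal sketch on made-up data. The matrix X below is random and purely hypothetical, and pca requires the Statistics and Machine Learning Toolbox. The third output of pca, latent, holds the component variances, so it should agree with var(score):

```matlab
rng(0);                                  % fix the seed so the run is repeatable
X = randn(12, 4) * diag([3 2 1 0.5]);    % hypothetical data: 12 observations, 4 variables

[coeff, score, latent] = pca(X);         % latent = variance of each principal component

varScore = var(score);                   % column variances of the scores (1-by-4)
disp(varScore);                          % printed values decrease from left to right

isDescending  = all(diff(varScore) <= 0);               % true if sorted in descending order
matchesLatent = max(abs(varScore' - latent)) < 1e-10;   % var(score) agrees with latent
```

Here score plays the role of dataInPrincipalComponentSpace from the comment above.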
Yaser Khojah
on 17 Apr 2019
I understand that, but I do not see how the PCs are related to the columns of the original data (X). How can I know which variables from the original data have the strongest impact?
Nyssa Capman
on 5 Jan 2020
Edited: Nyssa Capman
on 11 Mar 2020
I believe each row of coeff corresponds to one of the original variables, in the order they were input.
So, the first column has the coefficients for the 1st* PC, for each variable. The second column has the coefficients for the 2nd PC, for each variable, and so on.
This post is now several months old, and not really the original question, however I was also confused by this when getting started so I wanted to add this in case someone else is confused in the future and finds this post.
*[edited typo from '2nd' to '1st']
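To make the row/column layout concrete, here is a rough sketch. The data and the variable names x1 to x4 are made up for illustration. Row i of coeff goes with the i-th column of X, and column j goes with the j-th PC, so the variable with the largest absolute loading on a PC can be read off directly:

```matlab
rng(1);
X = randn(12, 4);                        % hypothetical data: 12 observations, 4 variables
varNames = {'x1', 'x2', 'x3', 'x4'};     % hypothetical names, in the column order of X

coeff = pca(X);

% Row i of coeff belongs to variable i; column j belongs to PC j.
pc1Loadings = coeff(:, 1);

% Variable with the largest absolute loading on the first PC
[~, idx] = max(abs(pc1Loadings));
fprintf('Largest |loading| on PC 1: %s (%.3f)\n', varNames{idx}, pc1Loadings(idx));

% Labeled view of all loadings: rows = variables, columns = PCs
T = array2table(coeff, 'RowNames', varNames, ...
    'VariableNames', {'PC1', 'PC2', 'PC3', 'PC4'});
disp(T);
```

Keep in mind that comparing loading magnitudes is only meaningful if the variables are on comparable scales; otherwise it may be worth standardizing X first (for example with zscore).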
Image Analyst
on 5 Jan 2020
"So, the first column has the coefficients for the 2nd PC, for each variable. " ??? Huh? And this is supposed to reduce confusion?
Alex
on 31 Mar 2020
Hello,
I have some doubts on pca.
I have 2 variables with n observations each, and the coeff matrix is the following:
0.9999 -0.00944
0.0094 0.9999
As I understand it, the first column contains the coefficients of the first principal component: 0.9999 is for the first variable in the initial matrix and 0.0094 for the second one.
But why does the linear combination coeff*variable not give the same result as the first column of score?
Thank you
the cyclist
on 31 Mar 2020
As you can see in my code above it is
X * coeff
that should equal score, not
coeff * X
(where X is the de-meaned input to pca).
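For the two-variable case, a minimal sketch of that check could look like the following. The data are random and hypothetical, and the X - mean(X) line relies on implicit expansion, so it assumes R2016b or later:

```matlab
rng(2);
n = 50;
X = [10*randn(n, 1), randn(n, 1)];   % hypothetical data: n observations, 2 variables

[coeff, score] = pca(X);

Xc = X - mean(X);                    % de-mean each column

% First PC = linear combination of the de-meaned variables,
% weighted by the first column of coeff
pc1 = Xc * coeff(:, 1);
err = max(abs(pc1 - score(:, 1)));   % expected to be at round-off level

% Note on the order: coeff * X would be (2-by-2) * (n-by-2),
% which is not a valid matrix product unless n happens to be 2.
```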
Yuan Luo
on 8 Nov 2020
Why does X need to be de-meaned, since pca will center the data by default?
the cyclist
on 26 Dec 2020
Sorry it took me a while to see this question.
If you do
[coeff,score] = pca(X);
it is true that pca() will internally de-mean the data. So, score is derived from de-meaned data.
But it does not mean that X itself [outside of pca()] has been de-meaned. So, if you are trying to re-create what happens inside pca(), you need to manually de-mean X first.
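A small sketch of the difference, on made-up data with clearly nonzero column means. The sixth output of pca, mu, returns the estimated column means, and the 'Centered' name-value option switches the internal centering off:

```matlab
rng(3);
X = randn(12, 4) + 5;                     % hypothetical data with nonzero column means

[coeff, score, ~, ~, ~, mu] = pca(X);     % mu = column means that pca subtracted internally

errRaw      = max(max(abs(X * coeff - score)));         % not small: X itself is not centered
errCentered = max(max(abs((X - mu) * coeff - score)));  % round-off level

% Alternative: tell pca not to center. Then score equals X * coeff directly.
[coeffU, scoreU] = pca(X, 'Centered', false);
errUncentered = max(max(abs(X * coeffU - scoreU)));     % round-off level
```

Note that the uncentered components (coeffU) are generally not the same as the centered ones (coeff), so this option changes the analysis and is not just a shortcut.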
Greg Heath
on 13 Dec 2015
0 votes
Hope this helps.
Thank you for formally accepting my answer
Greg