Why is the Matlab PCA calculation different from the results of R and Orange3?

I am trying to use Matlab to compute the PCA of the iris dataset.
First, I went step by step:
(1) centring the data,
(2) calculating the correlation matrix,
(3) calculating the eigenvectors of the correlation matrix, and
(4) multiplying the centred data by the eigenvectors. The results were different from the figure below.
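Here is a minimal MATLAB sketch of these four steps. It assumes that the `fisheriris` sample data (Statistics and Machine Learning Toolbox) stands in for the asker's own iris matrix, and it needs R2016b or later for implicit expansion. It also shows one likely source of the mismatch. Eigenvectors of the correlation matrix describe the standardized data, so the centred columns must also be divided by their standard deviations before projecting.

```matlab
% Sketch of the four manual PCA steps (assumes fisheriris is available
% and MATLAB R2016b+ for implicit expansion).
load fisheriris                  % meas: 150x4 numeric matrix
X = meas;

% (1) centre each column
Xc = X - mean(X);

% (2) correlation matrix of the four variables
R = corrcoef(X);

% (3) eigenvectors, sorted by decreasing eigenvalue
[V, D] = eig(R);
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);

% (4) scores: correlation-matrix eigenvectors belong to the standardized
% data, so scale the centred columns by their standard deviations first
Z = Xc ./ std(X);
scores = Z * V;

% Projecting the merely centred data mixes two conventions and matches
% neither a covariance-based nor a correlation-based PCA
mixedScores = Xc * V;

disp(scores(1:5, :))
```

The columns of `scores` may still differ in sign from R or Orange, because each eigenvector is only defined up to a factor of -1.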
The Matlab results are also different from what R calculates, while the R results match those from Orange.
I also used the pca function in Matlab, and those results were also different from the figure below.
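For comparison, here is a sketch that uses the built-in `pca` function (Statistics and Machine Learning Toolbox), again with `fisheriris` as assumed stand-in data. By default `pca` centres the data and works from the covariance matrix, which corresponds to R's `prcomp` without scaling. Standardizing first with `zscore` gives the correlation-based analysis that R produces with `prcomp(..., scale. = TRUE)`.

```matlab
% Built-in pca on centred versus standardized data (fisheriris assumed).
load fisheriris
X = meas;

% Covariance-based PCA: pca centres X itself
[coeffCov, scoreCov, latentCov] = pca(X);

% Correlation-based PCA: zscore centres and divides by the standard deviation
[coeffCor, scoreCor, latentCor] = pca(zscore(X));

% Component standard deviations, comparable to the values prcomp prints
disp(sqrt(latentCov)')
disp(sqrt(latentCor)')
```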
Any ideas why, please?
Many thanks.
[Attached image: PCA calculations.jpg]
the cyclist on 4 Nov. 2019
Edited: the cyclist on 4 Nov. 2019
Another observation: the following R code, which centers but does not scale the data, agrees with MATLAB on the principal components:
library(datasets)
data(iris)
print(prcomp(iris[,1:4]))
Standard deviations (1, .., p=4):
[1] 2.0562689 0.4926162 0.2796596 0.1543862
Rotation (n x k) = (4 x 4):
                     PC1         PC2         PC3        PC4
Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574
There is a sign disagreement in PC2 and PC3, but eigenvectors are only determined up to a sign.
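Because of this sign freedom, it helps to align the columns before comparing loadings from two tools. In the sketch below, `coeffB` is a hypothetical second set of loadings. It is built by flipping PC2 and PC3 to mimic the disagreement above, and `fisheriris` is assumed as the data.

```matlab
% Align the signs of one set of principal axes with another before comparing.
load fisheriris
[coeffA, scoreA] = pca(meas);

% Hypothetical loadings from another tool, with PC2 and PC3 flipped
coeffB = coeffA .* [1 -1 -1 1];

% Per-column sign of the dot product: +1 if same direction, -1 if flipped
s = sign(sum(coeffA .* coeffB, 1));
coeffB = coeffB .* s;

% Scores flip together with their loadings
scoreB = (meas - mean(meas)) * coeffB;
```

Flipping a loading column also flips the matching score column. Neither result is wrong; the two tools simply chose opposite directions for that axis.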
John Oyekan on 4 Nov. 2019
The eigenvector calculations are fine; it is when calculating the scores that everything seems to fall apart.


Answers (1)

the cyclist on 4 Nov. 2019
Edited: the cyclist on 4 Nov. 2019
Here's my guess:
The difference between R and MATLAB is that in R you scaled the data in addition to centering them: each column was divided by its standard deviation. In MATLAB, the data were only centered.
If you do this to standardize in MATLAB:
iris_save_standardized = bsxfun(@minus,iris_save,mean(iris_save))./std(iris_save); %center and scale the dataset
you will get the same result.
Specifically, the following R code gives me the same output as MATLAB does, using the scaling in the above line. (The editor here of course assumes MATLAB code, so excuse the weird formatting. It still makes it easier for you to cut and paste into R if you want.)
library(datasets)
data(iris)
scaled_iris <- scale(x=iris[,1:4])
pc <- prcomp(scaled_iris)
score <- data.matrix(scaled_iris) %*% data.matrix(pc$rotation)
print(head(score))
This is a bit different from the output you pasted in your question, though. Not sure about that.
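Standardizing reconciles the two results for this reason: once each column is centered and divided by its standard deviation, the covariance matrix of the data equals the correlation matrix of the original columns. Eigenvectors of either matrix therefore give the same axes. The sketch below checks this with `fisheriris` assumed in place of `iris_save`, and it needs R2016b+ for the implicit expansion in the division.

```matlab
% After standardizing, cov(data) equals corrcoef(original data).
load fisheriris
iris_save = meas;        % assumed stand-in for the asker's matrix
iris_save_standardized = bsxfun(@minus, iris_save, mean(iris_save)) ./ std(iris_save);

C = cov(iris_save_standardized);
Rraw = corrcoef(iris_save);
fprintf('max |cov - corr| = %g\n', max(abs(C(:) - Rraw(:))));

% So pca on the standardized data is a correlation-based PCA
[coeff, score] = pca(iris_save_standardized);
```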
John Oyekan on 5 Nov. 2019
Following your last comments, I tried both approaches. Here are my results:
1. Using implicit expansion:
iris_save_standardized = (iris_save - mean(iris_save))./std(iris_save) %standardize the dataset
dataInPrincipalComponentSpace =
-2.2571 0.4784 0.1273 -0.0241
-2.0740 -0.6719 0.2338 -0.1027
-2.3563 -0.3408 -0.0441 -0.0283
-2.2917 -0.5954 -0.0910 0.0657
-2.3819 0.6447 -0.0157 0.0358
-2.0687 1.4842 -0.0269 -0.0066
-2.4359 0.0475 -0.3344 0.0367
2. Using bsxfun:
iris_save_standardized = bsxfun(@minus,iris_save,mean(iris_save))./std(iris_save); %standardize the dataset
dataInPrincipalComponentSpace =
-2.2571 0.4784 0.1273 -0.0241
-2.0740 -0.6719 0.2338 -0.1027
-2.3563 -0.3408 -0.0441 -0.0283
-2.2917 -0.5954 -0.0910 0.0657
-2.3819 0.6447 -0.0157 0.0358
-2.0687 1.4842 -0.0269 -0.0066
-2.4359 0.0475 -0.3344 0.0367
Both techniques gave identical results. Would you say they match the results from Orange?
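The two outputs agree because they are the same computation. Since R2016b, MATLAB's arithmetic operators expand singleton dimensions implicitly, which is exactly what `bsxfun` does explicitly. Here is a quick check, with `fisheriris` assumed in place of `iris_save`:

```matlab
% Implicit expansion and bsxfun give the same standardized matrix.
load fisheriris
iris_save = meas;        % assumed stand-in for the asker's matrix

A = (iris_save - mean(iris_save)) ./ std(iris_save);
B = bsxfun(@minus, iris_save, mean(iris_save)) ./ std(iris_save);

fprintf('identical: %d\n', isequal(A, B));
```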
the cyclist on 5 Nov. 2019
I've never used Orange. (I'd never heard of it until this question.)
The fact that the results are very close, but not exact, suggests to me some minor algorithmic difference. But I really don't know.
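One possible source of a small, uniform discrepancy is the standard deviation used for scaling. This is an assumption; the thread does not confirm it for Orange3. Dividing by N instead of N-1 leaves the principal axes unchanged but multiplies every score by sqrt(N/(N-1)), which is about 1.003 for the 150 iris rows. The sketch below uses `fisheriris` as stand-in data.

```matlab
% Hypothetical check: population (N) versus sample (N-1) standard
% deviation when standardizing. Axes are unchanged; scores rescale.
load fisheriris
X = meas;
n = size(X, 1);
Xc = X - mean(X);

% Principal axes from the correlation matrix, used for both scalings
[V, D] = eig(corrcoef(X));
[~, order] = sort(diag(D), 'descend');
V = V(:, order);

scoresSample = (Xc ./ std(X)) * V;      % std(X) divides by N-1 (MATLAB and R default)
scoresPop    = (Xc ./ std(X, 1)) * V;   % std(X, 1) divides by N

ratio = sqrt(n / (n - 1));
fprintf('max deviation from scaled copy: %g\n', ...
    max(max(abs(scoresPop - ratio * scoresSample))));
```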
