pca versus princomp: differences in the signs of the coefficients (loadings) for the cities dataset

With the latest install of MATLAB (R2019b), I no longer have access to the function princomp from the old Statistics Toolbox. I ran into a problem switching to the recommended function pca and tried to diagnose it using the cities dataset (the example dataset from the princomp documentation). The principal component loadings differ between princomp and pca. Their magnitudes are the same, but the signs sometimes differ: PC1 has the same sign, PC2 to PC4 have opposite signs, PC5 to PC8 have the same sign, and PC9 has the opposite sign. The Pareto plots of the variance explained by each component are identical. Has anyone else noticed this? My code follows:
```matlab
close all
clear
clc
% PCA analysis of the MATLAB "cities" data
load cities
% display categories
categories
%%
% display names
names(1:3,:)
names(end-2:end,:)
%%
% display ratings
ratings(1:3,:)
%%
figure(1)
boxplot(ratings,0,'+',0)
set(gca,'Yticklabel',categories)
%%
stdr = std(ratings); % standard deviation of each rating category
sr = ratings./stdr(ones(329,1),:); % "standardize" each rating: divide each column by its std
figure(3)
boxplot(sr,0,'+',0)
set(gca,'Yticklabel',categories)
figure(4)
set(gcf, 'paperposition', [0.5 0.5 10.5 7.5])
set(gcf, 'paperorientation', 'landscape')
% use pca
subplot(2,1,1)
[pcs, newdata, variances, t2, explained] = pca(sr,'Algorithm','svd','Centered',true); % principal components analysis
%
p3 = pcs(:,1:3); % subset the first three PCs
I = p3'*p3 % 3x3 identity matrix: the PCs are orthonormal
bar(pcs) % plot the coefficients/loadings for all 9 PCs
set(gca,'Xticklabel',categories)
legend('PC1','PC2','PC3','PC4','PC5','PC6','PC7','PC8','PC9')
title('pca')
% use princomp (not available in R2019b; this part needs an older release)
subplot(2,1,2)
clear pcs newdata variances t2
[pcs, newdata, variances, t2] = princomp(sr); % principal components analysis
% pcs = principal component coefficients, aka "loadings"
%
p3 = pcs(:,1:3); % subset the first three PCs
I = p3'*p3 % 3x3 identity matrix: the PCs are orthonormal
bar(pcs) % plot the coefficients/loadings for all 9 PCs
set(gca,'Xticklabel',categories)
legend('PC1','PC2','PC3','PC4','PC5','PC6','PC7','PC8','PC9')
title('princomp')
%%
% [pcs, newdata, variances, t2] = pca(sr); % principal components analysis
% newdata = principal component scores (pcs holds the loadings)
%
figure(5)
percent_explained = 100.*variances/sum(variances);
pareto(percent_explained) % make a "scree plot"
xlabel('Principal Component')
ylabel('Percent Explained, %')
```
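Instead of comparing the two bar charts by eye, the sign differences can be checked numerically, and both loading matrices can then be put on a common sign convention. The sketch below is a minimal illustration under stated assumptions: it uses `svd` on hypothetical stand-in data (not the cities ratings) so it runs without pca or princomp, `coeffB` is a deliberately sign-flipped copy standing in for the princomp output, and the "largest-magnitude entry is positive" rule is one common convention chosen for illustration, not necessarily what either MATLAB function does internally. Implicit expansion (`X - mean(X)`) needs R2016b or later.

```matlab
% Hypothetical stand-in data (the real script would use sr from the cities example)
t = (1:50)';
X = [sin(t), cos(2*t), t/50, sin(3*t).*t/50];
Xc = X - mean(X);                 % center each column (implicit expansion, R2016b+)
[~, ~, V] = svd(Xc, 'econ');      % columns of V are PC coefficients (loadings)
coeffA = V;                       % plays the role of the pca loadings
coeffB = V .* [1 -1 -1 1];        % hypothetical: same PCs with columns 2 and 3 negated

% 1) Detect sign flips: for unit-length columns, the dot product of matching
%    columns is +1 (same sign) or -1 (opposite sign).
d = sum(coeffA .* coeffB, 1);
flipped = find(d < 0)                          % indices of PCs whose signs disagree
sameUpToSign = all(abs(abs(d) - 1) < 1e-10)    % true if every column matches up to sign

% 2) Impose one sign convention on both (an assumption, not necessarily
%    MATLAB's): make the largest-magnitude entry of each column positive.
C = {coeffA, coeffB};
for k = 1:numel(C)
    [~, idx] = max(abs(C{k}), [], 1);          % row of the largest |entry| in each column
    s = sign(C{k}(sub2ind(size(C{k}), idx, 1:size(C{k}, 2))));
    C{k} = C{k} .* s;                          % flip whole columns, never single entries
end
maxDiff = max(abs(C{1}(:) - C{2}(:)))          % 0 once both follow the same convention
```

With the real data, `coeffA` and `coeffB` would be the `pcs` matrices from pca and princomp. If the scores (`newdata`) are used as well, the same sign vector `s` should be applied to their columns so that `newdata*pcs'` is unchanged.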
1 Comment
Spencer Chen on 28 Jan 2020
I have used both, but I haven't compared them. You should find that the corresponding factors (the scores) are also negated, in which case the output is fine.
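Because the scores are the centered data multiplied by the loadings, negating a loading column negates exactly the matching score column. A minimal sketch of that check, assuming hypothetical stand-in data and a hypothetical sign pattern (`flip`) chosen to mimic PC2 and PC3 in the question; it uses `svd` so it does not need pca or princomp:

```matlab
% Hypothetical stand-in data, centered column by column
t = (1:50)';
X = [sin(t), cos(2*t), t/50, sin(3*t).*t/50];
Xc = X - mean(X);
[~, ~, V] = svd(Xc, 'econ');

flip = [1 -1 -1 1];                          % hypothetical sign pattern
coeffA = V;         scoreA = Xc * coeffA;    % scores = centered data times loadings
coeffB = V .* flip; scoreB = Xc * coeffB;

% Sign of each column of B relative to the matching column of A
coeffSigns = sign(sum(coeffA .* coeffB, 1))
scoreSigns = sign(sum(scoreA .* scoreB, 1))
consistent = isequal(coeffSigns, scoreSigns) % true: each score column flips with its loadings
```

For the cities script, the same comparison could be run on the `pcs` and `newdata` outputs of the two functions.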
Blessings,
Spencer


Answers (2)

John D'Errico on 28 Jan 2020
Edited: John D'Errico on 28 Jan 2020
The vectors generated are unique only up to a sign change. Each one can arbitrarily change sign, and nothing of substance changes. (It is not a single element that changes; the entire vector has its sign flipped.) This has been true since time began, well, at least since MATLAB began. It is not a question of anyone "noticing". This is a known characteristic of these algorithms.
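To see why a whole-vector sign flip changes nothing of substance, here is a minimal sketch on hypothetical stand-in data (computed with `svd` rather than pca or princomp). It negates entire columns 2 and 3 of the loadings, recomputes the scores, and checks that the reconstructed data and the per-component variances are unchanged. The variance formula `diag(S).^2/(n-1)` assumes column-centered data.

```matlab
% Hypothetical stand-in data
t = (1:50)';
X = [sin(t), cos(2*t), t/50, sin(3*t).*t/50];
n = size(X, 1);
Xc = X - mean(X);                        % centered data
[~, S, V] = svd(Xc, 'econ');
latent = diag(S).^2 / (n - 1);           % variance along each PC

flip = [1 -1 -1 1];                      % negate entire columns 2 and 3
coeff1 = V;         score1 = Xc * coeff1;
coeff2 = V .* flip; score2 = Xc * coeff2;

% Both sign choices reconstruct the same centered data...
R1 = score1 * coeff1';
R2 = score2 * coeff2';
reconErr = max(abs(R1(:) - R2(:)))
dataErr  = max(abs(R1(:) - Xc(:)))
% ...and give the same variance per component
varErr = max(abs(var(score2) - latent'))
```

Each flipped column enters the reconstruction twice (once in the score, once in the loading), so the two minus signs cancel; the variances are sums of squares and do not see the sign at all, which is why the Pareto plots in the question agree.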

Brian Heikes on 29 Jan 2020
Spencer and John, thank you.
