Reducing dimensionality of features with PCA
I'm totally confused regarding PCA. I have a 4D image of size 90 x 60 x 12 x 350. That means that each voxel is a vector of size 350 (time series).
Now I divide the 3D image (90 x 60 x 12) into cubes. Say a cube contains n voxels, so I have n vectors of size 350. I want to reduce these n vectors to a single vector and then calculate the correlations between the resulting vectors of all cubes.
So for a cube I can construct the matrix M by placing the voxel time series side by side, i.e. M = [v1 v2 v3 ... vn], where each v is of size 350.
Now I can apply PCA in Matlab by using [coeff, score, latent, ~, explained] = pca(M); and taking the first component. And now my confusion begins.
- Should I transpose the matrix M, i.e. pca(M')?
- Should I take the first column of coeff or of score?
- This third question is a bit unrelated. Let's assume we have a matrix A = rand(30,100) where the rows are the data points and the columns are the features. Now I want to reduce the dimensionality of the feature vectors while keeping all data points. How can I do this with PCA? When I do [coeff, score, latent, ~, explained] = pca(A); then coeff is of size 100 x 29 and score is of size 30 x 29. I'm totally confused.
Matlaber on 19 Feb 2019
Is there any input argument of
coeff = pca(X)
coeff = pca(X,Name,Value)
[coeff,score,latent] = pca(___)
[coeff,score,latent,tsquared] = pca(___)
[coeff,score,latent,tsquared,explained,mu] = pca(___)
for reducing a 400 x 40 matrix to 400 x 20?
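One option that may fit here is the 'NumComponents' name-value argument of pca, which limits how many components are returned. The following is only a minimal sketch: the matrix X is synthetic random data standing in for the 400 x 40 matrix, and it assumes the rows are the observations and that the Statistics and Machine Learning Toolbox is available.

```matlab
% Synthetic stand-in for the 400 x 40 matrix (rows = observations, columns = features).
rng(0);
X = randn(400, 40);

% Ask pca to return only the first 20 principal components.
[coeff, score] = pca(X, 'NumComponents', 20);

size(coeff)   % 40 x 20: one column of loadings per retained component
size(score)   % 400 x 20: the data expressed in the reduced feature space
```

Under these assumptions, score is the reduced 400 x 20 representation. Taking score(:,1:20) from a plain pca(X) call should give the same columns.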
Alfonso Nieto-Castanon on 5 Jun 2015
If you use:
[coeff,score] = pca(M);
Comp_PCA1 = score(:,1);
where M is a (350 by n) matrix of voxel timeseries, then the first column of the resulting matrix score is the (350 by 1) timeseries/vector of component scores that is most representative of the timeseries variance within your cube.
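To make this concrete for several cubes, here is a rough sketch under stated assumptions: the data are random numbers rather than real voxel timeseries, and the cube size (nVox) and number of cubes (nCubes) are hypothetical values chosen only for illustration.

```matlab
rng(1);
nT = 350;                 % time points per voxel, as in the question
nVox = 27;                % hypothetical cube of 3 x 3 x 3 voxels
nCubes = 4;               % hypothetical number of cubes

rep = zeros(nT, nCubes);  % one representative timeseries per cube
for c = 1:nCubes
    M = randn(nT, nVox);          % stand-in for [v1 v2 ... vn], one column per voxel
    [~, score] = pca(M);          % rows = time points, columns = voxels
    rep(:, c) = score(:, 1);      % first principal component timeseries of this cube
end

R = corrcoef(rep);        % nCubes x nCubes correlations between the cube components
```

With real data, M would be filled from the voxels of each cube instead of randn. Keep in mind that the sign of a principal component is a matter of convention, so the sign of the resulting correlations should be interpreted with care.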
Note that pca(X) first subtracts the mean effect mean(X,1) from X and then performs SVD on the residuals to decompose the resulting covariance into its principal components. You do not want to use pca(M'), because then you would be disregarding the average timeseries across all the voxels within each cube (which often contains useful information). Using pca(M) instead disregards the average signal across all timepoints for each voxel, which is fine if you are planning to use the result for correlation analyses, since correlations are invariant to the average value of the timeseries.
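The centering behavior can be checked numerically. The sketch below uses a synthetic 350 x 10 matrix with an added offset (not real data) and relies on the mu output of pca. It compares mu with mean(M,1), reconstructs the scores from the centered data, and confirms that adding a constant to a timeseries leaves its correlation unchanged.

```matlab
rng(2);
M = randn(350, 10) + 5;            % synthetic data with a nonzero offset per voxel

[coeff, score, ~, ~, ~, mu] = pca(M);

% mu is the per-column (per-voxel) mean across time that pca removed.
errMean = max(abs(mu - mean(M, 1)));          % should be near zero

% The scores are the centered data projected onto the coefficients.
Mc = bsxfun(@minus, M, mu);
errScore = max(max(abs(score - Mc * coeff))); % near zero, up to rounding

% Correlation is unaffected by a constant added to one timeseries.
r1 = corrcoef(M(:,1), M(:,2));
r2 = corrcoef(M(:,1) + 100, M(:,2));
errCorr = abs(r1(1,2) - r2(1,2));             % near zero, up to rounding
```

All three error values should differ from zero only by floating-point rounding, which is consistent with the description above.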