After applying PCA to a dataset, removing the PC with the highest variance, and reconstructing the dataset, some of the data series are flipped. Why, and how can I fix it?

To remove noise from a dataset with 3 data series, I apply PCA to it (using the pca function in MATLAB, which has zscore inside), remove the PC with the highest variance, and reconstruct the dataset. I am very happy with how well this approach removes noise. However, the first one or two data series come out flipped, and I need to multiply them by -1. Does anyone know why this happens, and how I can fix it?

Accepted Answer

Milan Bansal on 2 May 2024
Hi elham,
The flipping of data series is related to the inherent nature of PCA and how eigenvectors (principal components) are determined. PCA works by identifying the eigenvectors of the covariance matrix of the data, and these eigenvectors are the directions of maximum variance. However, eigenvectors are determined up to a sign, meaning that if a vector is an eigenvector, its negative is also an eigenvector with the same eigenvalue. This property leads to the "flipping" issue you're noticing.
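When you perform PCA and then reconstruct the data without the principal component that has the highest variance, the sign convention chosen for the remaining components can show up as a flip in some of the data series. This is most likely when the scores and coefficients used in the reconstruction do not share the same sign convention, for example when they come from different pca calls. If both come from the same call, the product score*coeff' does not depend on the sign of any individual component.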
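The sign ambiguity can be checked directly. The sketch below is only an illustration under assumptions not taken from the question: the data are synthetic, the variable names are made up, and pca from the Statistics and Machine Learning Toolbox is called with its default centering (no zscore step). It reconstructs the data without the first component, then flips the sign of one retained component in both coeff and score and reconstructs again.

```matlab
% Synthetic data, assumed purely for illustration: 3 noisy series, 200 samples
rng(0);                                % make the random noise reproducible
t = (0:199)';
X = [sin(0.1*t), cos(0.1*t), sin(0.05*t)] + 0.1*randn(200, 3);

% Columns of coeff are ordered by decreasing variance; mu holds the column means
[coeff, score, ~, ~, ~, mu] = pca(X);

% Reconstruct without the first (highest-variance) component
keep = 2:size(coeff, 2);
Xrec = score(:, keep) * coeff(:, keep)' + mu;

% Flip the sign of one retained component in BOTH coeff and score
coeffFlipped = coeff;  coeffFlipped(:, 2) = -coeffFlipped(:, 2);
scoreFlipped = score;  scoreFlipped(:, 2) = -scoreFlipped(:, 2);
XrecFlipped = scoreFlipped(:, keep) * coeffFlipped(:, keep)' + mu;

% Largest absolute difference between the two reconstructions
maxDiff = max(abs(Xrec(:) - XrecFlipped(:)));
```

Here maxDiff should be at rounding-error level, because the two sign changes cancel in the product. If a flip still appears in your own result, it may be worth checking that the scores and coefficients are paired consistently and that the coefficient matrix is transposed in the reconstruction.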
There are two ways to fix this:
  1. Manually multiply the flipped series by -1.
  2. Implement a post-processing step that automatically adjusts the signs. Compare the dot product of the original and reconstructed series. If the dot product is negative, it indicates that the series are in opposite directions, and you can then multiply the reconstructed series by -1. This is shown in the code snippet below:
for i = 1:size(originalData, 2)  % loop over each data series (column)
    % A negative dot product means the series point in opposite directions
    if dot(originalData(:, i), reconstructedData(:, i)) < 0
        reconstructedData(:, i) = -reconstructedData(:, i);  % flip the sign
    end
end
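The same check can also be written without a loop. The following is a minimal sketch, not a definitive implementation: the small matrices are hypothetical stand-ins for your own originalData and reconstructedData, and centering each series before taking the dot product is an added assumption, so that a large mean does not decide the sign.

```matlab
% Hypothetical example data; replace with your own matrices (one series per column)
originalData      = [ 1 2;  2 1;  3 0;  4 -1];
reconstructedData = [-1 2; -2 1; -3 0; -4 -1];   % first series deliberately flipped

% Center each series so the mean does not dominate the dot product
origCentered = originalData - mean(originalData, 1);
recCentered  = reconstructedData - mean(reconstructedData, 1);

% Column-wise dot products: a negative value means opposite directions
d = sum(origCentered .* recCentered, 1);

% Row of +1/-1 factors, applied to every column at once
signs = ones(1, size(originalData, 2));
signs(d < 0) = -1;
correctedData = reconstructedData .* signs;
```

Like the loop, this multiplies the whole series by -1, which mirrors it about zero. If a series has a nonzero mean that should be preserved, you may prefer to flip only the centered part and add the mean back afterwards.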
Please refer to the MATLAB documentation for pca to learn more.
Hope this helps!
