Cody

Problem 2043. Six Steps to PCA - Step 1: Centre and Standardize

Introduction

Principal Component Analysis (PCA) is a classic among the many methods of multivariate data analysis. Invented in 1901 by Karl Pearson, the method is used today mostly as a tool for exploratory data analysis and dimension reduction, but also for building predictive models in machine learning.

Step 1: Centre and Standardize

The first step in many multivariate methods is to remove the influence of location and scale from the variables in the raw data. The result, commonly known as the z-scores of X, is a matrix Z obtained by centring each column of X to have mean 0 and scaling it to have standard deviation 1 (unless a column of X is constant, in which case the corresponding column of Z is all zeros). Strictly speaking, z-scores are based on population parameters, whereas the analogous calculation based on the sample mean and sample standard deviation gives Student's t-statistic.
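Written out for an n-by-p matrix X with entries x_ij, the column-wise transformation described above looks roughly like this. The symbols are introduced here for illustration, and the use of the sample standard deviation (dividing by n - 1, which is MATLAB's default for std) is an assumption, since the problem text does not state which normalization the tests expect.

```latex
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \mu_j\right)^2}, \qquad
z_{ij} =
\begin{cases}
  \dfrac{x_{ij} - \mu_j}{\sigma_j}, & \sigma_j > 0,\\[4pt]
  0, & \sigma_j = 0.
\end{cases}
```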

Task

Write a function to centre and standardize the input matrix X, returning as the output a structure with the following fields:

  • Z: the centred and standardized matrix corresponding to the input X
  • Mu: a vector of the original means of the columns of X
  • Sigma: a vector of the original standard deviations of the columns of X

Tips

  • MATLAB's zscore function is part of the Statistics Toolbox, which is not available in Cody, so you'll have to write your own.
  • Take care to avoid division by zero when a column is constant (its standard deviation is 0).
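The task and the division-by-zero tip can be sketched as follows. This is not a Cody (MATLAB) submission: it is a Python illustration that assumes the matrix is given as a list of rows and returns a dict standing in for the MATLAB structure. The function name centre_and_standardize is hypothetical, and dividing by n - 1 for Sigma is an assumption about what the tests expect.

```python
import statistics
from typing import Dict, List


def centre_and_standardize(X: List[List[float]]) -> Dict[str, list]:
    """Centre and standardize the columns of X (a list of rows).

    Returns a dict with keys 'Z', 'Mu' and 'Sigma', standing in for the
    fields of the MATLAB structure described in the task.
    """
    n = len(X)
    cols = list(zip(*X))  # transpose: one tuple per column

    # Column means; statistics.mean sums exactly, so a constant column
    # gets a mean exactly equal to its (repeated) value.
    mu = [statistics.mean(c) for c in cols]

    # Sample standard deviation (divides by n - 1), an assumed normalization.
    # With a single row every column is treated as constant, so sigma is 0.
    sigma = [statistics.stdev(c) if n > 1 else 0.0 for c in cols]

    # Avoid division by zero: where sigma == 0 divide by 1 instead. The
    # deviations in such a column are all 0, so its Z column is all zeros.
    safe = [s if s > 0 else 1.0 for s in sigma]

    Z = [[(x - m) / s for x, m, s in zip(row, mu, safe)] for row in X]
    return {"Z": Z, "Mu": mu, "Sigma": sigma}
```

For example, centre_and_standardize([[1, 2, 5], [3, 2, 7], [5, 2, 9]]) gives Mu = [3, 2, 7], Sigma = [2.0, 0.0, 2.0] and Z = [[-1.0, 0.0, -1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]; the constant middle column maps to zeros instead of triggering a division by zero. Substituting 1 for a zero standard deviation, rather than special-casing the column afterwards, keeps the main computation a single uniform expression.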


Solution Stats

10.09% Correct | 89.91% Incorrect
Last solution submitted on Aug 07, 2019
