Cody

# Problem 2043. Six Steps to PCA - Step 1: Centre and Standardize

Introduction

Principal Component Analysis (PCA) is a classic among the many methods of multivariate data analysis. Invented in 1901 by Karl Pearson, the method is used today mostly as a tool for exploratory data analysis and dimension reduction, but also for building predictive models in machine learning.

Step 1: Centre and Standardize

The first step in many multivariate methods is to remove the influence of location and scale from the variables in the raw data. Also commonly known as the z-scores of X, Z is a transformation of X such that the columns are centred to have mean 0 and scaled to have standard deviation 1 (unless a column of X is constant, in which case that column of Z is constant at 0). Strictly speaking, z-scores are based on population parameters, whereas the analogous calculation based on the sample mean and standard deviation is the Student's t-statistic.
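
Written out element by element, the transformation might look like the following. The symbols n (number of rows), i (row index) and j (column index) are introduced here for illustration, and the n − 1 divisor is an assumption chosen to match the default behaviour of MATLAB's `std`; the problem statement does not say which normalisation it expects.

$$
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\mu_j\right)^2}, \qquad
z_{ij} =
\begin{cases}
\dfrac{x_{ij}-\mu_j}{\sigma_j} & \text{if } \sigma_j > 0 \\
0 & \text{if } \sigma_j = 0
\end{cases}
$$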

Write a function to centre and standardize the input matrix X, returning as the output a structure with the following fields:

• Z: the centred and standardized matrix corresponding to the input X
• Mu: a vector of the original means of columns of X
• Sigma: a vector of the original standard deviations of columns of X

Tips

• MATLAB's zscore function is part of the Stats Toolbox, which is not available in Cody, so you'll have to write your own.
• You should take care to avoid division by zero when a column is invariant.
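
One way to put the pieces together is sketched below. It is only an illustration under several assumptions: the function name `centre_and_standardize` is hypothetical (the required name is not given in the text), the standard deviation uses the n − 1 normalisation that is the default for `std`, X is assumed to be a non-empty numeric matrix, and implicit expansion (MATLAB R2016b or later) is assumed to be available.

```matlab
function S = centre_and_standardize(X)
% Hypothetical function name, chosen for illustration only.
% Assumes X is a non-empty numeric matrix with observations in rows.

    mu    = mean(X, 1);      % column means, forced along dimension 1
    sigma = std(X, 0, 1);    % sample standard deviations (n-1 divisor, an assumption)

    % Flag invariant columns by comparing every row with the first row.
    % This avoids relying on sigma being exactly zero after rounding.
    isConst = all(X == X(1, :), 1);

    % Replace the divisor for invariant columns so that no 0/0 occurs.
    divisor = sigma;
    divisor(isConst) = 1;

    Z = (X - mu) ./ divisor; % implicit expansion across rows
    Z(:, isConst) = 0;       % invariant columns are constant at 0

    S = struct('Z', Z, 'Mu', mu, 'Sigma', sigma);
end
```

For example, with X = [1 2 5; 3 2 7; 5 2 9] this sketch gives Mu = [3 2 7], Sigma = [2 0 2], and a Z whose first and third columns are [-1; 0; 1] while the invariant second column is all zeros. Passing the dimension explicitly to `mean` and `std` keeps a single-row input from being reduced along the wrong dimension.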

