# normalize

Normalize data

## Syntax

``N = normalize(A)``
``N = normalize(A,dim)``
``N = normalize(___,method)``
``N = normalize(___,method,methodtype)``
``N = normalize(___,'center',centertype,'scale',scaletype)``
``N = normalize(___,Name,Value)``
``[N,C,S] = normalize(___)``

## Description

example

````N = normalize(A)` returns the vectorwise z-score of the data in `A` with center 0 and standard deviation 1. If `A` is a vector, then `normalize` operates on the entire vector `A`. If `A` is a matrix, then `normalize` operates on each column of `A` separately.If `A` is a multidimensional array, then `normalize` operates along the first dimension of `A` whose size does not equal 1. If `A` is a table, or timetable, then `normalize` operates on each variable of `A` separately. ```

example

````N = normalize(A,dim)` specifies the dimension of `A` to operate along. For example, `normalize(A,2)` normalizes each row.```

example

````N = normalize(___,method)` specifies a normalization method with any of the previous syntaxes. For example, `normalize(A,'norm')` normalizes the data in `A` by the Euclidean norm (2-norm).```

example

````N = normalize(___,method,methodtype)` specifies the type of normalization for the given method. For example, `normalize(A,'norm',Inf)` normalizes the data in `A` using the infinity norm.```
````N = normalize(___,'center',centertype,'scale',scaletype)` uses the `'center'` and `'scale'` methods at the same time. These are the only methods you can use together. If you do not specify `centertype` or `scaletype`, then normalize uses the default method type for that method (centering to have a mean of 0 and scaling by the standard deviation).Use this syntax with any center and scale type to perform both methods together. For instance, `N = normalize(A,'center','median','scale','mad')`. You can also use this syntax to specify center and scale values `C` and `S` from a previously computed normalization. For instance, normalize one data set and save the parameters with `[N1,C,S] = normalize(A1)`. Then, reuse those parameters on a different data set with ```N2 = normalize(A2,'center',C,'scale',S)```.```

example

````N = normalize(___,Name,Value)` specifies additional parameters for smoothing using one or more name-value arguments. For example, `normalize(A,'DataVariables',datavars)` normalizes the variables specified by `datavars` when `A` is a table or timetable.```

example

````[N,C,S] = normalize(___)` additionally returns the centering and scaling values `C` and `S` used to perform the normalization. Then, you can normalize different input data using the values in `C` and `S` with ```N = normalize(A2,'center',C,'scale',S)```.```

## Examples

collapse all

Normalize data in a vector and matrix by computing the z-score.

Create a vector `v` and compute the z-score, normalizing the data to have mean 0 and standard deviation 1.

```v = 1:5; N = normalize(v)```
```N = 1×5 -1.2649 -0.6325 0 0.6325 1.2649 ```

Create a matrix `B` and compute the z-score for each column. Then, normalize each row.

`B = magic(3)`
```B = 3×3 8 1 6 3 5 7 4 9 2 ```
`N1 = normalize(B)`
```N1 = 3×3 1.1339 -1.0000 0.3780 -0.7559 0 0.7559 -0.3780 1.0000 -1.1339 ```
`N2 = normalize(B,2)`
```N2 = 3×3 0.8321 -1.1094 0.2774 -1.0000 0 1.0000 -0.2774 1.1094 -0.8321 ```

Scale a vector `A` by its standard deviation.

```A = 1:5; Ns = normalize(A,'scale')```
```Ns = 1×5 0.6325 1.2649 1.8974 2.5298 3.1623 ```

Scale `A` so that its range is in the interval [0,1].

`Nr = normalize(A,'range')`
```Nr = 1×5 0 0.2500 0.5000 0.7500 1.0000 ```

Create a vector `A` and normalize it by its 1-norm.

```A = 1:5; Np = normalize(A,'norm',1)```
```Np = 1×5 0.0667 0.1333 0.2000 0.2667 0.3333 ```

Center the data in `A` so that it has mean 0.

`Nc = normalize(A,'center','mean')`
```Nc = 1×5 -2 -1 0 1 2 ```

Create a table containing height information for five people.

```LastName = {'Sanchez';'Johnson';'Lee';'Diaz';'Brown'}; Height = [71;69;64;67;64]; T = table(LastName,Height)```
```T=5×2 table LastName Height _________ ______ 'Sanchez' 71 'Johnson' 69 'Lee' 64 'Diaz' 67 'Brown' 64 ```

Normalize the height data by the maximum height.

`N = normalize(T,'norm',Inf,'DataVariables','Height')`
```N=5×2 table LastName Height _________ _______ 'Sanchez' 1 'Johnson' 0.97183 'Lee' 0.90141 'Diaz' 0.94366 'Brown' 0.90141 ```

Normalize a data set, return the computed parameter values, and reuse the parameters to apply the same normalization to another data set.

Create a timetable with two variables: `Temperature` and `WindSpeed`. Then create a second timetable with the same variables, but with the samples taken a year later.

```rng default Time1 = (datetime(2019,1,1):days(1):datetime(2019,1,10))'; Temperature = randi([10 40],10,1); WindSpeed = randi([0 20],10,1); T1 = timetable(Temperature,WindSpeed,'RowTimes',Time1)```
```T1=10×2 timetable Time Temperature WindSpeed ___________ ___________ _________ 01-Jan-2019 35 3 02-Jan-2019 38 20 03-Jan-2019 13 20 04-Jan-2019 38 10 05-Jan-2019 29 16 06-Jan-2019 13 2 07-Jan-2019 18 8 08-Jan-2019 26 19 09-Jan-2019 39 16 10-Jan-2019 39 20 ```
```Time2 = (datetime(2020,1,1):days(1):datetime(2020,1,10))'; Temperature = randi([10 40],10,1); WindSpeed = randi([0 20],10,1); T2 = timetable(Temperature,WindSpeed,'RowTimes',Time2)```
```T2=10×2 timetable Time Temperature WindSpeed ___________ ___________ _________ 01-Jan-2020 30 14 02-Jan-2020 11 0 03-Jan-2020 36 5 04-Jan-2020 38 0 05-Jan-2020 31 2 06-Jan-2020 33 17 07-Jan-2020 33 14 08-Jan-2020 22 6 09-Jan-2020 30 19 10-Jan-2020 15 0 ```

Normalize the first timetable. Specify three outputs: the normalized table, and also the centering and scaling parameter values `C` and `S` that the function uses to perform the normalization.

`[T1_norm,C,S] = normalize(T1)`
```T1_norm=10×2 timetable Time Temperature WindSpeed ___________ ___________ _________ 01-Jan-2019 0.57687 -1.4636 02-Jan-2019 0.856 0.92885 03-Jan-2019 -1.4701 0.92885 04-Jan-2019 0.856 -0.4785 05-Jan-2019 0.018609 0.36591 06-Jan-2019 -1.4701 -1.6044 07-Jan-2019 -1.0049 -0.75997 08-Jan-2019 -0.26052 0.78812 09-Jan-2019 0.94905 0.36591 10-Jan-2019 0.94905 0.92885 ```
```C=1×2 table Temperature WindSpeed ___________ _________ 28.8 13.4 ```
```S=1×2 table Temperature WindSpeed ___________ _________ 10.748 7.1056 ```

Now normalize the second timetable `T2` using the parameter values from the first normalization. This technique ensures that the data in `T2` is centered and scaled in the same manner as `T1`.

`T2_norm = normalize(T2,"center",C,"scale",S)`
```T2_norm=10×2 timetable Time Temperature WindSpeed ___________ ___________ _________ 01-Jan-2020 0.11165 0.084441 02-Jan-2020 -1.6562 -1.8858 03-Jan-2020 0.66992 -1.1822 04-Jan-2020 0.856 -1.8858 05-Jan-2020 0.2047 -1.6044 06-Jan-2020 0.39078 0.50665 07-Jan-2020 0.39078 0.084441 08-Jan-2020 -0.6327 -1.0414 09-Jan-2020 0.11165 0.78812 10-Jan-2020 -1.284 -1.8858 ```

By default, `normalize` operates on any variables in `T2` that are also present in `C` and `S`. To normalize a subset of the variables in `T2`, specify the variables to operate on with the `DataVariables` name-value argument. The subset of variables you specify must be present in `C` and `S`.

Specify `WindSpeed` as the data variable to operate on. `normalize` operates on that variable and returns `Temperature` unchanged.

`T2_partial = normalize(T2,"center",C,"scale",S,"DataVariables","WindSpeed")`
```T2_partial=10×2 timetable Time Temperature WindSpeed ___________ ___________ _________ 01-Jan-2020 30 0.084441 02-Jan-2020 11 -1.8858 03-Jan-2020 36 -1.1822 04-Jan-2020 38 -1.8858 05-Jan-2020 31 -1.6044 06-Jan-2020 33 0.50665 07-Jan-2020 33 0.084441 08-Jan-2020 22 -1.0414 09-Jan-2020 30 0.78812 10-Jan-2020 15 -1.8858 ```

## Input Arguments

collapse all

Input data, specified as a scalar, vector, matrix, multidimensional array, table, or timetable.

If `A` is a numeric array and has type `single`, then the output also has type `single`. Otherwise, the output has type `double`.

`normalize` ignores `NaN` values in `A`.

Data Types: `double` | `single` | `table` | `timetable`
Complex Number Support: Yes

Operating dimension, specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.

For table or timetable input data, `dim` is not supported and operation is along each table or timetable variable separately.

Normalization method, specified as one of these options:

Method

Description

`'zscore'`

z-score with mean 0 and standard deviation 1.

`'norm'`

2-norm.

`'scale'`

Scale by standard deviation.

`'range'`

Rescale range of data to [0,1].

`'center'`

Center data to have mean 0.

`'medianiqr'`

Center and scale data to have median 0 and interquartile range 1.

To return the parameters the function uses to normalize the data, specify the `C` and `S` output arguments.

Method type, specified as an array, table, two-element row vector, or type name, depending on the specified method:

Method

Method Type Options

Description

`'zscore'`

`'std'` (default)

Center and scale to have mean 0 and standard deviation 1.

`'robust'`

Center and scale to have median 0 and median absolute deviation 1.

`'norm'`

Positive numeric scalar (default is 2)

p-norm.

`Inf`

Infinity norm.

`'scale'`

`'std'` (default)

Scale by standard deviation.

`'mad'`

Scale by median absolute deviation.

`'first'`

Scale by first element of data.

`'iqr'`

Scale by interquartile range.

Numeric array

Scale by numeric values. The array must have a compatible size with input `A`.

Table

Scale using variables in a table. Each table variable in the input data `A` is scaled using the value in the similarly-named variable in the scaling table.

`'range'`

2-element row vector (default is [0 1])

Rescale range of data to an interval of the form `[a b]`, where ```a < b```.

`'center'`

`'mean'` (default)

Center to have mean 0.

`'median'`

Center to have median 0.

Numeric array

Shift center by numeric values. The array must have a compatible size with input `A`.

Table

Shift center using variables in a table. Each table variable in the input data `A` is centered using the value in the similarly-named variable in the centering table.

To return the parameters the function uses to normalize the data, specify the `C` and `S` output arguments.

Center and scale method types, specified as any valid `methodtype` option for the `'center'` or `'scale'` methods, respectively. See the `methodtype` argument description for a list of available options for each of the methods.

Example: ```N = normalize(A,'center',C,'scale',S)```

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `normalize(T,'ReplaceValues',false)`

Table variables to operate on, specified as one of the options in this table. The `DataVariables` value indicates which variables of the input table to fill.

Other variables in the table not specified by `DataVariables` pass through to the output without being normalized.

OptionDescriptionExamples
Variable name

A character vector or scalar string specifying a single table variable name

`'Var1'`

`"Var1"`

Vector of variable names

A cell array of character vectors or string array where each element is a table variable name

`{'Var1' 'Var2'}`

`["Var1" "Var2"]`

Scalar or vector of variable indices

A scalar or vector of table variable indices

`1`

`[1 3 5]`

Logical vector

A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it

`[true false true]`

Function handle

A function handle that takes a table variable as input and returns a logical scalar

`@isnumeric`

`vartype` subscript

A table subscript generated by the `vartype` function

`vartype('numeric')`

Example: ```normalize(T,'DataVariables',["Var1" "Var2" "Var4"])```

Replace values indicator, specified as one of these values when `A` is a table or timetable:

• `true` or `1` — Replace input table variables with table variables containing normalized data.

• `false` or `0` — Append input table variables with table variables containing normalized data.

For vector, matrix, or multidimensional array input data, `ReplaceValues` is not supported.

Example: `normalize(T,'ReplaceValues',false)`

## Output Arguments

collapse all

Normalized values, returned as an array, table, or timetable.

`N` is the same size as `A` unless the value of `ReplaceValues` is `false`. If the value of `ReplaceValues` is `false`, then the width of `N` is the sum of the input data width and the number of data variables specified.

`normalize` generally operates on all variables of input tables and timetables, except in these cases:

• If you specify `DataVariables`, then `normalize` only operates on the specified variables.

• If you use the syntax `normalize(T,'center',C,'scale',S)` to normalize a table or timetable `T` using previously computed parameters `C` and `S`, then `normalize` automatically uses the variable names in `C` and `S` to determine the data variables in `T` to operate on.

Centering values, returned as an array or table.

When `A` is an array, `normalize` returns `C` and `S` as arrays such that `N = (A - C) ./ S`. Each value in `C` is the centering value used to perform the normalization along the specified dimension. For example, if `A` is a 10-by-10 matrix of data and `normalize` operates along the first dimension, then `C` is a 1-by-10 vector containing the centering value for each column in `A`.

When `A` is a table or timetable, `normalize` returns `C` and `S` as tables containing the centers and scales for each table variable that was normalized, ```N.Var = (A.Var - C.Var) ./ S.Var```. The table variable names of `C` and `S` match corresponding table variables in the input. Each variable in `C` contains the centering value used to normalize the similarly-named variable in `A`.

Scaling values, returned as an array or table.

When `A` is an array, `normalize` returns `C` and `S` as arrays such that `N = (A - C) ./ S`. Each value in `S` is the scaling value used to perform the normalization along the specified dimension. For example, if `A` is a 10-by-10 matrix of data and `normalize` operates along the first dimension, then `S` is a 1-by-10 vector containing the scaling value for each column in `A`.

When `A` is a table or timetable, `normalize` returns `C` and `S` as tables containing the centers and scales for each table variable that was normalized, ```N.Var = (A.Var - C.Var) ./ S.Var```. The table variable names of `C` and `S` match corresponding table variables in the input. Each variable in `S` contains the scaling value used to normalize the similarly-named variable in `A`.

collapse all

### Z-Score

z-scores measure the distance of a data point from the mean in terms of the standard deviation. The standardized data set has mean 0 and standard deviation 1, and retains the shape properties of the original data set (same skewness and kurtosis).

For a random variable X with mean μ and standard deviation σ, the z-score of a value x is $z=\frac{\left(x-\mu \right)}{\sigma }.$ For sample data with mean $\overline{X}$ and standard deviation S, the z-score of a data point x is $z=\frac{\left(x-\overline{X}\right)}{S}.$

### P-Norm

The general definition for the p-norm of a vector v that has N elements is

`${‖v‖}_{p}={\left[\sum _{k=1}^{N}{|{v}_{k}|}^{p}\right]}^{\text{\hspace{0.17em}}1/p}\text{\hspace{0.17em}},$`

where p is any positive real value, `Inf`, or `-Inf`. Some common values of p are 1, 2, and `Inf`.

• If p is 1, then the resulting 1-norm is the sum of the absolute values of the vector elements.

• If p is 2, then the resulting 2-norm gives the vector magnitude or Euclidean length of the vector.

• If p is `Inf`, then ${‖v‖}_{\infty }={\mathrm{max}}_{i}\left(|v\left(i\right)|\right)$.

### Rescaling

Rescaling changes the distance between the min and max values in a data set by stretching or squeezing the points along the number line. The z-scores of the data are preserved, so the shape of the distribution remains the same.

The equation for rescaling data `X` to an arbitrary interval `[a b]` is

`${X}_{rescaled}=a+\left[\frac{X-{\mathrm{min}}_{X}}{{\mathrm{max}}_{X}-{\mathrm{min}}_{X}}\right]\left(b-a\right)\text{\hspace{0.17em}}.$`

While the `normalize` and `rescale` functions can both rescale data to any arbitrary interval, `rescale` also permits clipping the input data to specified minimum and maximum values.

### Interquartile Range

The interquartile range (IQR) of a data set describes the range of the middle 50% of values when the values are sorted. If the median of the data is Q2, the median of the lower half of the data is Q1, and the median of the upper half of the data is Q3, then .

The IQR is generally preferred over looking at the full range of the data when the data contains outliers (very large or very small values) because the IQR excludes the largest 25% and smallest 25% of values in the data.

### Median Absolute Deviation

The median absolute deviation (MAD) of a data set is the median value of the absolute deviations from the median $\stackrel{˜}{X}$ of the data: $\text{MAD}=\text{median}\left(|x-\stackrel{˜}{X}|\right)$. Therefore, the MAD describes the variability of the data in relation to the median.

The MAD is generally preferred over using the standard deviation of the data when the data contains outliers (very large or very small values) because the standard deviation squares deviations from the mean, giving outliers an unduly large impact. Conversely, the deviations of a small number of outliers do not affect the value of the MAD.

## Version History

Introduced in R2018a

expand all