# kfoldLoss

Regression loss for cross-validated kernel regression model

## Syntax

``L = kfoldLoss(CVMdl)``
``L = kfoldLoss(CVMdl,Name,Value)``

## Description


`L = kfoldLoss(CVMdl)` returns the regression loss obtained by the cross-validated kernel regression model `CVMdl`. For every fold, `kfoldLoss` computes the regression loss for observations in the validation fold, using a model trained on observations in the training fold.

`L = kfoldLoss(CVMdl,Name,Value)` returns the regression loss (mean squared error, or MSE, by default) with additional options specified by one or more name-value arguments. For example, you can specify the regression-loss function or which folds to use for loss calculation.

## Examples


Simulate sample data:

```
rng(0,'twister'); % For reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);
```

Cross-validate a kernel regression model.

`CVMdl = fitrkernel(x,y,'Kfold',5);`

`fitrkernel` implements 5-fold cross-validation. `CVMdl` is a `RegressionPartitionedKernel` model. It contains the property `Trained`, which is a 5-by-1 cell array holding five `RegressionKernel` models, each trained by the software on the training-fold observations of one fold.

For each fold, compute the epsilon-insensitive loss on the validation-fold observations, which `fitrkernel` did not use to train that fold's model.

`L = kfoldLoss(CVMdl,'LossFun','epsiloninsensitive','Mode','individual')`
```
L = 5×1

    0.1261
    0.1247
    0.1107
    0.1237
    0.1131
```

## Input Arguments


Cross-validated kernel regression model, specified as a `RegressionPartitionedKernel` model object. You can create a `RegressionPartitionedKernel` model using `fitrkernel` and specifying any of the cross-validation name-value pair arguments, for example, `CrossVal`.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `'LossFun','epsiloninsensitive','Mode','individual'` directs `kfoldLoss` to return the epsilon-insensitive loss for each fold.

Fold indices to use for response prediction, specified as the comma-separated pair consisting of `'Folds'` and a numeric vector of positive integers. The elements of `Folds` must range from `1` through `CVMdl.KFold`.

Example: `'Folds',[1 4 10]`

Data Types: `single` | `double`
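As a hedged illustration of the `Folds` argument, the sketch below reuses the simulated data from the Examples section and restricts the loss computation to folds 1 and 3. The fold choice is arbitrary, and the variable name `Lsubset` is introduced here for illustration only.

```matlab
rng(0,'twister');                  % for reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);
CVMdl = fitrkernel(x,y,'KFold',5); % 5-fold cross-validated model

% Compute the default loss using folds 1 and 3 only (arbitrary choice).
% Each index must lie between 1 and CVMdl.KFold.
Lsubset = kfoldLoss(CVMdl,'Folds',[1 3]);
```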

Loss function, specified as the comma-separated pair consisting of `'LossFun'` and a built-in loss function name or function handle.

• The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar. Also, in the table, $f\left(x\right)=x\beta +b.$

• β is a vector of p coefficients.

• x is an observation from p predictor variables.

• b is the scalar bias.

| Value | Description |
| --- | --- |
| `'epsiloninsensitive'` | Epsilon-insensitive loss: $\ell \left[y,f\left(x\right)\right]=\mathrm{max}\left[0,\lvert y-f\left(x\right)\rvert -\epsilon \right]$ |
| `'mse'` | MSE: $\ell \left[y,f\left(x\right)\right]={\left[y-f\left(x\right)\right]}^{2}$ |

`'epsiloninsensitive'` is appropriate for SVM learners only.
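To make the two formulas concrete, here is a minimal sketch in plain MATLAB that evaluates them by hand on a few made-up values. The responses, predictions, and the margin `epsilon = 0.2` are all illustrative assumptions; the margin used by an actual model may differ, and the unweighted averaging shown here is a simplification.

```matlab
% Hypothetical observed and predicted responses (illustrative values only)
y    = [1.0; 2.0; 3.0; 4.0];
yhat = [1.1; 1.5; 3.0; 5.0];
epsilon = 0.2;   % assumed margin, not taken from any trained model

% Per-observation epsilon-insensitive loss: max(0, |y - f(x)| - epsilon)
epsLoss = max(0, abs(y - yhat) - epsilon);

% Per-observation squared error: (y - f(x))^2
sqErr = (y - yhat).^2;

% Unweighted averages over the four observations
meanEpsLoss = mean(epsLoss);   % approximately 0.275
mse = mean(sqErr);             % approximately 0.315
```

Residuals smaller than `epsilon` contribute nothing to the epsilon-insensitive loss, which is why the first observation (residual 0.1) adds zero there but still adds 0.01 to the squared error.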

• Specify your own function using function handle notation.

Assume that `n` is the number of observations in `X`. Your function must have this signature:

``lossvalue = lossfun(Y,Yhat,W)``
where:

• The output argument `lossvalue` is a scalar.

• You specify the function name (`lossfun`).

• `Y` is an `n`-dimensional vector of observed responses. `kfoldLoss` passes the input argument `Y` in for `Y`.

• `Yhat` is an `n`-dimensional vector of predicted responses, which is similar to the output of `predict`.

• `W` is an `n`-by-1 numeric vector of observation weights.
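The signature above can be sketched with an anonymous function. The example below assumes a weighted mean absolute error as the custom loss; the handle name `maeLoss` is hypothetical, and the data setup simply repeats the simulation from the Examples section.

```matlab
rng(0,'twister');                  % for reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);
CVMdl = fitrkernel(x,y,'KFold',5); % 5-fold cross-validated model

% Hypothetical custom loss: weighted mean absolute error.
% Y, Yhat, and W are vectors of equal length; the result is a scalar.
maeLoss = @(Y,Yhat,W) sum(W.*abs(Y - Yhat))/sum(W);

% Pass the handle in place of a built-in loss name
L = kfoldLoss(CVMdl,'LossFun',maeLoss);
```

Dividing by `sum(W)` keeps the result on the scale of the response regardless of how the weights are normalized.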

Data Types: `char` | `string` | `function_handle`

Loss aggregation level, specified as the comma-separated pair consisting of `'Mode'` and `'average'` or `'individual'`.

| Value | Description |
| --- | --- |
| `'average'` | Returns losses averaged over all folds |
| `'individual'` | Returns losses for each fold |

Example: `'Mode','individual'`

Since R2023b

Predicted response value to use for observations with missing predictor values, specified as the name-value argument `PredictionForMissingValue` with the value `"median"`, `"mean"`, `"omitted"`, or a numeric scalar.

| Value | Description |
| --- | --- |
| `"median"` | `kfoldLoss` uses the median of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
| `"mean"` | `kfoldLoss` uses the mean of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
| `"omitted"` | `kfoldLoss` excludes observations with missing predictor values from the loss computation. |
| Numeric scalar | `kfoldLoss` uses this value as the predicted response value for observations with missing predictor values. |

If an observation is missing an observed response value or an observation weight, then `kfoldLoss` does not use the observation in the loss computation.

Example: `"PredictionForMissingValue","omitted"`

Data Types: `single` | `double` | `char` | `string`
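The difference between `"median"` and `"omitted"` can be illustrated by hand in plain MATLAB. This is only a sketch of the semantics described in the table, not the internal implementation of `kfoldLoss`. All values are made up, and using `NaN` to mark an observation with missing predictors is an assumption made for this illustration.

```matlab
% Hypothetical training-fold responses and validation-fold data
yTrain = [1; 2; 3; 10];
yVal   = [2; 4; 6];
yhat   = [2.5; NaN; 5];     % NaN marks an observation with missing predictors
missingIdx = isnan(yhat);

% "median": substitute the median of the training-fold responses
yhatMedian = yhat;
yhatMedian(missingIdx) = median(yTrain);
mseMedian = mean((yVal - yhatMedian).^2);

% "omitted": drop those observations from the loss computation
mseOmitted = mean((yVal(~missingIdx) - yhat(~missingIdx)).^2);
```

In this toy case the substituted prediction is far from the observed response, so the `"median"` variant reports a larger MSE than the `"omitted"` variant.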

## Output Arguments


Cross-validated regression losses, returned as a numeric scalar or vector. The interpretation of `L` depends on `LossFun`.

• If `Mode` is `'average'`, then `L` is a scalar.

• Otherwise, `L` is a k-by-1 vector, where k is the number of folds. `L(j)` is the average regression loss over fold `j`.

To estimate `L`, `kfoldLoss` uses the data that created `CVMdl`.

## Version History

Introduced in R2018b
