# kfoldLoss

Classification loss for observations not used in training

## Syntax

``L = kfoldLoss(CVMdl)``
``L = kfoldLoss(CVMdl,Name,Value)``

## Description


`L = kfoldLoss(CVMdl)` returns the cross-validated classification losses obtained by the cross-validated, binary, linear classification model `CVMdl`. That is, for every fold, `kfoldLoss` estimates the classification loss for the observations that it holds out when it trains using all other observations. `L` contains a classification loss for each regularization strength in the linear classification models that compose `CVMdl`.


`L = kfoldLoss(CVMdl,Name,Value)` uses additional options specified by one or more `Name,Value` pair arguments. For example, indicate which folds to use for the loss calculation or specify the classification-loss function.

## Input Arguments


Cross-validated, binary, linear classification model, specified as a `ClassificationPartitionedLinear` model object. You can create a `ClassificationPartitionedLinear` model using `fitclinear` and specifying any one of the cross-validation, name-value pair arguments, for example, `CrossVal`.

To obtain estimates, `kfoldLoss` applies the same data used to cross-validate the linear classification model (`X` and `Y`).
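As a minimal sketch of how such a model might be created and passed to `kfoldLoss`, the following uses small synthetic data in place of a real data set. The variable names, the data, and the choice of five folds are assumptions made for illustration only.

```matlab
% Synthetic two-class data (hypothetical; stands in for real predictors and labels)
rng(0)                                      % for reproducibility
n = 200;
X = randn(n,5);                             % n-by-5 predictor matrix
Y = X(:,1) - X(:,2) + 0.5*randn(n,1) > 0;   % logical class labels

% Specifying a cross-validation option makes fitclinear return a
% ClassificationPartitionedLinear model instead of a ClassificationLinear model
CVMdl = fitclinear(X,Y,'KFold',5);

% kfoldLoss reuses the X and Y stored with the cross-validated model
L = kfoldLoss(CVMdl);
```

No data is passed to `kfoldLoss` here because the loss is computed on the held-out observations of each fold.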

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.
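The two calling styles described above can be compared side by side. This is a sketch on synthetic data (the data and variable names are assumptions); the `Name=Value` form requires R2021a or later.

```matlab
% Hypothetical synthetic data for illustration
rng(0)
X = randn(200,4);
Y = X(:,1) + 0.5*randn(200,1) > 0;
CVMdl = fitclinear(X,Y,'KFold',5);

% R2021a and later: Name=Value syntax
Lnew = kfoldLoss(CVMdl,Mode="individual");

% Any release: comma-separated name and value, with the name in quotes
Lold = kfoldLoss(CVMdl,'Mode','individual');
```

Both calls request the same computation, so `Lnew` and `Lold` should be identical.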

Fold indices to use for classification-score prediction, specified as the comma-separated pair consisting of `'Folds'` and a numeric vector of positive integers. The elements of `Folds` must range from `1` through `CVMdl.KFold`.

Example: `'Folds',[1 4 10]`

Data Types: `single` | `double`
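For instance, a loss restricted to a subset of folds might be requested as follows. This is a sketch on synthetic data; the data, the variable names, and the fold indices `[1 3]` are arbitrary choices for illustration.

```matlab
% Hypothetical synthetic data for illustration
rng(0)
X = randn(300,4);
Y = X(:,1) + 0.5*randn(300,1) > 0;
CVMdl = fitclinear(X,Y,'KFold',5);

% Use only folds 1 and 3; every index must lie in 1..CVMdl.KFold
Lsub = kfoldLoss(CVMdl,'Folds',[1 3]);
```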

Loss function, specified as the comma-separated pair consisting of `'LossFun'` and a built-in loss function name or function handle.

• The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar.

| Value | Description |
| --- | --- |
| `'binodeviance'` | Binomial deviance |
| `'classifcost'` | Observed misclassification cost |
| `'classiferror'` | Misclassified rate in decimal |
| `'exponential'` | Exponential loss |
| `'hinge'` | Hinge loss |
| `'logit'` | Logistic loss |
| `'mincost'` | Minimal expected misclassification cost (for classification scores that are posterior probabilities) |
| `'quadratic'` | Quadratic loss |

`'mincost'` is appropriate for classification scores that are posterior probabilities. For linear classification models, logistic regression learners return posterior probabilities as classification scores by default, but SVM learners do not (see `predict`).

• Specify your own function using function handle notation.

Let `n` be the number of observations in `X` and `K` be the number of distinct classes (`numel(Mdl.ClassNames)`, where `Mdl` is the input model). Your function must have this signature:

``lossvalue = lossfun(C,S,W,Cost)``
where:

• The output argument `lossvalue` is a scalar.

• You choose the function name (`lossfun`).

• `C` is an `n`-by-`K` logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in `Mdl.ClassNames`.

Construct `C` by setting `C(p,q) = 1` if observation `p` is in class `q`, for each row. Set all other elements of row `p` to `0`.

• `S` is an `n`-by-`K` numeric matrix of classification scores. The column order corresponds to the class order in `Mdl.ClassNames`. `S` is a matrix of classification scores, similar to the output of `predict`.

• `W` is an `n`-by-1 numeric vector of observation weights. If you pass `W`, the software normalizes the weights to sum to `1`.

• `Cost` is a `K`-by-`K` numeric matrix of misclassification costs. For example, `Cost = ones(K) - eye(K)` specifies a cost of `0` for correct classification, and `1` for misclassification.

Specify your function using `'LossFun',@lossfun`.

Data Types: `char` | `string` | `function_handle`
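To make the expected shapes of `C`, `S`, `W`, and `Cost` concrete, the sketch below builds them by hand for four observations and two classes, then evaluates a custom weighted misclassification rate. All values are hypothetical, and the function name `lossfun` is only a placeholder.

```matlab
% Hand-built inputs for n = 4 observations and K = 2 classes (hypothetical values)
C = logical([1 0; 0 1; 0 1; 1 0]);              % row p has a 1 in the column of its true class
S = [0.8 -0.8; -0.3 0.3; 0.6 -0.6; 1.2 -1.2];   % classification scores, one column per class
W = [0.25; 0.25; 0.25; 0.25];                   % observation weights summing to 1
Cost = ones(2) - eye(2);                        % 0 for correct, 1 for incorrect classification

% An observation counts as misclassified when the score of its true class
% is below the largest score in its row. Cost is accepted but not used.
lossfun = @(C,S,W,Cost) sum(W .* (sum(S.*C,2) < max(S,[],2))) / sum(W);

lossvalue = lossfun(C,S,W,Cost);   % only observation 3 is misclassified, so the result is 0.25
```

A handle with this signature could then be passed as `'LossFun',lossfun`.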

Loss aggregation level, specified as the comma-separated pair consisting of `'Mode'` and `'average'` or `'individual'`.

| Value | Description |
| --- | --- |
| `'average'` | Returns losses averaged over all folds |
| `'individual'` | Returns losses for each fold |

Example: `'Mode','individual'`

## Output Arguments


Cross-validated classification losses, returned as a numeric scalar, vector, or matrix. The interpretation of `L` depends on `LossFun`.

Let `R` be the number of regularization strengths in the cross-validated models (`numel(CVMdl.Trained{1}.Lambda)`) and `F` be the number of folds (stored in `CVMdl.KFold`).

• If `Mode` is `'average'`, then `L` is a 1-by-`R` vector. `L(j)` is the average classification loss over all folds of the cross-validated model that uses regularization strength `j`.

• Otherwise, `L` is an `F`-by-`R` matrix. `L(i,j)` is the classification loss for fold `i` of the cross-validated model that uses regularization strength `j`.

To estimate `L`, `kfoldLoss` uses the data that created `CVMdl` (see `X` and `Y`).
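The output sizes described above can be checked with a short sketch. The synthetic data, the three regularization strengths, and the variable names are assumptions for illustration.

```matlab
% Hypothetical synthetic data and regularization strengths
rng(0)
X = randn(300,4);
Y = X(:,1) + 0.5*randn(300,1) > 0;
Lambda = [1e-4 1e-3 1e-2];
CVMdl = fitclinear(X,Y,'KFold',5,'Lambda',Lambda);

R = numel(CVMdl.Trained{1}.Lambda);            % number of regularization strengths
F = CVMdl.KFold;                               % number of folds

Lavg = kfoldLoss(CVMdl);                       % 1-by-R: one average loss per strength
Lind = kfoldLoss(CVMdl,'Mode','individual');   % F-by-R: one loss per fold and strength
```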

## Examples


Load the NLP data set.

`load nlpdata`

`X` is a sparse matrix of predictor data, and `Y` is a categorical vector of class labels. There are more than two classes in the data.

The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

`Ystats = Y == 'stats';`

Cross-validate a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation.

```matlab
rng(1); % For reproducibility
CVMdl = fitclinear(X,Ystats,'CrossVal','on');
```

`CVMdl` is a `ClassificationPartitionedLinear` model. By default, the software implements 10-fold cross-validation. You can alter the number of folds using the `'KFold'` name-value pair argument.

Estimate the average of the out-of-fold, classification error rates.

`ce = kfoldLoss(CVMdl)`
```ce = 7.6017e-04 ```

Alternatively, you can obtain the per-fold classification error rates by specifying the name-value pair `'Mode','individual'` in `kfoldLoss`.

Load the NLP data set. Preprocess the data as in Estimate k-Fold Cross-Validation Classification Error, and transpose the predictor data.

```matlab
load nlpdata
Ystats = Y == 'stats';
X = X';
```

Cross-validate a binary, linear classification model using 5-fold cross-validation. Optimize the objective function using SpaRSA. Specify that the predictor observations correspond to columns.

```matlab
rng(1) % For reproducibility
CVMdl = fitclinear(X,Ystats,'Solver','sparsa','KFold',5, ...
    'ObservationsIn','columns');
CMdl = CVMdl.Trained{1};
```

`CVMdl` is a `ClassificationPartitionedLinear` model. It contains the property `Trained`, which is a 5-by-1 cell array holding the `ClassificationLinear` models that the software trained using the training set of each fold.

Create an anonymous function that measures linear loss, that is,

$$L = \frac{\sum_{j} -w_{j} y_{j} f_{j}}{\sum_{j} w_{j}}.$$

$w_{j}$ is the weight for observation $j$, $y_{j}$ is response $j$ (-1 for the negative class, and 1 otherwise), and $f_{j}$ is the raw classification score of observation $j$. Custom loss functions must be written in a particular form. For rules on writing a custom loss function, see the `LossFun` name-value pair argument. Because the function does not use classification cost, use `~` to have `kfoldLoss` ignore its position.

`linearloss = @(C,S,W,~)sum(-W.*sum(S.*C,2))/sum(W);`
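The anonymous function works on the `C` and `S` matrices rather than on $y_{j}$ and $f_{j}$ directly. The sketch below checks on hand-built values that the two forms agree, assuming the two score columns of a binary linear model are $-f_{j}$ (negative class) and $f_{j}$ (positive class). The values of `f`, `y`, and `W` are hypothetical.

```matlab
% Hypothetical raw scores, labels, and weights for three observations
f = [2; -1; 0.5];          % raw positive-class scores
y = [1; -1; -1];           % labels coded as -1 (negative) and +1 (positive)
W = [0.5; 0.25; 0.25];     % observation weights

% Assumed layout: column 1 is the negative class, column 2 the positive class
S = [-f f];                % score matrix
C = [y == -1, y == 1];     % logical class-membership matrix

linearloss = @(C,S,W,~)sum(-W.*sum(S.*C,2))/sum(W);

Lmatrix  = linearloss(C,S,W,[]);    % form used by kfoldLoss
Lformula = sum(-W.*y.*f)/sum(W);    % direct evaluation of the formula; both equal -1.125
```

Selecting the true-class score with `sum(S.*C,2)` yields $y_{j} f_{j}$ under that layout, which is why the matrix form reproduces the formula.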

Estimate the average cross-validated classification loss using the linear loss function. Also, obtain the loss for each fold.

`ce = kfoldLoss(CVMdl,'LossFun',linearloss)`
```ce = -8.0982 ```
`ceFold = kfoldLoss(CVMdl,'LossFun',linearloss,'Mode','individual')`
```
ceFold = 5×1

   -8.3165
   -8.7633
   -7.4342
   -8.0423
   -7.9347
```

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare test-sample classification error rates.

Load the NLP data set. Preprocess the data as in Specify Custom Classification Loss.

```matlab
load nlpdata
Ystats = Y == 'stats';
X = X';
```

Create a set of 11 logarithmically spaced regularization strengths from $10^{-6}$ through $10^{-0.5}$.

`Lambda = logspace(-6,-0.5,11);`

Cross-validate binary, linear classification models that use each of the regularization strengths, using 5-fold cross-validation. Optimize the objective function using SpaRSA. Lower the tolerance on the gradient of the objective function to `1e-8`.

```matlab
rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'KFold',5,'Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8)
```
```
CVMdl = 
  ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1x1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods
```

Extract a trained linear classification model.

`Mdl1 = CVMdl.Trained{1}`
```
Mdl1 = 
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'logit'
              Beta: [34023x11 double]
              Bias: [-13.3823 -13.3823 -13.3823 -13.3823 -13.3823 ... ]
            Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 ... ]
           Learner: 'logistic'

  Properties, Methods
```

`Mdl1` is a `ClassificationLinear` model object. Because `Lambda` is a sequence of regularization strengths, you can think of `Mdl1` as 11 models, one for each regularization strength in `Lambda`.

Estimate the cross-validated classification error.

`ce = kfoldLoss(CVMdl);`

Because there are 11 regularization strengths, `ce` is a 1-by-11 vector of classification error rates.

Higher values of `Lambda` lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.

```matlab
Mdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8);
numNZCoeff = sum(Mdl.Beta~=0);
```

In the same figure, plot the cross-validated, classification error rates and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

```matlab
figure;
[h,hL1,hL2] = plotyy(log10(Lambda),log10(ce),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} classification error')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
title('Test-Sample Statistics')
hold off
```

Choose the index of the regularization strength that balances predictor variable sparsity and low classification error. In this case, a value between $10^{-4}$ and $10^{-1}$ should suffice.

`idxFinal = 7;`
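As a quick check that this index falls in the stated range, the corresponding regularization strength can be read from `Lambda`. The variable name `lambdaFinal` is introduced here for illustration only.

```matlab
Lambda = logspace(-6,-0.5,11);    % same grid as above; exponents step by 0.55
idxFinal = 7;
lambdaFinal = Lambda(idxFinal);   % 10^(-2.7), roughly 0.002

% Confirm the value lies between 1e-4 and 1e-1
inRange = lambdaFinal > 1e-4 && lambdaFinal < 1e-1;
```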

Select the model from `Mdl` with the chosen regularization strength.

`MdlFinal = selectModels(Mdl,idxFinal);`

`MdlFinal` is a `ClassificationLinear` model containing one regularization strength. To estimate labels for new observations, pass `MdlFinal` and the new data to `predict`.
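A minimal sketch of that last step is shown below on synthetic data, since the NLP example does not include new observations. The data, the regularization grid, the chosen index, and the name `XNew` are all assumptions; observations are stored in rows here, so `'ObservationsIn'` is not needed.

```matlab
% Hypothetical synthetic training data (observations in rows)
rng(0)
X = randn(200,5);
Y = X(:,1) + 0.5*randn(200,1) > 0;

% Train one model per regularization strength, then keep a single strength
Lambda = logspace(-4,-1,4);
Mdl = fitclinear(X,Y,'Learner','logistic','Regularization','lasso', ...
    'Lambda',Lambda);
MdlFinal = selectModels(Mdl,2);

% Predict labels for three hypothetical new observations
XNew = randn(3,5);
labels = predict(MdlFinal,XNew);
```

If the new data stores observations in columns, as in the NLP example, the orientation would also need to be specified when calling `predict`.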