
crossval

Cross-validate classification ensemble model

Description


cvens = crossval(ens) returns a cross-validated (partitioned) classification ensemble model (cvens) from a trained classification ensemble model (ens). By default, crossval uses 10-fold cross-validation on the training data to create cvens, a ClassificationPartitionedEnsemble model.

cvens = crossval(ens,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the fraction of data for holdout validation, and monitor the training of the cross-validation folds.

Input Arguments

ens — Classification ensemble model

Classification ensemble model, specified as a ClassificationEnsemble model object trained with fitcensemble.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: crossval(ens,KFold=10,NPrint=5) specifies to use 10 folds in a cross-validated model, and to display a message to the command line every time crossval finishes training 5 folds.

CVPartition — Cross-validation partition

Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.
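As a minimal sketch of this workflow (using the Fisher iris data and the ensemble from the Examples section below; the seed value is arbitrary):

```matlab
% Cross-validate a trained ensemble with an explicit cvpartition object.
load fisheriris
ens = fitcensemble(meas,species,Method="AdaBoostM2", ...
    Learners=templateTree(MaxNumSplits=1));
rng(1,"twister")                            % arbitrary seed, for reproducibility
cvp = cvpartition(numel(species),KFold=5);  % 5-fold partition of 150 observations
cvens = crossval(ens,CVPartition=cvp);
L = kfoldLoss(cvens)                        % loss averaged over the 5 folds
```

Passing a cvpartition object is useful when you want several models to share the same folds, so that their cross-validation losses are directly comparable.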

Holdout — Fraction of data for holdout validation

Fraction of the data used for holdout validation, specified as a scalar value in the range [0,1]. If you specify Holdout=p, then the software completes these steps:

  1. Randomly selects and reserves p*100% of the data as validation data, and trains the model using the rest of the data.

  2. Stores the compact trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Holdout=0.1

Data Types: double | single
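The holdout steps above can be sketched as follows (using the Fisher iris data and the ensemble from the Examples section below):

```matlab
% Reserve 20% of the observations for holdout validation.
load fisheriris
ens = fitcensemble(meas,species,Method="AdaBoostM2", ...
    Learners=templateTree(MaxNumSplits=1));
rng("default")                      % for a reproducible holdout split
cvens = crossval(ens,Holdout=0.2);  % train on 80%, validate on 20%
L = kfoldLoss(cvens)                % misclassification rate on the held-out 20%
```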

KFold — Number of folds

Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps:

  1. Randomly partitions the data into k sets.

  2. For each set, reserves the set as validation data, and trains the model using the other k – 1 sets.

  3. Stores the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: KFold=5

Data Types: single | double
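For example, to use 5 folds instead of the default 10 (again using the Fisher iris data and the ensemble from the Examples section below):

```matlab
% Use 5 folds instead of the default 10.
load fisheriris
ens = fitcensemble(meas,species,Method="AdaBoostM2", ...
    Learners=templateTree(MaxNumSplits=1));
rng("default")                   % for a reproducible partition
cvens = crossval(ens,KFold=5);
size(cvens.Trained)              % 5-by-1 cell vector of compact models
```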

Leaveout — Leave-one-out cross-validation flag

Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

  1. Reserves the one observation as validation data, and trains the model using the other n – 1 observations.

  2. Stores the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Leaveout="on"

Data Types: char | string
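A sketch of leave-one-out cross-validation (using the Fisher iris data and the ensemble from the Examples section below; note that this trains n separate ensembles, so it can be slow):

```matlab
% Leave-one-out cross-validation: one fold per observation.
load fisheriris
ens = fitcensemble(meas,species,Method="AdaBoostM2", ...
    Learners=templateTree(MaxNumSplits=1));
cvens = crossval(ens,Leaveout="on");
numel(cvens.Trained)             % 150 compact models, one per observation
```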

NPrint — Printout frequency

Printout frequency, specified as a positive integer or "off".

To track the number of folds trained by the software so far, specify a positive integer m. The software displays a message to the command line every time it finishes training m folds.

If you specify "off", the software does not display a message when it completes training folds.

Example: NPrint=5

Data Types: single | double | char | string
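For example, to monitor a default 10-fold run (using the Fisher iris data and the ensemble from the Examples section below):

```matlab
% Print a progress message after every 2 folds of the default 10-fold run.
load fisheriris
ens = fitcensemble(meas,species,Method="AdaBoostM2", ...
    Learners=templateTree(MaxNumSplits=1));
cvens = crossval(ens,NPrint=2);  % message after folds 2, 4, 6, 8, and 10
```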

Examples


Create a cross-validated classification model for the Fisher iris data, and assess its quality using the kfoldLoss method.

Load the Fisher iris data set.

load fisheriris

Train an ensemble of 100 boosted classification trees using AdaBoostM2.

t = templateTree(MaxNumSplits=1); % Weak learner template (decision stumps)
ens = fitcensemble(meas,species,Method="AdaBoostM2",Learners=t);

Create a cross-validated ensemble from ens and find the classification error averaged over all folds.

rng(10,"twister") % For reproducibility
cvens = crossval(ens);
L = kfoldLoss(cvens)
L = 0.0533

Alternatives

Instead of training a classification ensemble model and then cross-validating it, you can create a cross-validated model directly by using fitcensemble and specifying any of these name-value arguments: CrossVal, CVPartition, Holdout, Leaveout, or KFold.
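A sketch of the direct approach (using the Fisher iris data and the same ensemble settings as the Examples section):

```matlab
% Create the cross-validated ensemble directly with fitcensemble,
% without building the intermediate fully trained model.
load fisheriris
rng(10,"twister")                % for reproducibility
cvens = fitcensemble(meas,species,Method="AdaBoostM2", ...
    Learners=templateTree(MaxNumSplits=1),KFold=10);
L = kfoldLoss(cvens)
```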


Version History

Introduced in R2011a