Loss estimate using cross-validation
vals = crossval(fun,X)
vals = crossval(fun,X,Y,...)
mse = crossval('mse',X,y,'Predfun',predfun)
mcr = crossval('mcr',X,y,'Predfun',predfun)
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)
vals = crossval(...,'name',value)
vals = crossval(fun,X) performs 10-fold cross-validation for the function fun, applied to the data in X.
fun is a function handle to a function with two inputs, the training subset of X, XTRAIN, and the test subset of X, XTEST, as follows:
testval = fun(XTRAIN,XTEST)
Each time it is called, fun should use XTRAIN to fit a model, then return some criterion testval computed on XTEST using that fitted model.
X can be a column vector or a matrix. Rows of X correspond to observations; columns correspond to variables or features. Each row of vals contains the result of applying fun to one test set. If testval is a non-scalar value, crossval converts it to a row vector using linear indexing and stores it in one row of vals.
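A minimal sketch of the fun(XTRAIN,XTEST) form, assuming the Statistics and Machine Learning Toolbox is installed. The data, the trivial "model" (the training-set mean), and the criterion are illustrative assumptions. The second handle returns a 1-by-2 value to show how a non-scalar result fills one row of vals per test set.

```matlab
rng(0);                                   % make the random fold assignment repeatable
X = randn(100,1) + 5;                     % hypothetical data: 100 observations, 1 variable

% fun "fits" on XTRAIN (here: just its mean) and returns a scalar criterion on XTEST
fun = @(XTRAIN,XTEST) mean((XTEST - mean(XTRAIN)).^2);
vals = crossval(fun,X);                   % one row per test set: 10-by-1

% A non-scalar criterion: each 1-by-2 result becomes one row, giving 10-by-2
fun2 = @(XTRAIN,XTEST) [mean((XTEST - mean(XTRAIN)).^2), ...
                        mean(abs(XTEST - mean(XTRAIN)))];
vals2 = crossval(fun2,X);
```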
vals = crossval(fun,X,Y,...) is used when the data are stored in separate variables X, Y, and so on. All variables (column vectors, matrices, or arrays) must have the same number of rows. fun is called with the training subsets of X, Y, ..., followed by the test subsets of X, Y, ..., as follows:
testvals = fun(XTRAIN,YTRAIN,...,XTEST,YTEST,...)
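A sketch of the multi-input form under the same toolbox assumption. The synthetic X and y and the least-squares fit with the backslash operator are assumptions for illustration. Because each observation falls in exactly one test set under k-fold cross-validation, dividing the summed squared errors by n gives a pooled mean-squared error.

```matlab
rng(1);                                   % repeatable data and folds
n = 120;
X = [ones(n,1) randn(n,2)];               % hypothetical predictors with an intercept column
y = X*[1; 2; -3] + 0.5*randn(n,1);        % hypothetical linear response plus noise

% Training subsets come first, then test subsets, in the same order as the data inputs
fun = @(XTRAIN,YTRAIN,XTEST,YTEST) sum((YTEST - XTEST*(XTRAIN\YTRAIN)).^2);

vals = crossval(fun,X,y);                 % 10-by-1: sum of squared errors per test set
totalMse = sum(vals)/n;                   % pooled mean-squared error over all test observations
```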
mse = crossval('mse',X,y,'Predfun',predfun) returns mse, a scalar containing a 10-fold cross-validation estimate of mean-squared error for the function predfun. X can be a column vector, matrix, or array of predictors. y is a column vector of response values. X and y must have the same number of rows.
predfun is a function handle called with the training subset of X, the training subset of y, and the test subset of X as follows:
yfit = predfun(XTRAIN,ytrain,XTEST)
Each time it is called, predfun should use XTRAIN and ytrain to fit a regression model and then return fitted values in a column vector yfit. Each row of yfit contains the predicted values for the corresponding row of XTEST. crossval computes the squared errors between yfit and the corresponding response test set, and returns the overall mean across all test sets.
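To make the 'mse' computation concrete, here is a plain-MATLAB sketch of a 10-fold loop that needs no toolbox. It is not crossval's implementation: the fold assignment with randperm and mod is an assumed stand-in for cvpartition, and the data and least-squares predfun are hypothetical. Squared errors from every test set are pooled before averaging, matching the overall mean across all test sets.

```matlab
rng(2);                                       % repeatable data and folds
n = 100; k = 10;
X = [ones(n,1) randn(n,1)];                   % hypothetical predictors
y = X*[0.5; 1.5] + 0.3*randn(n,1);            % hypothetical response
predfun = @(XTRAIN,ytrain,XTEST) XTEST*(XTRAIN\ytrain);   % least-squares fit

folds = mod(randperm(n)', k) + 1;             % each observation gets a fold 1..k (10 per fold here)
sqErr = zeros(n,1);
for f = 1:k
    te = (folds == f);                        % logical index of the test set for fold f
    yfit = predfun(X(~te,:), y(~te), X(te,:));
    sqErr(te) = (y(te) - yfit).^2;            % squared errors for this test set
end
mseManual = mean(sqErr);                      % overall mean across all test sets
```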
mcr = crossval('mcr',X,y,'Predfun',predfun) returns mcr, a scalar containing a 10-fold cross-validation estimate of misclassification rate (the proportion of misclassified samples) for the function predfun. The matrix X contains predictor values and the vector y contains class labels. predfun should use XTRAIN and ytrain to fit a classification model and return yfit as the predicted class labels for XTEST. crossval computes the number of misclassifications between yfit and the corresponding response test set, and returns the overall misclassification rate across all test sets.
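A similar plain-MATLAB sketch for the misclassification rate. The nearest-centroid classifier used as a stand-in for predfun, the numeric class labels, and the synthetic data are all hypothetical choices. The point is the bookkeeping: misclassifications are counted in each test set and then divided by the total number of observations.

```matlab
rng(3);                                        % repeatable data and folds
n = 90; k = 10;
y = repmat((1:3)', n/3, 1);                    % hypothetical numeric class labels 1, 2, 3
X = randn(n,2) + [3*y, -3*y];                  % hypothetical predictors, shifted by class
folds = mod(randperm(n)', k) + 1;              % random fold index (1..k) for each observation

nWrong = 0;
for f = 1:k
    te  = (folds == f);                        % logical index of the test set for fold f
    Xtr = X(~te,:);  ytr = y(~te);  Xte = X(te,:);
    classes = unique(ytr);
    d = zeros(size(Xte,1), numel(classes));
    for c = 1:numel(classes)
        mu = mean(Xtr(ytr == classes(c),:), 1);        % class centroid from training data
        d(:,c) = sum(bsxfun(@minus, Xte, mu).^2, 2);  % squared distance to that centroid
    end
    [~, idx] = min(d, [], 2);
    yfit = classes(idx);                       % predicted labels for this test set
    nWrong = nWrong + sum(yfit ~= y(te));      % misclassifications in this test set
end
mcrManual = nWrong / n;                        % proportion misclassified over all test sets
```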
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun), where criterion is 'mse' or 'mcr', returns a cross-validation estimate of the mean-squared error (for a regression model) or the misclassification rate (for a classification model), with predictor values in X1, X2, ... and response values (regression) or class labels (classification) in y. X1, X2, ... and y must have the same number of rows. predfun is a function handle called with the training subsets of X1, X2, ..., the training subset of y, and the test subsets of X1, X2, ..., as follows:
yfit = predfun(X1TRAIN,X2TRAIN,...,ytrain,X1TEST,X2TEST,...)
yfit should be a column vector containing the fitted values.
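A sketch of the multiple-predictor form, assuming the Statistics and Machine Learning Toolbox. The two predictor blocks X1 and X2, the response y, and the least-squares predfun are hypothetical; the handle simply concatenates the blocks before fitting.

```matlab
rng(4);                                   % repeatable data and folds
n = 100;
X1 = [ones(n,1) randn(n,1)];              % hypothetical first predictor block (with intercept)
X2 = randn(n,2);                          % hypothetical second predictor block
y  = X1*[1; 2] + X2*[-1; 0.5] + 0.2*randn(n,1);

% predfun receives training X1, training X2, training y, then test X1, test X2
predfun = @(X1TRAIN,X2TRAIN,ytrain,X1TEST,X2TEST) ...
    [X1TEST X2TEST]*([X1TRAIN X2TRAIN]\ytrain);

cvMse = crossval('mse',X1,X2,y,'Predfun',predfun);
```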
vals = crossval(...,'name',value) specifies one or more optional parameter name/value pairs from the following table. Specify name inside single quotes.
Name | Value |
---|---|
holdout | A scalar specifying the ratio or the number of observations |
kfold | A positive integer greater than 1 specifying the number of folds |
leaveout | Specifies leave-one-out cross-validation. The value must be … |
mcreps | A positive integer specifying the number of Monte-Carlo repetitions for validation. If the first input of … |
partition | An object … |
stratify | A column vector … |
options | A structure that specifies whether to run in parallel, and specifies the random stream or streams. Create the … |
Only one of kfold, holdout, leaveout, or partition can be specified, and partition cannot be specified with stratify. If both partition and mcreps are specified, the first Monte-Carlo repetition uses the partition information in the cvpartition object, and the repartition method is called to generate new partitions for each of the remaining repetitions. If no cross-validation type is specified, the default is 10-fold cross-validation.
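A sketch of several name/value pairs from the table, with hypothetical data and a least-squares predfun, assuming the Statistics and Machine Learning Toolbox. Combining 'partition' with 'mcreps' follows the rule just described; the values returned depend on the random partitions.

```matlab
rng(5);                                   % repeatable data and partitions
n = 60;
X = [ones(n,1) randn(n,1)];               % hypothetical predictors
y = X*[1; -1] + 0.3*randn(n,1);           % hypothetical response
predfun = @(XTRAIN,ytrain,XTEST) XTEST*(XTRAIN\ytrain);

mse5  = crossval('mse',X,y,'Predfun',predfun,'kfold',5);      % 5 folds instead of 10
mseHo = crossval('mse',X,y,'Predfun',predfun,'holdout',0.3);  % hold out a fraction once

% Explicit partition plus Monte-Carlo repetitions: the first repetition uses cp,
% and the remaining repetitions use new partitions generated by repartition
cp = cvpartition(n,'KFold',5);
mseRep = crossval('mse',X,y,'Predfun',predfun,'partition',cp,'mcreps',3);
```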
When using cross-validation with classification algorithms, stratification is preferred. Otherwise, some test sets may not include observations from all classes.
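A sketch of stratified cross-validation through the 'stratify' parameter rather than a cvpartition object, assuming the Statistics and Machine Learning Toolbox and its fisheriris sample data. Converting the species names to numeric group indices with grp2idx is an assumption made so that the stratification variable is a numeric column vector; the classify-based predfun performs discriminant analysis.

```matlab
load('fisheriris');                       % meas: 150-by-4 predictors, species: class names
g = grp2idx(species);                     % numeric group index (1..3) used for stratification
classf = @(XTRAIN,ytrain,XTEST) classify(XTEST,XTRAIN,ytrain);

rng(6);                                   % repeatable stratified folds
% 'stratify' keeps the class proportions roughly equal in every test set
cvMCR = crossval('mcr',meas,species,'Predfun',classf,'stratify',g);
```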
Compute mean-squared error for regression using 10-fold cross-validation:
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
regf = @(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'predfun',regf)
cvMse =
    0.1015
Compute misclassification rate using stratified 10-fold cross-validation:
load('fisheriris');
y = species;
X = meas;
cp = cvpartition(y,'k',10); % Stratified cross-validation
classf = @(XTRAIN,ytrain,XTEST)(classify(XTEST,XTRAIN,...
    ytrain));
cvMCR = crossval('mcr',X,y,'predfun',classf,'partition',cp)
cvMCR =
    0.0200
Compute the confusion matrix using stratified 10-fold cross-validation:
load('fisheriris');
y = species;
X = meas;
order = unique(y); % Order of the group labels
cp = cvpartition(y,'k',10); % Stratified cross-validation
f = @(xtr,ytr,xte,yte)confusionmat(yte,...
    classify(xte,xtr,ytr),'order',order);
cfMat = crossval(f,X,y,'partition',cp);
cfMat = reshape(sum(cfMat),3,3)
cfMat =
    50     0     0
     0    48     2
     0     1    49
cfMat is the sum of the 10 confusion matrices from the 10 test sets.
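The misclassification rate can also be read from a pooled confusion matrix, because the off-diagonal counts are errors. This plain-MATLAB arithmetic uses the cfMat values reported in the confusion-matrix example (rows are true classes and columns are predicted classes, as returned by confusionmat).

```matlab
cfMat = [50 0 0; 0 48 2; 0 1 49];           % pooled confusion matrix from the example
nTotal   = sum(cfMat(:));                   % 150 test observations in total
nCorrect = trace(cfMat);                    % correctly classified observations on the diagonal
mcrFromCf = (nTotal - nCorrect) / nTotal;   % 3/150
```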