Model Building and Assessment

Feature selection, feature engineering, model selection, hyperparameter optimization, cross-validation, predictive performance evaluation, and classification accuracy comparison tests

When you build a high-quality, predictive classification model, it is important to select the right features (or predictors) and tune hyperparameters (model parameters that are not estimated during training).

Feature selection and hyperparameter tuning can yield multiple candidate models. You can compare the k-fold misclassification rates, receiver operating characteristic (ROC) curves, or confusion matrices among the models. Alternatively, you can conduct a statistical test to detect whether one classification model significantly outperforms another.
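
For example, testckfold tests whether two model types differ in accuracy on the same data by repeated cross-validation. A minimal sketch, assuming the Fisher iris data set that ships with the toolbox, restricted to two classes so that an SVM learner applies directly:

    load fisheriris
    X = meas(51:end,:);                   % versicolor and virginica: a binary problem
    y = species(51:end);

    C1 = templateTree();                  % classification tree learner
    C2 = templateSVM('Standardize',true); % SVM learner with standardized predictors

    % h = 1 rejects the hypothesis that the two models have equal accuracy
    [h,p] = testckfold(C1,C2,X,X,y)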

To engineer new features before training a classification model, use gencfeatures.
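A minimal sketch, assuming the ionosphere data set that ships with the toolbox: generate 20 features, inspect them with describe, and apply the same transformations to new data with transform.

    load ionosphere
    Tbl = array2table(X);                  % predictors as a table
    Tbl.Y = Y;                             % add the response variable

    [T,newTbl] = gencfeatures(Tbl,'Y',20); % T is a FeatureTransformer object
    describe(T)                            % summarize the generated features
    % newTestTbl = transform(T,testTbl);   % apply to new data (testTbl is hypothetical)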

To build and assess classification models interactively, use the Classification Learner app.

To automatically select a model with tuned hyperparameters, use fitcauto. This function tries a selection of classification model types with different hyperparameter values and returns a final model that is expected to perform well on new data. Use fitcauto when you are uncertain which classifier types best suit your data.
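A minimal sketch, assuming the ionosphere data set; the evaluation budget shown is an arbitrary choice.

    load ionosphere
    rng('default')   % for reproducibility of the optimization
    Mdl = fitcauto(X,Y, ...
        'HyperparameterOptimizationOptions', ...
        struct('MaxObjectiveEvaluations',30,'ShowPlots',false));
    disp(Mdl)        % the returned model type depends on the search results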

To tune hyperparameters of a specific model, select the hyperparameter values and cross-validate the model using those values. For example, to tune an SVM model, choose a set of box constraints and kernel scales, and then cross-validate a model for each pair of values. Certain Statistics and Machine Learning Toolbox™ classification functions offer automatic hyperparameter tuning through Bayesian optimization, grid search, or random search. bayesopt, the main function for implementing Bayesian optimization, is flexible enough for many other applications as well. See Bayesian Optimization Workflow.
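A minimal sketch of the manual approach, assuming the ionosphere data set and an arbitrary grid of box constraints and kernel scales:

    load ionosphere
    boxConstraints = [0.1 1 10];
    kernelScales   = [0.5 1 2];
    cvLoss = zeros(numel(boxConstraints),numel(kernelScales));

    for i = 1:numel(boxConstraints)
        for j = 1:numel(kernelScales)
            CVMdl = fitcsvm(X,Y,'KernelFunction','rbf', ...
                'BoxConstraint',boxConstraints(i),'KernelScale',kernelScales(j), ...
                'CrossVal','on','KFold',5);
            cvLoss(i,j) = kfoldLoss(CVMdl);   % 5-fold misclassification rate
        end
    end

    [~,idx] = min(cvLoss(:));
    [iBest,jBest] = ind2sub(size(cvLoss),idx);   % indices of the best pair
    % Alternatively, fitcsvm(X,Y,'OptimizeHyperparameters','auto') performs the
    % search automatically using Bayesian optimization.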

To interpret a classification model, you can use lime, shapley, and plotPartialDependence.
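A minimal sketch, assuming a bagged tree ensemble trained on the ionosphere data set and an arbitrary query point:

    load ionosphere
    Mdl = fitcensemble(X,Y,'Method','Bag');
    queryPoint = X(1,:);

    % LIME: fit a simple interpretable model around the query point
    limeExplainer = lime(Mdl,X);
    limeExplainer = fit(limeExplainer,queryPoint,5);   % use 5 important predictors
    plot(limeExplainer)

    % Shapley values: per-predictor contributions to the prediction at the query point
    shapExplainer = shapley(Mdl,X,'QueryPoint',queryPoint);
    plot(shapExplainer)

    % Partial dependence of the class 'g' score on the first predictor
    plotPartialDependence(Mdl,1,'g',X)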

Apps

Classification Learner - Train models to classify data using supervised machine learning

Functions


fscchi2 - Univariate feature ranking for classification using chi-square tests
fscmrmr - Rank features for classification using minimum redundancy maximum relevance (MRMR) algorithm
fscnca - Feature selection using neighborhood component analysis for classification
oobPermutedPredictorImportance - Predictor importance estimates by permutation of out-of-bag predictor observations for random forest of classification trees
predictorImportance - Estimates of predictor importance for classification tree
predictorImportance - Estimates of predictor importance for classification ensemble of decision trees
sequentialfs - Sequential feature selection using custom criterion
relieff - Rank importance of predictors using ReliefF or RReliefF algorithm
gencfeatures - Perform automated feature engineering for classification
describe - Describe generated features
transform - Transform new data using generated features
fitcauto - Automatically select classification model with optimized hyperparameters
bayesopt - Select optimal machine learning hyperparameters using Bayesian optimization
hyperparameters - Variable descriptions for optimizing a fit function
optimizableVariable - Variable description for bayesopt or other optimizers
crossval - Estimate loss using cross-validation
cvpartition - Partition data for cross-validation
repartition - Repartition data for cross-validation
test - Test indices for cross-validation
training - Training indices for cross-validation

Local Interpretable Model-Agnostic Explanations (LIME)

lime - Local interpretable model-agnostic explanations (LIME)
fit - Fit simple model of local interpretable model-agnostic explanations (LIME)
plot - Plot results of local interpretable model-agnostic explanations (LIME)

Shapley Values

shapley - Shapley values
fit - Compute Shapley values for query point
plot - Plot Shapley values

Partial Dependence

partialDependence - Compute partial dependence
plotPartialDependence - Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots

Confusion Matrix

confusionchart - Create confusion matrix chart for classification problem
confusionmat - Compute confusion matrix for classification problem

Receiver Operating Characteristic (ROC) Curve

rocmetrics - Receiver operating characteristic (ROC) curve and performance metrics for binary and multiclass classifiers
addMetrics - Compute additional classification performance metrics
average - Compute performance metrics for average receiver operating characteristic (ROC) curve in multiclass problem
plot - Plot receiver operating characteristic (ROC) curves and other performance curves
perfcurve - Receiver operating characteristic (ROC) curve or other performance curve for classifier output
testcholdout - Compare predictive accuracies of two classification models
testckfold - Compare accuracies of two classification models by repeated cross-validation

Objects


FeatureSelectionNCAClassification - Feature selection for classification using neighborhood component analysis (NCA)
FeatureTransformer - Generated feature transformations
BayesianOptimization - Bayesian optimization results

Properties

ConfusionMatrixChart Properties - Confusion matrix chart appearance and behavior
ROCCurve Properties - Receiver operating characteristic (ROC) curve appearance and behavior

Topics

Classification Learner App

Feature Selection

Feature Engineering

Automated Model Selection

Hyperparameter Optimization

Model Interpretation

Cross-Validation

Classification Performance Evaluation