sequentialfs

tf = sequentialfs(fun,X,y) selects a subset of features in X that are important for predicting y. The function defines a random nonstratified partition for 10-fold cross-validation using X and y, and then sequentially selects features based on the cross-validated prediction criterion values computed by the fun function. The initial feature set includes no features. sequentialfs adds one feature to the set at each iteration, until adding a feature does not decrease the criterion value by more than the termination tolerance. The output tf is a logical vector that indicates the selected features. For more details, see Algorithms.
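The forward search described above can be pictured as a greedy loop. The following is a simplified sketch, not the sequentialfs implementation: it scores each candidate column subset directly with a hypothetical criterion handle critFun (a least-squares residual sum of squares, with no cross-validation), and it uses a hypothetical absolute tolerance tol instead of the TolFun and TolTypeFun options. The search starts with bestCrit = Inf so that at least one feature is always added, as when NullModel is false.

```matlab
% Simplified sketch of greedy forward selection (not the sequentialfs code).
% Hypothetical data: y depends only on columns 1 and 3 of X.
X = [1 0 2 5; 2 1 0 3; 3 0 1 4; 4 1 3 1; 5 0 2 2; 6 1 1 0];
y = 2*X(:,1) - X(:,3);

% Hypothetical criterion: residual sum of squares of a least-squares fit
% with an intercept, using only the columns selected by the logical mask cols.
critFun = @(cols) sum((y - [ones(size(X,1),1) X(:,cols)] * ...
    ([ones(size(X,1),1) X(:,cols)] \ y)).^2);

tol = 1e-6;                  % absolute improvement required to add a feature
p = size(X,2);
inSet = false(1,p);          % start from the empty feature set
bestCrit = Inf;              % forces the first feature to be added

while ~all(inSet)
    candidates = find(~inSet);
    crit = zeros(size(candidates));
    for k = 1:numel(candidates)
        trial = inSet;
        trial(candidates(k)) = true;   % try adding one more feature
        crit(k) = critFun(trial);
    end
    [newCrit,kBest] = min(crit);
    if bestCrit - newCrit <= tol       % stop when no candidate improves enough
        break
    end
    inSet(candidates(kBest)) = true;   % keep the best candidate
    bestCrit = newCrit;
end
inSet
```

On this data the loop adds column 1 and then column 3, after which adding column 2 or 4 no longer reduces the residual sum of squares, so inSet marks columns 1 and 3.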

tf = sequentialfs(fun,X1,...,XN) selects a subset of features in X1 by cross-validating the criterion value on the partition defined for X1,...,XN.

tf = sequentialfs(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, specify "Direction","backward" to perform recursive feature elimination (RFE). In this case, the initial feature set includes all features, and sequentialfs removes one feature from the set at each iteration until removing a feature does not decrease the criterion value.

[tf,history] = sequentialfs(___) also returns information about the feature selection process.

Examples

Forward Feature Selection

Find important features by performing forward sequential feature selection using the wrapper type.

Load the fisheriris data set.

load fisheriris

Display the variables in the data set.

whos

  Name           Size            Bytes  Class     Attributes

  meas         150x4              4800  double              
  species      150x1             18100  cell

The matrix meas contains four measurements for each of 150 iris flowers from three species. The variable species lists the species of each flower.

Specify the predictor data X and the response data y. Define X to include the four measurements and six random variables. Place the measurement variables in columns 1, 3, 5, and 7.

rng("default") % For reproducibility
X = randn(150,10);          % Ten columns of random noise
X(:,[1 3 5 7]) = meas;      % Replace columns 1, 3, 5, and 7 with the measurements
y = species;

Define the function handle myfun for an anonymous function that takes four inputs: training data (XTrain and yTrain) and test data (XTest and yTest). The anonymous function trains a classification model by using the training data, and returns a loss value on the test data for the trained model.

myfun = @(XTrain,yTrain,XTest,yTest) ...
  size(XTest,1)*loss(fitcecoc(XTrain,yTrain),XTest,yTest);

The loss function of a classification model object returns an average loss value. Because sequentialfs divides the sum of the criterion values returned by myfun by the total number of test observations, the anonymous function must multiply the loss value by the number of test observations; otherwise, each fold's loss would be averaged twice.
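To see why the multiplication is needed, consider a hypothetical 3-fold partition with unequal test-set sizes. The fold sizes and loss values below are made up for illustration. Multiplying each fold's average loss by its test-set size recovers the number of errors in that fold, and dividing the total by the total number of test observations gives the overall misclassification rate.

```matlab
% Hypothetical per-fold results: number of test observations and the
% average loss (misclassification rate) that loss() would return.
nTest   = [50 50 40];
avgLoss = [0.04 0.02 0.05];

% What myfun returns for each fold: average loss times fold size,
% that is, the number of misclassified test observations (2, 1, and 2).
foldCrit = nTest .* avgLoss;

% What sequentialfs then computes: sum over folds / total test observations.
cvCrit = sum(foldCrit) / sum(nTest)      % 5 errors out of 140 observations

% Returning avgLoss directly would divide by the fold sizes a second time.
wrongCrit = sum(avgLoss) / sum(nTest)
```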

Create a random partition for stratified 10-fold cross-validation.

cv = cvpartition(y,"KFold",10);

Use the sequentialfs function to sequentially select important features in X based on the criterion value returned by myfun. Specify to use the stratified partition cv, and set the Display option to "iter" to display information about the feature selection process at each iteration.

opts = statset("Display","iter");
tf = sequentialfs(myfun,X,y,"CV",cv,"Options",opts);

Start forward sequential feature selection:
Initial columns included:  none
Columns that can not be included:  none
Step 1, added column 7, criterion value 0.04
Step 2, added column 5, criterion value 0.0333333
Step 3, added column 1, criterion value 0.0266667
Step 4, added column 3, criterion value 0.0133333
Final columns included:  1 3 5 7

sequentialfs correctly finds the important predictors in columns 1, 3, 5, and 7.

Backward Feature Selection

Find important features by performing backward sequential feature selection, or recursive feature elimination (RFE), using the wrapper type.

Load the hald data set, which measures the effect of cement composition on its hardening heat.

load hald

This data set includes the variables ingredients and heat. The matrix ingredients contains the percent composition of four chemicals present in the cement. The vector heat contains the heat of hardening after 180 days for each cement sample.

Use the sequentialfs function to perform backward sequential feature selection based on the criterion value returned by myfun. The code for the helper function myfun appears at the end of this example. Specify the Direction name-value argument as "backward" to include all features in the initial feature set and then sequentially exclude one feature at each iteration. Set the Display option to "iter" to display information about the feature selection process at each iteration.

rng("default") % For reproducibility
opts = statset("Display","iter");
tf = sequentialfs(@myfun,ingredients,heat, ...
    "Direction","backward","Options",opts);

Start backward sequential feature selection:
Initial columns included:  all
Columns that must be included:  none
Step 1, used initial columns, criterion value 12.4989
Step 2, removed column 3, criterion value 6.25866
Final columns included:  1 2 4

sequentialfs excludes the third variable from the features in ingredients.

Helper Function

The myfun function takes four inputs: training data (XTrain and yTrain) and test data (XTest and yTest). The function trains a regression model by using the training data, and returns the sum of squared errors on the test data for the trained model.

function criterion = myfun(XTrain,yTrain,XTest,yTest)
    mdl = fitrlinear(XTrain,yTrain);
    predictedYTest = predict(mdl,XTest);
    e = yTest - predictedYTest;
    criterion = e'*e;
end

Filter Type Feature Selection

Perform filter type feature selection based on the correlation coefficients for the features.

Load the carsmall data set.

load carsmall

Create the feature matrix X containing six variables.

X = [Acceleration Cylinders Displacement ...
    Horsepower Model_Year Weight];

Compute the matrix of the pairwise linear correlation coefficients between each pair of features in X by using the corr function. Specify the Rows name-value argument as "pairwise" to omit any rows containing NaN on a pairwise basis for each two-column correlation coefficient calculation.

corr(X,"Rows","pairwise")

ans = 6×6

    1.0000   -0.6473   -0.6947   -0.6968    0.4843   -0.4879
   -0.6473    1.0000    0.9512    0.8622   -0.6053    0.8844
   -0.6947    0.9512    1.0000    0.9134   -0.5779    0.8895
   -0.6968    0.8622    0.9134    1.0000   -0.6082    0.8733
    0.4843   -0.6053   -0.5779   -0.6082    1.0000   -0.4964
   -0.4879    0.8844    0.8895    0.8733   -0.4964    1.0000

X contains highly correlated features. For example, the correlation between the second and third features (Cylinders and Displacement) is 0.9512.

Use the sequentialfs function to rank the features in X based on the correlation values. Specify these options when you call the sequentialfs function:

Use the helper function mycorr, which returns the maximum absolute value of the off-diagonal elements in the matrix of correlation coefficients. The code for this helper function appears at the end of this example.
Specify "Direction","backward" and "NullModel",true so that sequentialfs starts from the initial feature set containing all features and then excludes all features from the set, one feature at a time.
Specify "CV","none" to perform feature selection without cross-validation.
Set the Display option to "iter" to display information about the feature selection process at each iteration.

opts = statset("Display","iter");
[~,history] = sequentialfs(@mycorr,X, ...
    "Direction","backward","NullModel",true, ...
    "CV","none","Options",opts);

Start backward sequential feature selection:
Initial columns included:  all
Columns that must be included:  none
Step 1, used initial columns, criterion value 0.951167
Step 2, removed column 3, criterion value 0.884401
Step 3, removed column 6, criterion value 0.862164
Step 4, removed column 4, criterion value 0.647346
Step 5, removed column 2, criterion value 0.484253
Step 6, removed column 1, criterion value 0
Step 7, removed column 5, criterion value 0
Final columns included:  none

sequentialfs returns the structure history with two fields (In and Crit) that contain information about the feature selection process. The In field contains a logical matrix in which row i indicates the features in the feature set after iteration i. A true (logical 1) entry indicates that the corresponding feature is in the set.

history.In

ans = 7x6 logical array

   1   1   1   1   1   1
   1   1   0   1   1   1
   1   1   0   1   1   0
   1   1   0   0   1   0
   1   0   0   0   1   0
   0   0   0   0   1   0
   0   0   0   0   0   0

The Crit field contains the criterion values computed at each iteration.

history.Crit

ans = 1×7

    0.9512    0.8844    0.8622    0.6473    0.4843         0         0

The last two criterion values are zero because the mycorr function returns 0 if the input contains fewer than two features.

Extract the indices of the excluded features from the matrix in the In field.

p = size(X,2);
idx = NaN(1,p);
for i = 1 : p
    idx(i) = find(history.In(i,:)~=history.In(i+1,:));
end
idx

idx = 1×6

     3     6     4     2     1     5
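Because each backward step removes exactly one feature, consecutive rows of history.In differ in exactly one column. As an alternative to the loop, this sketch finds all the changes at once with diff and find. The logical matrix is copied from the history.In output above so that the snippet runs on its own.

```matlab
% history.In from the example above (rows = iterations, columns = features)
In = logical([1 1 1 1 1 1
              1 1 0 1 1 1
              1 1 0 1 1 0
              1 1 0 0 1 0
              1 0 0 0 1 0
              0 0 0 0 1 0
              0 0 0 0 0 0]);

% diff marks the column that changes between consecutive rows. Transposing
% makes find scan the steps in order, so the indices come out step by step.
[idx,~] = find(diff(double(In))');
idx = idx'
```

The result matches the idx vector computed by the loop.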

Find the largest feature set on the backward path whose criterion value (the maximum absolute correlation between any two of its features) is less than 0.8.

threshold = 0.8;
iter_last_exclude = find(history.Crit(2:end)<threshold,1);
idx_selected = idx(iter_last_exclude+1:end)

idx_selected = 1×3

     2     1     5

Compute the correlation coefficient matrix for the selected features.

corr(X(:,idx_selected),"Rows","pairwise")

ans = 3×3

    1.0000   -0.6473   -0.6053
   -0.6473    1.0000    0.4843
   -0.6053    0.4843    1.0000

The absolute values of the off-diagonal elements are less than the threshold value 0.8.

Helper Function

The mycorr function takes a matrix that contains features in columns, and returns the maximum absolute value of the off-diagonal elements in the matrix of correlation coefficients. The off-diagonal elements are the correlations between two distinct features in the input data. Therefore, mycorr returns zero if the input data does not have at least two distinct features.

function criterion = mycorr(X)
    if size(X,2) < 2
        criterion = 0;
    else
        p = size(X,2);
        R = corr(X,"Rows","pairwise");
        R(logical(eye(p))) = NaN;
        criterion = max(abs(R),[],"all");
    end
end
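The diagonal-masking step in mycorr can be checked on a small matrix with known correlations. This sketch uses hypothetical data and swaps in the base MATLAB corrcoef function for corr. That substitution is an assumption that holds only for data without NaN values, because the corrcoef call below does not omit missing values pairwise.

```matlab
% Hypothetical data: column 2 is exactly -2 times column 1, and column 3 is
% uncorrelated with both.
X = [1 -2 1; 2 -4 -1; 3 -6 -1; 4 -8 1];
p = size(X,2);
R = corrcoef(X);              % p-by-p correlation matrix (base MATLAB)
R(logical(eye(p))) = NaN;     % mask the diagonal, which is always 1
criterion = max(abs(R(:)))    % max ignores the NaN entries
```

The result is 1 (up to rounding), the absolute correlation between the first two columns.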

Select Features in Table

Convert a table that contains both numeric and categorical variables to an array by using the onehotencode and table2array functions. Then, select important features in the array by using the sequentialfs function.

Load the carbig data set.

load carbig

This data set contains variables that describe several aspects of cars, such as miles per gallon (MPG), country of origin (Origin), and number of cylinders (Cylinders). You can create a regression model of MPG using the other variables.

Specify the predictor data tblX in a table, and specify the response data y.

tblX = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin);
y = MPG;

All variables in tblX are numeric except the Origin variable.

One-hot encode the Origin variable by using the onehotencode function.

tblOrigin = table(categorical(string(Origin)));
tblOrigin = onehotencode(tblOrigin);

Remove the Origin variable from tblX, and add the encoded values to tblX.

tblX.Origin = [];
tblX = [tblX tblOrigin];

Convert the table tblX to an array.

X = table2array(tblX);

Define the function handle myfun for an anonymous function that takes four inputs: training data (XTrain and yTrain) and test data (XTest and yTest). The anonymous function trains a regression model by using the training data, and returns a loss value on the test data for the trained model.

myfun = @(XTrain,yTrain,XTest,yTest) ...
  size(XTest,1)*loss(fitrtree(XTrain,yTrain),XTest,yTest);

The loss function of a regression model object returns the mean squared error (MSE). Because sequentialfs divides the sum of the criterion values returned by myfun by the total number of test observations, the anonymous function must multiply the loss value by the number of test observations.

Use the sequentialfs function to sequentially select important features in X based on the criterion value returned by myfun.

rng("default") % For reproducibility
tf = sequentialfs(myfun,X,y);

Display the variable names of the selected features.

tblX.Properties.VariableNames(tf)'

ans = 6x1 cell
    {'Cylinders'   }
    {'Displacement'}
    {'Model_Year'  }
    {'Weight'      }
    {'Germany'     }
    {'Italy'       }

Input Arguments

`fun` — Function to compute feature selection criterion
function handle

Function to compute the feature selection criterion, specified as a function handle.

For each candidate feature set, sequentialfs computes the cross-validated criterion value by repeatedly calling the fun function as follows:

For each fold (a group of training and test data sets) defined by the CV name-value argument, sequentialfs calls the fun function to get the criterion value for the fold.
sequentialfs divides the sum of the criterion values by the total number of test observations.
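The two steps above can be sketched for a single candidate feature set as follows. This is a simplified illustration, not the sequentialfs code: the 3-fold assignment vector fold is hypothetical and built by hand rather than with cvpartition, and the criterion returns the sum of squared errors of a least-squares fit computed with the backslash operator, so the result is a cross-validated mean squared error.

```matlab
% Hypothetical regression data: 9 observations, 2 features.
X = [1 2; 2 1; 3 4; 4 3; 5 6; 6 5; 7 8; 8 7; 9 9];
y = [3.1; 2.9; 7.2; 6.8; 11.1; 10.9; 15.2; 14.8; 18.1];

% Criterion: sum of squared errors of an intercept-plus-slopes fit.
fun = @(XTrain,yTrain,XTest,yTest) ...
    sum((yTest - [ones(size(XTest,1),1) XTest] * ...
    ([ones(size(XTrain,1),1) XTrain] \ yTrain)).^2);

fold = [1 2 3 1 2 3 1 2 3]';   % hypothetical fold assignment for 3-fold CV
total = 0;
for k = 1:3
    testIdx = (fold == k);
    % Call fun once per fold and accumulate the per-fold criterion values.
    total = total + fun(X(~testIdx,:),y(~testIdx),X(testIdx,:),y(testIdx));
end
cvCrit = total / numel(y)      % divide by the total number of test observations
```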

If you specify X and y, then the fun function must have this form:

criterion = fun(XTrain,yTrain,XTest,yTest)

The fun function accepts the training data (XTrain and yTrain) and test data (XTest and yTest).
XTrain and XTest contain a subset of the columns of X that corresponds to the current candidate feature set.
The fun function returns a scalar value criterion.
Typically, fun trains a model by using the training data (XTrain, yTrain), predicts response values for XTest, and returns a loss of the predicted values compared to yTest. Common loss measures include the sum of squared errors for regression models and the number of misclassified observations for classification models.
For example, you can define the myFun function as follows, and then specify fun as @myFun.
```
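function criterion = myFun(XTrain,yTrain,XTest,yTest)
  % Train a binary SVM classifier on the training data
  mdl = fitcsvm(XTrain,yTrain);
  % Predict labels for the test data
  predictedYTest = predict(mdl,XTest);
  % Count misclassified test observations (labels stored as character vectors or strings)
  criterion = sum(~strcmp(yTest,predictedYTest));
end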
```
Alternatively, you can define the function handle myFunHandle for an anonymous function as follows, and then specify fun as myFunHandle.
```
myFunHandle = @(XTrain,yTrain,XTest,yTest) ...
  loss(fitcsvm(XTrain,yTrain),XTest,yTest)*size(XTest,1);
```
sequentialfs divides the sum of the criterion values returned by fun by the total number of test observations. So, fun must not divide the loss value by the number of test observations. The loss function of a classification or regression object returns an averaged loss value. Therefore, fun must return the loss value multiplied by the number of test observations. If you define the fun function to return the sum of squared errors or the number of misclassified observations, then the cross-validated criterion value is the mean squared error or the misclassification rate, respectively.

If you specify X1,...,XN, sequentialfs selects features from X1 only, but otherwise imposes no interpretation on X1,...,XN. In this case, fun must have this form:

criterion = fun(X1Train,⋯,XNTrain,X1Test,⋯,XNTest)

The fun function accepts the training data (X1Train,…,XNTrain) and test data (X1Test,…,XNTest).
X1Train and X1Test contain a subset of the columns of X1 that corresponds to the current candidate feature set.
The fun function returns a scalar value criterion.
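For example, with three inputs X, y, and a vector of observation weights w, as in a call such as sequentialfs(fun,X,y,w), fun receives the training parts of all three inputs first and then the test parts, so it takes six arguments. sequentialfs selects features only from X; y and w are simply split by the same partition. The weighted sum of squared errors below is a hypothetical criterion chosen for illustration, and the call at the end mimics one fold by hand.

```matlab
% Hypothetical criterion for sequentialfs(fun,X,y,w): all training parts
% (XTrain,yTrain,wTrain) come first, then all test parts (XTest,yTest,wTest).
% It fits weighted least squares on the training data and returns the
% weighted sum of squared errors on the test data.
fun = @(XTrain,yTrain,wTrain,XTest,yTest,wTest) ...
    sum(wTest .* (yTest - [ones(size(XTest,1),1) XTest] * ...
    ((sqrt(wTrain) .* [ones(size(XTrain,1),1) XTrain]) \ ...
    (sqrt(wTrain) .* yTrain))).^2);

% Hypothetical data for one fold: y = 2*X exactly.
X = [1; 2; 3; 4; 5; 6];
y = 2*X;
w = [1; 1; 2; 2; 1; 1];
train = logical([1 1 1 1 0 0]');
crit = fun(X(train,:),y(train),w(train),X(~train,:),y(~train),w(~train))
```

Because y is an exact linear function of X here, crit is zero up to rounding.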

Data Types: function_handle

`X` — Feature data
numeric matrix

Feature data, specified as a numeric matrix. The rows of X correspond to observations, and the columns of X correspond to features. X and y must have the same number of rows.

The custom function defined by the fun argument must accept a group of training and test data sets defined by splitting X. For details, see the fun argument and CV name-value argument.

Data Types: single | double

`y` — Responses (labels)
column vector

Responses (labels), specified as a column vector. X and y must have the same number of rows.

The custom function defined by the fun argument must accept a group of training and test data sets defined by splitting y. For details, see the fun argument and CV name-value argument.

`X1,...,XN` — Input data
matrices

Input data, specified as matrices. The matrices must have the same number of rows.

sequentialfs selects features from X1 only, but otherwise imposes no interpretation on X1,...,XN.

The custom function defined by the fun argument must accept a group of training and test data sets defined by splitting X1,...,XN. For details, see the fun argument and CV name-value argument.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: KeepIn=[1 0 0 0],KeepOut=[0 0 0 1] always includes the first feature and excludes the last feature.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: "KeepIn",[1 0 0 0],"KeepOut",[0 0 0 1]

`CV` — Cross-validation option
10 (default) | positive integer | `cvpartition` object | `"resubstitution"` | `"none"`

Cross-validation option to compute the criterion for each candidate feature subset, specified as a positive integer, cvpartition object, "resubstitution", or "none".

For each candidate feature subset, sequentialfs uses the partition specified by this argument to cross-validate the criterion value returned by the fun function.

Positive integer k — sequentialfs uses a random nonstratified partition for k-fold cross-validation.
cvpartition object — sequentialfs uses a partition specified in the cvpartition object. You can specify a stratified partition, a partition for holdout validation, or a partition for leave-one-out cross-validation. For details, see cvpartition.
"resubstitution" — sequentialfs does not partition the input data. Both the training set and the test set contain all of the original observations. For example, if you specify X and y, then sequentialfs calls fun as criterion = fun(X,y,X,y).
"none" — sequentialfs does not validate the criterion value and calls fun as criterion = fun(X,y), without separating the training and test sets.

Example: "CV","none"

`MCReps` — Number of Monte Carlo repetitions for cross-validation
`1` (default) | positive integer

Number of Monte Carlo repetitions for cross-validation, specified as a positive integer.

If you specify a positive integer greater than 1, sequentialfs repeats the cross-validation computation for the specified number of repetitions for each candidate feature subset.

If CV is "none", "resubstitution", a cvpartition object of type "resubstitution", a cvpartition object of type "leaveout", or a custom cvpartition object (with the IsCustom property set to 1), then the software sets the MCReps value to 1.

Example: "MCReps",10

Data Types: single | double

`Direction` — Direction of sequential search
`"forward"` (default) | `"backward"`

Direction of the sequential search, specified as "forward" or "backward".

"forward" — The initial feature set includes no features, and the sequentialfs function sequentially adds features to the set.
"backward" — The initial feature set includes all features, and the sequentialfs function sequentially removes features from the set. That is, the sequentialfs function performs recursive feature elimination (RFE).

Example: "Direction","backward"

Data Types: char | string

`KeepIn` — Features to include
`[]` (default) | logical vector | vector of positive integers

Features to include, specified as [], a logical vector, or a vector of positive integers.

By default, sequentialfs examines all features for the feature selection process. If you specify features to include using this argument, sequentialfs always includes the features in the candidate feature sets. A true entry in a logical vector or an index value in a vector of positive integers indicates that the output argument tf must include the corresponding feature.
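The logical-vector form and the index form describe the same set of features. For a hypothetical data set with four features, this sketch converts between the two forms with base MATLAB functions.

```matlab
p = 4;                            % number of features (hypothetical)
keepInMask = logical([1 0 1 0]);  % logical form: include features 1 and 3
keepInIdx = find(keepInMask)      % index form: [1 3]

% Convert back from indices to a logical mask of length p.
mask = false(1,p);
mask(keepInIdx) = true;
isequal(mask,keepInMask)          % true: both forms select features 1 and 3
```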

Example: "KeepIn",[1 0 0 0]

Data Types: single | double | logical

`KeepOut` — Features to exclude
`[]` (default) | logical vector | vector of positive integers

Features to exclude, specified as [], a logical vector, or a vector of positive integers.

By default, sequentialfs examines all features for the feature selection process. If you specify features to exclude using this argument, sequentialfs excludes the features from the candidate feature sets. A true entry in a logical vector or an index value in a vector of positive integers indicates that the output argument tf must exclude the corresponding feature.

Example: "KeepOut",[0 0 0 1]

Data Types: single | double | logical

`NFeatures` — Number of features to select
`[]` (default) | positive integer

Number of features to select, specified as [] or a positive integer.

By default, sequentialfs stops iterations when the function satisfies one of the stopping criteria (MaxIter or TolFun) specified by the Options name-value argument. If you specify the NFeatures name-value argument as a positive integer, sequentialfs stops iterations after selecting the specified number of features. This argument overrides other iteration options.

Example: "NFeatures",2

Data Types: single | double

`NullModel` — Flag to include null model
`false` or `0` (default) | `true` or `1`

Flag to include the null model (model containing no features), specified as a logical 1 (true) or 0 (false).

If you specify true, the sequentialfs function includes the null model as a valid option for the output tf and computes the criterion value for the empty input data. Therefore, the fun function must be able to accept empty matrices as input argument values.
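One way to make a regression criterion safe for the empty feature set is to always fit an intercept. With no features, the design matrix reduces to a column of ones, and the fit predicts the mean of the training responses. The function below is a hypothetical sketch of such a criterion that uses the backslash operator instead of a toolbox fitting function.

```matlab
% Hypothetical criterion that also works when XTrain and XTest have no columns.
fun = @(XTrain,yTrain,XTest,yTest) ...
    sum((yTest - [ones(size(XTest,1),1) XTest] * ...
    ([ones(size(XTrain,1),1) XTrain] \ yTrain)).^2);

% Null model: 4 training and 2 test observations with zero features.
XTrain = zeros(4,0);  yTrain = [1; 2; 3; 4];
XTest  = zeros(2,0);  yTest  = [2; 5];
crit = fun(XTrain,yTrain,XTest,yTest)   % predicts mean(yTrain) = 2.5 for both
```

The test errors are -0.5 and 2.5, so crit is 0.25 + 6.25 = 6.5 (up to rounding).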

Example: "NullModel",true

Data Types: logical

`Options` — Options for iterations and parallel computation
`statset("sequentialfs")` (default) | structure returned by `statset`

Options for the iterations and parallel computation, specified as a structure returned by statset.

This table lists the option fields and their values.

| Field Name | Field Value | Default Value |
| --- | --- | --- |
| `Display` | Level of display, specified as `"off"` (display no information), `"final"` (display the final information), or `"iter"` (display information at each iteration) | `"off"` |
| `MaxIter` | Maximum number of iterations allowed, specified as a positive integer | `Inf` |
| `TolFun` | Termination tolerance on the criterion value, specified as a nonnegative scalar | `1e-6` if `Direction` is `"forward"`; `0` if `Direction` is `"backward"` |
| `TolTypeFun` | Type of the termination tolerance for the criterion value, specified as `"abs"` (absolute tolerance) or `"rel"` (relative tolerance) | `"rel"` |
| `UseParallel` | Flag to run in parallel, specified as logical `1` (`true`) or `0` (`false`) | `false` |
| `UseSubstreams` | Flag to run computations in a reproducible manner, specified as logical `1` (`true`) or `0` (`false`). To compute reproducibly, set `Streams` to a type that allows substreams: `"mlfg6331_64"` or `"mrg32k3a"`. | `false` |
| `Streams` | Random number streams, specified as a `RandStream` object or cell array of such objects. Use a single object except when the `UseParallel` value is `true` and the `UseSubstreams` value is `false`. In that case, use a cell array that has the same size as the parallel pool. | MATLAB® default random number stream |

To compute in parallel, you need Parallel Computing Toolbox™.

Example: "Options",statset("Display","iter")

Data Types: struct

Output Arguments

`tf` — Selected features
logical vector

Selected features, returned as a logical vector. A true (logical 1) entry indicates that the corresponding feature is selected.

`history` — History of feature selection process
structure

History of the feature selection process, returned as a structure with the fields In and Crit.

In is a logical matrix in which row i indicates the features selected at iteration i.
Crit is a vector containing the criterion values computed at each iteration.

More About