fitcgam

Fit generalized additive model (GAM) for binary classification

Description

Mdl = fitcgam(Tbl,ResponseVarName) returns a generalized additive model Mdl trained using the sample data contained in the table Tbl. The input argument ResponseVarName is the name of the variable in Tbl that contains the class labels for binary classification.

Mdl = fitcgam(Tbl,formula) uses the model specification argument formula to specify the class labels and predictor variables in Tbl. You can specify a subset of predictor variables and interaction terms for predictor variables by using formula.

Mdl = fitcgam(Tbl,Y) uses the predictor variables in the table Tbl and the class labels in the vector Y.

Mdl = fitcgam(X,Y) uses the predictors in the matrix X and the class labels in the vector Y.

Mdl = fitcgam(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, 'Interactions',5 specifies to include five interaction terms in the model. You can also specify a list of interaction terms using the 'Interactions' name-value argument.

Examples

Train a univariate generalized additive model, which contains linear terms for predictors. Then, interpret the prediction for a specified data instance by using the plotLocalEffects function.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere

Train a univariate GAM that identifies whether the radar return is bad ('b') or good ('g').

Mdl = fitcgam(X,Y)
Mdl = 
  ClassificationGAM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'logit'
                Intercept: 2.2715
          NumObservations: 351


  Properties, Methods

Mdl is a ClassificationGAM model object. The model display shows a partial list of the model properties. To view the full list of properties, double-click the variable name Mdl in the Workspace. The Variables editor opens for Mdl. Alternatively, you can display the properties in the Command Window by using dot notation. For example, display the class order of Mdl.

classOrder = Mdl.ClassNames
classOrder = 2x1 cell
    {'b'}
    {'g'}

Classify the first observation of the training data, and plot the local effects of the terms in Mdl on the prediction.

label = predict(Mdl,X(1,:))
label = 1x1 cell array
    {'g'}

plotLocalEffects(Mdl,X(1,:))

[Figure: Local Effects Plot (bar graph)]

The predict function classifies the first observation X(1,:) as 'g'. The plotLocalEffects function creates a horizontal bar graph that shows the local effects of the 10 most important terms on the prediction. Each local effect value shows the contribution of the corresponding term to the classification score for 'g', which is the logit of the posterior probability that the classification is 'g' for the observation.

Train a generalized additive model that contains linear and interaction terms for predictors in three different ways:

  • Specify the interaction terms using the formula input argument.

  • Specify the 'Interactions' name-value argument.

  • Build a model with linear terms first and add interaction terms to the model by using the addInteractions function.

Load Fisher's iris data set. Create a table that contains observations for versicolor and virginica.

load fisheriris
inds = strcmp(species,'versicolor') | strcmp(species,'virginica');
tbl = array2table(meas(inds,:),'VariableNames',["x1","x2","x3","x4"]);
tbl.Y = species(inds,:);

Specify formula

Train a GAM that contains the four linear terms (x1, x2, x3, and x4) and two interaction terms (x1*x2 and x2*x3). Specify the terms using a formula in the form 'Y ~ terms'.

Mdl1 = fitcgam(tbl,'Y ~ x1 + x2 + x3 + x4 + x1:x2 + x2:x3');

The function adds interaction terms to the model in the order of importance. You can use the Interactions property to check the interaction terms in the model and the order in which fitcgam adds them to the model. Display the Interactions property.

Mdl1.Interactions
ans = 2×2

     2     3
     1     2

Each row of Interactions represents one interaction term and contains the column indexes of the predictor variables for the interaction term.

Specify 'Interactions'

Pass the training data (tbl) and the name of the response variable in tbl to fitcgam, so that the function includes the linear terms for all the other variables as predictors. Specify the 'Interactions' name-value argument using a logical matrix to include the two interaction terms, x1*x2 and x2*x3.

Mdl2 = fitcgam(tbl,'Y','Interactions',logical([1 1 0 0; 0 1 1 0]));
Mdl2.Interactions
ans = 2×2

     2     3
     1     2

You can also specify 'Interactions' as the number of interaction terms or as 'all' to include all available interaction terms. Among the specified interaction terms, fitcgam identifies those whose p-values are not greater than the 'MaxPValue' value and adds them to the model. The default 'MaxPValue' is 1 so that the function adds all specified interaction terms to the model.

Specify 'Interactions','all' and set the 'MaxPValue' name-value argument to 0.01.

Mdl3 = fitcgam(tbl,'Y','Interactions','all','MaxPValue',0.01);
Mdl3.Interactions
ans = 5×2

     3     4
     2     4
     1     4
     2     3
     1     3

Mdl3 includes five of the six available pairs of interaction terms.

Use addInteractions Function

Train a univariate GAM that contains linear terms for predictors, and then add interaction terms to the trained model by using the addInteractions function. Specify the second input argument of addInteractions in the same way you specify the 'Interactions' name-value argument of fitcgam. You can specify the list of interaction terms using a logical matrix, the number of interaction terms, or 'all'.

Specify the number of interaction terms as 5 to add the five most important interaction terms to the trained model.

Mdl4 = fitcgam(tbl,'Y');
UpdatedMdl4 = addInteractions(Mdl4,5);
UpdatedMdl4.Interactions
ans = 5×2

     3     4
     2     4
     1     4
     2     3
     1     3

Mdl4 is a univariate GAM, and UpdatedMdl4 is an updated GAM that contains all the terms in Mdl4 and five additional interaction terms.

Train a cross-validated GAM with 10 folds, which is the default cross-validation option, by using fitcgam. Then, use kfoldPredict to predict class labels for validation-fold observations using a model trained on training-fold observations.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere

Create a cross-validated GAM by using the default cross-validation option. Specify the 'CrossVal' name-value argument as 'on'.

rng('default') % For reproducibility
CVMdl = fitcgam(X,Y,'CrossVal','on')
CVMdl = 
  ClassificationPartitionedGAM
    CrossValidatedModel: 'GAM'
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
      NumTrainedPerFold: [1x1 struct]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'logit'


  Properties, Methods

The fitcgam function creates a ClassificationPartitionedGAM model object CVMdl with 10 folds. During cross-validation, the software completes these steps:

  1. Randomly partition the data into 10 sets.

  2. For each set, reserve the set as validation data, and train the model using the other 9 sets.

  3. Store the 10 compact, trained models in a 10-by-1 cell vector in the Trained property of the cross-validated model object ClassificationPartitionedGAM.

You can override the default cross-validation setting by using the 'CVPartition', 'Holdout', 'KFold', or 'Leaveout' name-value argument.
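As a minimal sketch of overriding the default, the following code requests 5 folds instead of 10 by using the 'KFold' name-value argument, and then inspects the KFold and Trained properties. The choice of 5 folds and the variable names CVMdl5, numFolds, and numModels are arbitrary and used only for illustration.

```matlab
load ionosphere
rng('default') % For reproducibility

% Request 5 folds instead of the default 10 (5 is an arbitrary choice).
CVMdl5 = fitcgam(X,Y,'KFold',5);

numFolds = CVMdl5.KFold;            % number of folds stored in the model
numModels = numel(CVMdl5.Trained);  % one compact trained model per fold
```

Fewer folds mean fewer models to train, at the cost of smaller training sets per fold.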

Classify the observations in X by using kfoldPredict. The function predicts class labels for every observation using the model trained without that observation.

label = kfoldPredict(CVMdl);

Create a confusion matrix to compare the true classes of the observations to their predicted labels.

C = confusionchart(Y,label);

[Figure: confusion matrix chart]

Compute the classification error.

L = kfoldLoss(CVMdl)
L = 0.0712

The average misclassification rate over 10 folds is about 7%.

Optimize the parameters of a GAM with respect to cross-validation by using the bayesopt function.

Load the 1994 census data stored in census1994.mat. The data set consists of demographic data from the US Census Bureau to predict whether an individual makes over $50,000 per year. The classification task is to fit a model that predicts the salary category of people given their age, working class, education level, marital status, race, and so on.

load census1994

census1994 contains the training data set adultdata and the test data set adulttest. To reduce the running time for this example, subsample 500 training observations from adultdata by using the datasample function.

rng('default')
NumSamples = 5e2;
adultdata = datasample(adultdata,NumSamples,'Replace',false);

Prepare optimizableVariable objects for the name-value arguments that you want to optimize using Bayesian optimization. This example finds optimal values for the MaxNumSplitsPerPredictor and NumTreesPerPredictor arguments of fitcgam.

maxNumSplits = optimizableVariable('maxNumSplits',[1,10],'Type','integer');
numTrees = optimizableVariable('numTrees',[1,500],'Type','integer');

Create an objective function that takes an input z = [maxNumSplits,numTrees] and returns the cross-validated loss value of z.

minfun = @(z)kfoldLoss(fitcgam(adultdata,'salary','CrossVal','on', ...
    'MaxNumSplitsPerPredictor',z.maxNumSplits, ...
    'NumTreesPerPredictor',z.numTrees)); 

If you specify the cross-validation option ('CrossVal','on'), then the fitcgam function returns a cross-validated model object ClassificationPartitionedGAM. The kfoldLoss function returns the classification loss obtained by the cross-validated model. Therefore, the function handle minfun computes the cross-validation loss at the parameters in z.

Search for the best parameters [maxNumSplits,numTrees] using bayesopt. For reproducibility, choose the 'expected-improvement-plus' acquisition function. The default acquisition function depends on run time and, therefore, can give varying results.

results = bayesopt(minfun,[maxNumSplits,numTrees],'Verbose',0, ...
    'IsObjectiveDeterministic',true, ...
    'AcquisitionFunctionName','expected-improvement-plus');

Obtain the best point from results.

zbest = bestPoint(results)
zbest=1×2 table
    maxNumSplits    numTrees
    ____________    ________

         1            123   

Train an optimized GAM using the zbest values.

Mdl = fitcgam(adultdata,'salary', ...
    'MaxNumSplitsPerPredictor',zbest.maxNumSplits, ...
    'NumTreesPerPredictor',zbest.numTrees);

Input Arguments

Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

  • Optionally, Tbl can contain a column for the response variable and a column for the observation weights.

    • The response variable must be a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. A good practice is to specify the order of the classes in the response variable by using the 'ClassNames' name-value argument.

    • The column for the weights must be a numeric vector.

    You must specify the response variable in Tbl by using ResponseVarName or formula and specify the observation weights in Tbl by using 'Weights'.

    • Specify the response variable by using ResponseVarName: fitcgam uses the remaining variables as predictors. To use a subset of the remaining variables in Tbl as predictors, specify predictor variables by using 'PredictorNames'.

    • Define a model specification by using formula: fitcgam uses a subset of the variables in Tbl as predictor variables and the response variable, as specified in formula.

  • If Tbl does not contain the response variable, then specify a response variable by using Y. The length of the response variable Y and the number of rows in Tbl must be equal. To use a subset of the variables in Tbl as predictors, specify predictor variables by using 'PredictorNames'.

fitcgam considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Tbl to be missing values.

  • fitcgam does not use observations with all missing values in the fit.

  • fitcgam does not use observations with missing response values in the fit.

  • fitcgam uses observations with some missing values for predictors to find splits on variables for which these observations have valid values.

Data Types: table

Response variable name, specified as a character vector or string scalar containing the name of the response variable in Tbl. For example, if the response variable Y is stored in Tbl.Y, then specify it as 'Y'.

Data Types: char | string

Model specification, specified as a character vector or string scalar in the form 'Y ~ terms'. The formula argument specifies a response variable and linear and interaction terms for predictor variables. Use formula to specify a subset of variables in Tbl as predictors for training the model. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.

For example, specify 'Y~x1+x2+x3+x1:x2'. In this form, Y represents the response variable, and x1, x2, and x3 represent the linear terms for the predictor variables. x1:x2 represents the interaction term for x1 and x2.

The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB® identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.
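The check and conversion described above might look like the following sketch. The names in the cell array are hypothetical and chosen so that two of them are not valid MATLAB identifiers.

```matlab
% Hypothetical variable names; the first two are not valid identifiers.
names = {'sepal length','2ndMeasure','x3'};

% Check each name with isvarname.
isValid = cellfun(@isvarname,names);

% Convert all names to valid identifiers.
fixedNames = matlab.lang.makeValidName(names);

% Confirm that every converted name is now valid.
allValid = all(cellfun(@isvarname,fixedNames));
```

For a table, you could assign the converted names back to Tbl.Properties.VariableNames before writing the formula.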

Alternatively, you can specify a response variable and linear terms for predictors using formula, and specify interaction terms for predictors using 'Interactions'.

fitcgam builds a set of interaction trees using only the terms whose p-values are not greater than the 'MaxPValue' value.

Example: 'Y~x1+x2+x3+x1:x2'

Data Types: char | string

Class labels, specified as a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. Each row of Y represents the classification of the corresponding row of X or Tbl.

A good practice is to specify the order of the classes by using the 'ClassNames' name-value argument.

fitcgam considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Y to be missing values. fitcgam does not use observations with missing response values in the fit.

Data Types: single | double | categorical | logical | char | string | cell

Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one predictor variable.

fitcgam considers NaN values in X as missing values. The function does not use observations with all missing values in the fit. fitcgam uses observations with some missing values for X to find splits on variables for which these observations have valid values.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Interactions','all','MaxPValue',0.05 specifies to include all available interaction terms whose p-values are not greater than 0.05.

GAM Options

Initial learning rate of gradient boosting for interaction terms, specified as a numeric scalar in the interval (0,1].

For each boosting iteration for interaction trees, fitcgam starts fitting with the initial learning rate. The function halves the learning rate until it finds a rate that improves the model fit.

Training a model using a small learning rate requires more learning iterations, but often achieves better accuracy.

For more details about gradient boosting, see Gradient Boosting Algorithm.

Example: 'InitialLearnRateForInteractions',0.1

Data Types: single | double

Initial learning rate of gradient boosting for linear terms, specified as a numeric scalar in the interval (0,1].

For each boosting iteration for predictor trees, fitcgam starts fitting with the initial learning rate. The function halves the learning rate until it finds a rate that improves the model fit.

Training a model using a small learning rate requires more learning iterations, but often achieves better accuracy.

For more details about gradient boosting, see Gradient Boosting Algorithm.

Example: 'InitialLearnRateForPredictors',0.1

Data Types: single | double

Number or list of interaction terms to include in the candidate set S, specified as a nonnegative integer scalar, a logical matrix, or 'all'.

  • Number of interaction terms, specified as a nonnegative integer — S includes the specified number of important interaction terms, selected based on the p-values of the terms.

  • List of interaction terms, specified as a logical matrix — S includes the terms specified by a t-by-p logical matrix, where t is the number of interaction terms, and p is the number of predictors used to train the model. For example, logical([1 1 0; 0 1 1]) represents two pairs of interaction terms: a pair of the first and second predictors, and a pair of the second and third predictors.

    If fitcgam uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. That is, the column indexes of the logical matrix do not count the response and observation weight variables. The indexes also do not count any variables not used by the function.

  • 'all': S includes all possible pairs of interaction terms, which is p*(p – 1)/2 terms in total.

Among the interaction terms in S, the fitcgam function identifies those whose p-values are not greater than the 'MaxPValue' value and uses them to build a set of interaction trees. Use the default value ('MaxPValue',1) to build interaction trees using all terms in S.
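To make the logical-matrix form concrete, here is a small sketch that builds the t-by-p matrix from a list of predictor index pairs and counts the terms that 'all' would place in S. The values of p and pairs are assumptions chosen for illustration.

```matlab
p = 4;                 % number of predictors (assumed for illustration)
pairs = [1 2; 2 3];    % hypothetical interaction pairs: x1:x2 and x2:x3

% Build the t-by-p logical matrix, one row per interaction term.
interactionMatrix = false(size(pairs,1),p);
for t = 1:size(pairs,1)
    interactionMatrix(t,pairs(t,:)) = true;
end

% Number of terms that 'all' would include in the candidate set S.
numAllPairs = p*(p - 1)/2;
```

You could then pass interactionMatrix as the 'Interactions' value.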

Example: 'Interactions','all'

Data Types: single | double | logical | char | string

Maximum number of decision splits (or branch nodes) for each interaction tree (boosted tree for an interaction term), specified as a positive integer scalar.

Example: 'MaxNumSplitsPerInteraction',5

Data Types: single | double

Maximum number of decision splits (or branch nodes) for each predictor tree (boosted tree for a linear term), specified as a positive integer scalar. By default, fitcgam uses a tree stump for a predictor tree.

Example: 'MaxNumSplitsPerPredictor',5

Data Types: single | double

Maximum p-value for detecting interaction terms, specified as a numeric scalar in the interval [0,1].

fitcgam first finds the candidate set S of interaction terms from formula or 'Interactions'. Then the function identifies the interaction terms whose p-values are not greater than the 'MaxPValue' value and uses them to build a set of interaction trees.

The default value ('MaxPValue',1) builds interaction trees for all interaction terms in the candidate set S.

For more details about detecting interaction terms, see Interaction Term Detection.

Example: 'MaxPValue',0.05

Data Types: single | double

Number of bins for numeric predictors, specified as a positive integer scalar or [] (empty).

  • If you specify the 'NumBins' value as a positive integer scalar (numBins), then fitcgam bins every numeric predictor into at most numBins equiprobable bins, and then grows trees on the bin indices instead of the original data.

    • The number of bins can be less than numBins if a predictor has fewer than numBins unique values.

    • fitcgam does not bin categorical predictors.

  • If the 'NumBins' value is empty ([]), then fitcgam does not bin any predictors.

When you use a large training data set, this binning option speeds up training but might cause a decrease in accuracy. You can first use the default value of 'NumBins', and then change the value depending on the accuracy and training speed.
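The idea of equiprobable binning can be illustrated with the following sketch, which derives bin edges from empirical quantiles and replaces each value with its bin index. This is only an illustration of the concept under assumed inputs (x and numBins are hypothetical). It is not presented as the internal implementation of fitcgam.

```matlab
x = (1:100)';    % hypothetical numeric predictor
numBins = 4;     % hypothetical 'NumBins' value

% Equiprobable bin edges from empirical quantiles (illustration only).
% unique removes repeated edges, so there can be fewer than numBins bins.
edges = unique(quantile(x,(0:numBins)/numBins));

% Replace each value with the index of the bin that contains it.
binIdx = discretize(x,edges);
```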

The trained model Mdl stores the bin edges in the BinEdges property.

Example: 'NumBins',50

Data Types: single | double

Number of trees per interaction term, specified as a positive integer scalar.

The 'NumTreesPerInteraction' value is equivalent to the number of gradient boosting iterations for the interaction terms for predictors. For each iteration, fitcgam adds a set of interaction trees to the model, one tree for each interaction term. To learn about the gradient boosting algorithm, see Gradient Boosting Algorithm.

You can determine whether the fitted model has the specified number of trees by viewing the diagnostic message displayed when 'Verbose' is 1 or 2, or by checking the ReasonForTermination property value of the model Mdl.

Example: 'NumTreesPerInteraction',500

Data Types: single | double

Number of trees per linear term, specified as a positive integer scalar.

The 'NumTreesPerPredictor' value is equivalent to the number of gradient boosting iterations for the linear terms for predictors. For each iteration, fitcgam adds a set of predictor trees to the model, one tree for each predictor. To learn about the gradient boosting algorithm, see Gradient Boosting Algorithm.

You can determine whether the fitted model has the specified number of trees by viewing the diagnostic message displayed when 'Verbose' is 1 or 2, or by checking the ReasonForTermination property value of the model Mdl.

Example: 'NumTreesPerPredictor',500

Data Types: single | double

Other Classification Options

Categorical predictors list, specified as one of the following values.

  • Vector of positive integers: Each entry in the vector is an index value corresponding to the column of the predictor data that contains a categorical variable. The index values are between 1 and p, where p is the number of predictors used to train the model.

    If fitcgam uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The 'CategoricalPredictors' values do not count the response variable, the observation weight variable, or any other variables that the function does not use.

  • Logical vector: A true entry means that the corresponding column of predictor data is a categorical variable. The length of the vector is p.

  • Character matrix: Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.

  • String array or cell array of character vectors: Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.

  • 'all': All predictors are categorical.

By default, if the predictor data is in a table (Tbl), fitcgam assumes that a variable is categorical if it is a logical vector, unordered categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fitcgam assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value argument.

Example: 'CategoricalPredictors','all'

Data Types: single | double | logical | char | string | cell

Names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. ClassNames must have the same data type as the response variable in Tbl or Y.

If ClassNames is a character array, then each element must correspond to one row of the array.

Use ClassNames to:

  • Specify the order of the classes during training.

  • Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.

  • Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

The default value for ClassNames is the set of all distinct class names in the response variable in Tbl or Y.

Example: 'ClassNames',{'b','g'}

Data Types: categorical | char | string | logical | single | double | cell

Misclassification cost of a point, specified as one of the following:

  • 2-by-2 numeric matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). To specify the class order for the corresponding rows and columns of Cost, set the 'ClassNames' name-value argument.

  • Structure S with two fields: S.ClassNames, which contains the group names as a variable of the same data type as the response variable in Tbl or Y; and S.ClassificationCosts, which contains the cost matrix.
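As a sketch of the structure form, the following code builds a cost structure with the two fields named above and passes it to fitcgam. The cost values and the variable names costStruct and MdlCost are assumptions chosen for illustration.

```matlab
load ionosphere

% Cost structure with the two documented fields.
costStruct.ClassNames = {'b';'g'};            % same data type as Y
costStruct.ClassificationCosts = [0 2; 1 0];  % rows: true class, columns: predicted class

% Misclassifying a true 'b' as 'g' costs 2; the reverse costs 1.
MdlCost = fitcgam(X,Y,'Cost',costStruct);
```

Supplying the class names inside the structure makes the row and column order of the cost matrix explicit.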

Example: 'Cost',[0 2; 1 0]

Data Types: single | double | struct

Number of iterations between diagnostic message printouts, specified as a nonnegative integer scalar. This argument is valid only when you specify 'Verbose' as 1.

If you specify 'Verbose',1 and 'NumPrint',numPrint, then the software displays diagnostic messages every numPrint iterations in the Command Window.

Example: 'NumPrint',500

Data Types: single | double

Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of PredictorNames depends on the way you supply the training data.

  • If you supply X and Y, then you can use PredictorNames to assign names to the predictor variables in X.

    • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.

    • By default, PredictorNames is {'x1','x2',...}.

  • If you supply Tbl, then you can use PredictorNames to choose which predictor variables to use in training. That is, fitcgam uses only the predictor variables in PredictorNames and the response variable during training.

    • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.

    • By default, PredictorNames contains the names of all predictor variables.

    • A good practice is to specify the predictors for training using either 'PredictorNames' or formula, but not both.

Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}

Data Types: string | cell

Prior probabilities for each class, specified as one of the following:

  • Character vector or string scalar.

    • 'empirical' determines class probabilities from class frequencies in the response variable in Y or Tbl. If you pass observation weights, fitcgam uses the weights to compute the class probabilities.

    • 'uniform' sets all class probabilities to be equal.

  • Vector (one scalar value for each class). To specify the class order for the corresponding elements of 'Prior', set the 'ClassNames' name-value argument.

  • Structure S with two fields.

    • S.ClassNames contains the class names as a variable of the same type as the response variable in Y or Tbl.

    • S.ClassProbs contains a vector of corresponding probabilities.

fitcgam normalizes the weights in each class ('Weights') to add up to the value of the prior probability of the respective class.
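The normalization rule stated above can be illustrated with a few lines of arithmetic. The labels, weights, and prior below are hypothetical, and the loop is only a sketch of the stated rule, not the code that fitcgam runs.

```matlab
y = [1 1 1 2 2]';     % hypothetical class indices
w = [1 2 1 3 1]';     % hypothetical raw observation weights
prior = [0.5 0.5];    % uniform prior for two classes

wNorm = zeros(size(w));
for k = 1:numel(prior)
    inClass = (y == k);
    % Rescale the weights of class k so that they sum to prior(k).
    wNorm(inClass) = w(inClass)/sum(w(inClass))*prior(k);
end

% Per-class sums of the normalized weights.
classSums = [sum(wNorm(y == 1)) sum(wNorm(y == 2))];
```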

Example: 'Prior','uniform'

Data Types: char | string | single | double | struct

Response variable name, specified as a character vector or string scalar.

  • If you supply Y, then you can use 'ResponseName' to specify a name for the response variable.

  • If you supply ResponseVarName or formula, then you cannot use 'ResponseName'.

Example: 'ResponseName','response'

Data Types: char | string

Score transformation, specified as a built-in transformation function name or function handle.

The following list summarizes the available score transformations. Specify one using its corresponding character vector or string scalar.

  • 'doublelogit': $1/(1 + e^{-2x})$

  • 'invlogit': $\log(x/(1 - x))$

  • 'ismax': Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0

  • 'logit': $1/(1 + e^{-x})$

  • 'none' or 'identity': $x$ (no transformation)

  • 'sign': $-1$ for $x < 0$, $0$ for $x = 0$, $1$ for $x > 0$

  • 'symmetric': $2x - 1$

  • 'symmetricismax': Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1

  • 'symmetriclogit': $2/(1 + e^{-x}) - 1$

For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

This argument determines the output score computation for object functions such as predict, margin, and edge. Use 'logit' (default) to compute posterior probabilities, and use 'none' to compute the logit of posterior probabilities.
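The relationship between the two score scales can be checked numerically. The sketch below assumes the standard logistic form 1/(1 + exp(-x)) for 'logit' and its inverse log(x/(1 - x)) for 'invlogit'. The raw scores and the handle name customTransform are hypothetical.

```matlab
x = [-2 0 3];    % hypothetical raw scores on the logit scale

posterior = 1./(1 + exp(-x));                  % 'logit' transformation
recovered = log(posterior./(1 - posterior));   % 'invlogit' transformation

% A custom transformation as a function handle: matrix in, same-size matrix out.
% This handle uses the same formula as 'doublelogit'.
customTransform = @(s) 1./(1 + exp(-2*s));
transformed = customTransform(x);
```

A handle such as customTransform could be supplied as the 'ScoreTransform' value in place of a built-in name.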

Example: 'ScoreTransform','none'

Data Types: char | string | function_handle

Verbosity level, specified as 0, 1, or 2. The Verbose value controls the amount of information that the software displays in the Command Window.

The following list summarizes the available verbosity level options.

  • 0: The software displays no information.

  • 1: The software displays diagnostic messages every numPrint iterations, where numPrint is the 'NumPrint' value.

  • 2: The software displays diagnostic messages at every iteration.

Each line of the diagnostic messages shows the information about each boosting iteration and includes the following columns:

  • Type: Type of trained trees, 1D (predictor trees, or boosted trees for linear terms for predictors) or 2D (interaction trees, or boosted trees for interaction terms for predictors)

  • NumTrees: Number of trees per linear term or interaction term that fitcgam added to the model so far

  • Deviance: Deviance of the model

  • RelTol: Relative change of model predictions, $(\hat{y}_k - \hat{y}_{k-1})'(\hat{y}_k - \hat{y}_{k-1}) / (\hat{y}_k'\hat{y}_k)$, where $\hat{y}_k$ is a column vector of model predictions at iteration k

  • LearnRate: Learning rate used for the current iteration
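As a small worked example of the RelTol column, the sketch below reads the expression as the squared norm of the change in predictions divided by the squared norm of the current predictions. That reading and the prediction vectors are assumptions made for illustration.

```matlab
yPrev = [1; 1; 2];    % hypothetical predictions at iteration k-1
yCurr = [1; 2; 2];    % hypothetical predictions at iteration k

% Relative change of model predictions between the two iterations.
delta = yCurr - yPrev;
relTol = (delta'*delta)/(yCurr'*yCurr);
```

Here the squared change is 1 and the squared norm of the current predictions is 9, so relTol is 1/9.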

Example: 'Verbose',1

Data Types: single | double

Observation weights, specified as a vector of scalar values or the name of a variable in Tbl. The software weights the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows in X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored in Tbl.W, then specify it as 'W'.

fitcgam normalizes the weights in each class to add up to the value of the prior probability of the respective class.

Data Types: single | double | char | string

Cross-Validation Options

Flag to train a cross-validated model, specified as 'on' or 'off'.

If you specify 'on', then the software trains a cross-validated model with 10 folds.

You can override this cross-validation setting using the 'CVPartition', 'Holdout', 'KFold', or 'Leaveout' name-value argument. You can use only one cross-validation name-value argument at a time to create a cross-validated model.

Alternatively, cross-validate after creating a model by passing Mdl to crossval.

Example: 'Crossval','on'
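
As a brief sketch, the following code shows the two routes described above for obtaining a 10-fold cross-validated model, using the ionosphere data set from the examples on this page. The variable names CVMdl1, CVMdl2, and cvLoss are illustrative.

```matlab
load ionosphere
rng('default')   % for reproducibility of the random partitions

% Route 1: cross-validate during training
CVMdl1 = fitcgam(X,Y,'CrossVal','on');

% Route 2: train a full model first, and then cross-validate it
Mdl = fitcgam(X,Y);
CVMdl2 = crossval(Mdl);

% Estimate the classification error from the held-out folds
cvLoss = kfoldLoss(CVMdl1);
```

Because each route draws its own random partition, the two cross-validated models generally do not contain identical folds.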

Cross-validation partition, specified as a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.

Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:

  1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

  2. Store the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Holdout',0.1

Data Types: double | single
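
For illustration, this sketch trains a holdout-validated model on the ionosphere data set and retrieves the single compact model stored in the Trained property. The 10% holdout fraction and the variable names are arbitrary choices for the example.

```matlab
load ionosphere
rng('default')   % for reproducibility of the random holdout split

% Reserve 10% of the observations as validation data
CVMdl = fitcgam(X,Y,'Holdout',0.1);

% The Trained property holds one compact model for a holdout partition
CompactMdl = CVMdl.Trained{1};

% Classification error on the reserved validation data
holdoutLoss = kfoldLoss(CVMdl);
```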

Number of folds to use in a cross-validated model, specified as a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:

  1. Randomly partition the data into k sets.

  2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

  3. Store the k compact, trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',5

Data Types: single | double
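
The k-fold steps above can be sketched manually with cvpartition, which helps clarify what 'KFold',k does. This is an illustration under simplifying assumptions (a non-stratified random partition and k = 5), not the internal fitcgam procedure, and the variable names are hypothetical.

```matlab
load ionosphere
rng('default')   % for reproducibility of the random partition

k = 5;
cvp = cvpartition(size(X,1),'KFold',k);   % step 1: partition into k sets

foldMdls = cell(k,1);
foldLoss = zeros(k,1);
for i = 1:k
    trainIdx = training(cvp,i);           % observations in the other k-1 sets
    testIdx = test(cvp,i);                % observations in the reserved set

    % Step 2: train on k-1 sets and validate on the reserved set
    foldMdls{i} = fitcgam(X(trainIdx,:),Y(trainIdx));
    foldLoss(i) = loss(foldMdls{i},X(testIdx,:),Y(testIdx));
end

% Step 3 analogue: foldMdls is a k-by-1 cell vector of trained models
meanLoss = mean(foldLoss);
```

In practice, specifying 'KFold',k directly is simpler, because the software stores the compact models and the partition in a single cross-validated model object.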

Leave-one-out cross-validation flag, specified as 'on' or 'off'. If you specify 'Leaveout','on', then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

  1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.

  2. Store the n compact, trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Leaveout','on'

Output Arguments

Trained generalized additive model, returned as one of the model objects in this table.

Model Object | Cross-Validation Options to Train Model Object | Ways to Classify Observations Using Model Object
ClassificationGAM | None | Use predict to classify new observations, and use resubPredict to classify training observations.
ClassificationPartitionedGAM | Specify KFold, Holdout, Leaveout, CrossVal, or CVPartition | Use kfoldPredict to classify observations that fitcgam holds out during training. kfoldPredict predicts a class label for every observation by using the model trained without that observation.

To reference properties of Mdl, use dot notation. For example, enter Mdl.Interactions in the Command Window to display the interaction terms in Mdl.
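
As a short sketch of the classification functions listed in the table, the following code trains both kinds of model object on the ionosphere data set. Passing the first three rows of X to predict stands in for new observations and is only for illustration.

```matlab
load ionosphere
rng('default')   % for reproducibility of the cross-validation partition

% ClassificationGAM: classify training data and new data
Mdl = fitcgam(X,Y);
trainLabels = resubPredict(Mdl);
[newLabels,scores] = predict(Mdl,X(1:3,:));

% ClassificationPartitionedGAM: classify the held-out observations
CVMdl = fitcgam(X,Y,'KFold',5);
cvLabels = kfoldPredict(CVMdl);
```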

More About

Generalized Additive Model (GAM) for Binary Classification

A generalized additive model (GAM) is an interpretable model that explains class scores (the logit of class probabilities) using a sum of univariate and bivariate shape functions of predictors.

fitcgam uses a boosted tree as a shape function for each predictor and, optionally, each pair of predictors; therefore, the function can capture a nonlinear relation between a predictor and the response variable. Because contributions of individual shape functions to the prediction (classification score) are well separated, the model is easy to interpret.

The standard GAM uses a univariate shape function for each predictor.

$$y \sim \text{Binomial}(n,\mu)$$

$$g(\mu) = \log\frac{\mu}{1-\mu} = c + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p),$$

where y is a response variable that follows the binomial distribution with the probability of success (probability of positive class) μ in n observations. g(μ) is a logit link function, and c is an intercept (constant) term. $f_i(x_i)$ is a univariate shape function for the ith predictor, which is a boosted tree for a linear term for the predictor (predictor tree).

You can include interactions between predictors in a model by adding bivariate shape functions of important interaction terms to the model.

$$g(\mu) = c + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p) + \sum_{i,j \in \{1,2,\ldots,p\}} f_{ij}(x_i x_j),$$

where $f_{ij}(x_i x_j)$ is a bivariate shape function for the ith and jth predictors, which is a boosted tree for an interaction term for the predictors (interaction tree).

fitcgam finds important interaction terms based on the p-values of F-tests. For details, see Interaction Term Detection.
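
To make the additive structure concrete, this sketch sums hypothetical shape function values for one observation and converts the resulting score to a probability by inverting the logit link. The intercept and the shape function values are invented for the example and do not come from a trained model.

```matlab
% Hypothetical model terms evaluated at a single observation
c = 0.5;                    % intercept
f = [1.2, -0.7, 0.3];       % univariate shape functions f_1(x_1), f_2(x_2), f_3(x_3)
f12 = -0.2;                 % bivariate shape function f_12(x_1 x_2)

% Class score on the logit scale: g(mu) = c + sum of shape functions
score = c + sum(f) + f12;

% Invert the logit link to obtain the probability of the positive class
mu = 1 / (1 + exp(-score));
```

Because the score is a plain sum, the contribution of each term to the prediction can be read off directly, which is the property that makes the model easy to interpret.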

Deviance

Deviance is a generalization of the residual sum of squares. It measures the goodness of fit compared to the saturated model.

The deviance of a fitted model is twice the difference between the loglikelihoods of the model and the saturated model:

$$-2(\log L - \log L_s),$$

where $L$ and $L_s$ are the likelihoods of the fitted model and the saturated model, respectively. The saturated model is the model with the maximum number of parameters that you can estimate.

fitcgam uses the deviance to measure the goodness of model fit and finds a learning rate that reduces the deviance at each iteration. Specify 'Verbose' as 1 or 2 to display the deviance and learning rate in the Command Window.
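
As a minimal sketch of the definition, the following code computes the deviance for a few hypothetical binary responses and predicted probabilities. It assumes ungrouped 0/1 responses, for which the saturated model reproduces each response exactly and its loglikelihood is 0. The values are illustrative and do not come from fitcgam.

```matlab
% Hypothetical binary responses (1 = positive class) and fitted probabilities
y = [1; 0; 1];
mu = [0.9; 0.2; 0.6];

% Loglikelihood of the fitted model for Bernoulli responses
logL = sum(y .* log(mu) + (1 - y) .* log(1 - mu));

% Loglikelihood of the saturated model (assumption: 0 for 0/1 responses)
logLs = 0;

dev = -2 * (logL - logLs);
```

A smaller dev value indicates that the fitted probabilities are closer to the observed responses.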

Algorithms

Gradient Boosting Algorithm

fitcgam fits a generalized additive model using a gradient boosting algorithm (Adaptive Logistic Regression).

fitcgam first builds sets of predictor trees (boosted trees for linear terms for predictors) and then builds sets of interaction trees (boosted trees for interaction terms for predictors). The boosting algorithm runs at most 'NumTreesPerPredictor' iterations for predictor trees, and then at most 'NumTreesPerInteraction' iterations for interaction trees.

For each boosting iteration, fitcgam builds a set of predictor trees with the initial learning rate 'InitialLearnRateForPredictors', or builds a set of interaction trees with the initial learning rate 'InitialLearnRateForInteractions'.

  • When building a set of trees, the function trains one tree at a time. It fits each tree to the residual, that is, the difference between the response and the aggregated prediction from all previously grown trees. To control the boosting learning speed, the function shrinks the tree by the learning rate, adds the shrunken tree to the model, and then updates the residual.

    • Updated model = current model + (learning rate)·(new tree)

    • Updated residual = current residual – (learning rate)·(response explained by new tree)

  • If adding the set of trees improves the model fit (that is, reduces the deviance of the fit), then fitcgam moves to the next iteration.

  • Otherwise, fitcgam halves the learning rate and uses it to update the model and residual. The function continues to halve the learning rate until it finds a rate that improves the model fit.

    • If the function cannot find such a learning rate for predictor trees, then it stops boosting iterations for linear terms and starts boosting iterations for interaction terms.

    • If the function cannot find such a learning rate for interaction trees, then it terminates the model fitting.

    You can determine why training stopped by checking the ReasonForTermination property of the trained model.
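
The learning rate search described above can be sketched on toy data as follows. This is an illustration only: the tree output is a made-up constant vector, the initial learning rate is assumed to be 1, and the minLearnRate floor is a hypothetical stopping rule, because this page does not state how fitcgam decides that no suitable learning rate exists.

```matlab
% Toy data: binary responses (1 = positive class) and current model scores
y = [1; 0; 1; 1; 0];
score = zeros(5,1);

% Hypothetical output of a newly trained tree (deliberately too large)
treeOutput = 8 * ones(5,1);

% Binomial deviance of the 0/1 responses for a vector of logit-scale scores
sigmoid = @(s) 1 ./ (1 + exp(-s));
deviance = @(s) -2 * sum(y .* log(sigmoid(s)) + (1 - y) .* log(1 - sigmoid(s)));

learnRate = 1;           % assumed initial learning rate
minLearnRate = 1e-6;     % hypothetical floor for the search
currentDev = deviance(score);
improved = false;

while learnRate >= minLearnRate
    % Updated model = current model + (learning rate)*(new tree)
    candidate = score + learnRate * treeOutput;
    if deviance(candidate) < currentDev
        score = candidate;       % the shrunken tree reduces the deviance
        improved = true;
        break
    end
    learnRate = learnRate / 2;   % otherwise, halve the rate and retry
end
```

In this toy case, the full step overshoots and increases the deviance, so the loop halves the rate several times before it finds a step that improves the fit. If improved remains false, the sketch corresponds to the case where boosting stops for the current type of term.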

Interaction Term Detection

For each pairwise interaction term $x_i x_j$ (specified by formula or 'Interactions'), the software performs an F-test to examine whether the term is statistically significant.

To speed up the process, fitcgam bins numeric predictors into at most 8 equiprobable bins. The number of bins can be less than 8 if a predictor has fewer than 8 unique values. The F-test examines the null hypothesis that the bins created by $x_i$ and $x_j$ have equal responses versus the alternative that at least one bin has a different response value from the others. A small p-value indicates that differences are significant, which implies that the corresponding interaction term is significant and, therefore, including the term can improve the model fit.

fitcgam builds a set of interaction trees using the terms whose p-values are not greater than the 'MaxPValue' value. You can use the default 'MaxPValue' value 1 to build interaction trees using all terms specified by formula or 'Interactions'.

fitcgam adds interaction terms to the model in the order of importance based on the p-values. Use the Interactions property of the returned model to check the order of the interaction terms added to the model.
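
As a short sketch, the following code requests up to five interaction terms for the ionosphere data set, keeps only the terms whose p-values do not exceed 0.05, and then inspects the terms in the trained model. The threshold 0.05 and the variable name interactionTerms are arbitrary choices for the example.

```matlab
load ionosphere
rng('default')   % for reproducibility

% Request up to five interaction terms and filter them by p-value
Mdl = fitcgam(X,Y,'Interactions',5,'MaxPValue',0.05);

% Each row lists the predictor indices of one interaction term
interactionTerms = Mdl.Interactions;
```

The model can contain fewer than five interaction terms if some candidate terms have p-values greater than the 'MaxPValue' value.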

References

[1] Lou, Yin, Rich Caruana, and Johannes Gehrke. "Intelligible Models for Classification and Regression." Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’12). Beijing, China: ACM Press, 2012, pp. 150–158.

[2] Lou, Yin, Rich Caruana, Johannes Gehrke, and Giles Hooker. "Accurate Intelligible Models with Pairwise Interactions." Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’13). Chicago, Illinois, USA: ACM Press, 2013, pp. 623–631.

Introduced in R2021a