Hauptinhalt

setmodel

Set model predictors and coefficients

Description

sc = setmodel(sc,ModelPredictors,ModelCoefficients) sets the predictors and coefficients of a linear logistic regression model fitted outside the creditscorecard object and returns an updated creditscorecard object. The predictors and coefficients are used for the computation of scorecard points. Use setmodel in lieu of fitmodel, which fits a linear logistic regression model, because setmodel offers increased flexibility. For example, when a model fitted with fitmodel needs to be modified, you can use setmodel. For more information, see Workflows for Using setmodel.

Note

When using setmodel, the following assumptions apply:

  • The model coefficients correspond to a linear logistic regression model (where only linear terms are included in the model and there are no interactions or any other higher-order terms).

  • The model was previously fitted using Weight of Evidence (WOE) data with the response mapped so that ‘Good’ is 1 and ‘Bad’ is 0.

example

Examples

collapse all

This example shows how to use setmodel to make modifications to a logistic regression model initially fitted using the fitmodel function, and then set the new logistic regression model predictors and coefficients back into the creditscorecard object.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID'  'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'  'status'}
        NumericPredictors: {'CustAge'  'TmAtAddress'  'CustIncome'  'TmWBank'  'AMBalance'  'UtilRate'}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'}
                     Data: [1200×11 table]

Perform automatic binning.

sc = autobinning(sc);

The standard workflow is to use the fitmodel function to fit a logistic regression model using a stepwise method. However, fitmodel only supports limited options regarding the stepwise procedure. You can use the optional mdl output argument from fitmodel to get a copy of the fitted GeneralizedLinearModel object, to later modify.

[sc,mdl] = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    logit(status) ~ 1 + CustAge + ResStatus + EmpStatus + CustIncome + TmWBank + OtherCC + AMBalance
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833      0.24932      2.44      0.014687
    ResStatus        1.377      0.65272    2.1097      0.034888
    EmpStatus      0.88565        0.293    3.0227     0.0025055
    CustIncome     0.70164      0.21844    3.2121     0.0013179
    TmWBank         1.1074      0.23271    4.7589    1.9464e-06
    OtherCC         1.0883      0.52912    2.0569      0.039696
    AMBalance        1.045      0.32214    3.2439     0.0011792


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Suppose you want to include, or "force," the predictor 'UtilRate' in the logistic regression model, even though the stepwise method did not include it in the fitted model. You can add 'UtilRate' to the logistic regression model using the GeneralizedLinearModel object mdl directly.

mdl = mdl.addTerms('UtilRate')
mdl = 
Generalized linear regression model:
    logit(status) ~ 1 + CustAge + ResStatus + EmpStatus + CustIncome + TmWBank + OtherCC + AMBalance + UtilRate
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE        tStat        pValue  
                   ________    ________    ________    __________

    (Intercept)     0.70239    0.064001      10.975    5.0538e-28
    CustAge         0.60843     0.24936        2.44      0.014687
    ResStatus        1.3773      0.6529      2.1096      0.034896
    EmpStatus       0.88556     0.29303      3.0221     0.0025103
    CustIncome      0.70146      0.2186      3.2089     0.0013324
    TmWBank          1.1071     0.23307      4.7503    2.0316e-06
    OtherCC          1.0882     0.52918      2.0563       0.03975
    AMBalance        1.0413     0.36557      2.8483      0.004395
    UtilRate       0.013157     0.60864    0.021618       0.98275


1200 observations, 1191 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 5.26e-16

Use setmodel to update the model predictors and model coefficients in the creditscorecard object. The ModelPredictors input argument does not explicitly include a string for the intercept. However, the ModelCoefficients input argument does have the intercept information as its first element.

ModelPredictors = mdl.PredictorNames
ModelPredictors = 8×1 cell
    {'CustAge'   }
    {'ResStatus' }
    {'EmpStatus' }
    {'CustIncome'}
    {'TmWBank'   }
    {'OtherCC'   }
    {'AMBalance' }
    {'UtilRate'  }

ModelCoefficients = mdl.Coefficients.Estimate
ModelCoefficients = 9×1

    0.7024
    0.6084
    1.3773
    0.8856
    0.7015
    1.1071
    1.0882
    1.0413
    0.0132

sc = setmodel(sc,ModelPredictors,ModelCoefficients);

Verify that 'UtilRate' is part of the scorecard predictors by displaying the scorecard points.

pi = displaypoints(sc)
pi=41×3 table
      Predictors            Bin            Points  
    ______________    ________________    _________

    {'CustAge'   }    {'[-Inf,33)'   }     -0.17152
    {'CustAge'   }    {'[33,37)'     }     -0.15295
    {'CustAge'   }    {'[37,40)'     }    -0.072892
    {'CustAge'   }    {'[40,46)'     }     0.033856
    {'CustAge'   }    {'[46,48)'     }      0.20193
    {'CustAge'   }    {'[48,58)'     }      0.21787
    {'CustAge'   }    {'[58,Inf]'    }      0.46652
    {'CustAge'   }    {'<missing>'   }          NaN
    {'ResStatus' }    {'Tenant'      }    -0.043826
    {'ResStatus' }    {'Home Owner'  }      0.11442
    {'ResStatus' }    {'Other'       }      0.36394
    {'ResStatus' }    {'<missing>'   }          NaN
    {'EmpStatus' }    {'Unknown'     }    -0.088843
    {'EmpStatus' }    {'Employed'    }      0.30193
    {'EmpStatus' }    {'<missing>'   }          NaN
    {'CustIncome'}    {'[-Inf,29000)'}     -0.46956
      ⋮

This example shows how to use setmodel to fit a logistic regression model directly, without using the fitmodel function, and then set the new model predictors and coefficients back into the creditscorecard object. This approach gives more flexibility regarding options to control the stepwise procedure. This example fits a logistic regression model with a nondefault value for the 'PEnter' parameter, the criterion to admit a new predictor in the logistic regression model during the stepwise procedure.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID'  'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'  'status'}
        NumericPredictors: {'CustAge'  'TmAtAddress'  'CustIncome'  'TmWBank'  'AMBalance'  'UtilRate'}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'}
                     Data: [1200×11 table]

Perform automatic binning.

sc = autobinning(sc);

The logistic regression model needs to be fit with Weight of Evidence (WOE) data. The WOE transformation is a special case of binning, since the data first needs to be binned, and then the binned information is mapped to the corresponding WOE values. This transformation is done using the bindata function. bindata has an argument that prepares the data for the model fitting step. By setting the bindata name-value pair argument for 'OutputType' to WOEModelInput':

  • All predictors are converted to WOE values.

  • The output contains only predictors and response (no 'IDVar' or any unused variables).

  • Predictors with infinite or undefined (NaN) WOE values are discarded.

  • The response values are mapped so that "Good" is 1 and "Bad" is 0 (this implies that higher unscaled scores correspond to better, less risky customers).

bd = bindata(sc,'OutputType','WOEModelInput');

For example, the first ten rows in the original data for the variables 'CustAge', 'ResStatus', 'CustIncome', and 'status' (response variable) look like this:

data(1:10,{'CustAge' 'ResStatus' 'CustIncome' 'status'})
ans=10×4 table
    CustAge    ResStatus     CustIncome    status
    _______    __________    __________    ______

      53       Tenant          50000         0   
      61       Home Owner      52000         0   
      47       Tenant          37000         0   
      50       Home Owner      53000         0   
      68       Home Owner      53000         0   
      65       Home Owner      48000         0   
      34       Home Owner      32000         1   
      50       Other           51000         0   
      50       Tenant          52000         1   
      49       Home Owner      53000         1   

Here is how the same ten rows look after calling bindata with the name-value pair argument 'OutputType' set to 'WOEModelInput':

bd(1:10,{'CustAge' 'ResStatus' 'CustIncome' 'status'})
ans=10×4 table
    CustAge     ResStatus    CustIncome    status
    ________    _________    __________    ______

     0.21378    -0.095564      0.47972       1   
     0.62245     0.019329      0.47972       1   
     0.18758    -0.095564    -0.026696       1   
     0.21378     0.019329      0.47972       1   
     0.62245     0.019329      0.47972       1   
     0.62245     0.019329      0.47972       1   
    -0.39568     0.019329     -0.29217       0   
     0.21378      0.20049      0.47972       1   
     0.21378    -0.095564      0.47972       0   
     0.21378     0.019329      0.47972       0   

Fit a logistic linear regression model using a stepwise method with the Statistics and Machine Learning Toolbox™ function stepwiseglm, but use a nondefault value for the 'PEnter' and 'PRemove' optional arguments. The predictors 'ResStatus' and 'OtherCC' would normally be included in the logistic linear regression model using default options for the stepwise procedure.

mdl = stepwiseglm(bd,'constant','Distribution','binomial',...
'Upper','linear','PEnter',0.025,'PRemove',0.05)
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
mdl = 
Generalized linear regression model:
    logit(status) ~ 1 + CustAge + EmpStatus + CustIncome + TmWBank + AMBalance
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70263     0.063759     11.02    3.0544e-28
    CustAge        0.57265       0.2482    2.3072      0.021043
    EmpStatus      0.88356      0.29193    3.0266      0.002473
    CustIncome     0.70399      0.21781    3.2321      0.001229
    TmWBank            1.1      0.23185    4.7443    2.0924e-06
    AMBalance       1.0313      0.32007    3.2221     0.0012724


1200 observations, 1194 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 81.4, p-value = 4.18e-16

Use setmodel to update the model predictors and model coefficients in the creditscorecard object. The ModelPredictors input argument does not explicitly include a string for the intercept. However, the ModelCoefficients input argument does have the intercept information as its first element.

ModelPredictors = mdl.PredictorNames
ModelPredictors = 5×1 cell
    {'CustAge'   }
    {'EmpStatus' }
    {'CustIncome'}
    {'TmWBank'   }
    {'AMBalance' }

ModelCoefficients = mdl.Coefficients.Estimate
ModelCoefficients = 6×1

    0.7026
    0.5726
    0.8836
    0.7040
    1.1000
    1.0313

sc = setmodel(sc,ModelPredictors,ModelCoefficients);

Verify that the desired model predictors are part of the scorecard predictors by displaying the scorecard points.

pi = displaypoints(sc)
pi=30×3 table
      Predictors             Bin            Points  
    ______________    _________________    _________

    {'CustAge'   }    {'[-Inf,33)'    }     -0.10354
    {'CustAge'   }    {'[33,37)'      }    -0.086059
    {'CustAge'   }    {'[37,40)'      }    -0.010713
    {'CustAge'   }    {'[40,46)'      }     0.089757
    {'CustAge'   }    {'[46,48)'      }      0.24794
    {'CustAge'   }    {'[48,58)'      }      0.26294
    {'CustAge'   }    {'[58,Inf]'     }      0.49697
    {'CustAge'   }    {'<missing>'    }          NaN
    {'EmpStatus' }    {'Unknown'      }    -0.035716
    {'EmpStatus' }    {'Employed'     }      0.35417
    {'EmpStatus' }    {'<missing>'    }          NaN
    {'CustIncome'}    {'[-Inf,29000)' }     -0.41884
    {'CustIncome'}    {'[29000,33000)'}    -0.065161
    {'CustIncome'}    {'[33000,35000)'}     0.092353
    {'CustIncome'}    {'[35000,40000)'}      0.12173
    {'CustIncome'}    {'[40000,42000)'}      0.13259
      ⋮

Input Arguments

collapse all

Credit scorecard model, specified as a creditscorecard object. Use creditscorecard to create a creditscorecard object.

Predictor names included in the fitted model, specified as a cell array of character vectors as {'PredictorName1','PredictorName2',...}. The predictor names must match predictor variable names in the creditscorecard object.

Note

Do not include a character vector for the constant term in ModelPredictors, setmodel internally handles the '(Intercept)' term based on the number of model coefficients (see ModelCoefficients).

Data Types: cell

Model coefficients corresponding to the model predictors, specified as a numeric array of model coefficients, [coeff1,coeff2,..]. If N is the number of predictor names provided in ModelPredictors, the size of ModelCoefficients can be N or N+1. If ModelCoefficients has N+1 elements, then the first coefficient is used as the '(Intercept)' of the fitted model. Otherwise, the '(Intercept)' is set to 0.

Data Types: double

Output Arguments

collapse all

Credit scorecard model, returned as an updated creditscorecard object. The creditscorecard object contains information about the model predictors and coefficients of the fitted model. For more information on using the creditscorecard object, see creditscorecard.

More About

collapse all

References

[1] Anderson, R. The Credit Scoring Toolkit. Oxford University Press, 2007.

[2] Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.

Version History

Introduced in R2014b