screenpredictors
Screen credit scorecard predictors for predictive value
Description
metric_table = screenpredictors(data) returns metric_table, a MATLAB® table containing the calculated values for several measures of predictive power for each predictor variable in the data.
Use the screenpredictors function as a preprocessing step in the Credit Scorecard Modeling Workflow to reduce the number of predictor variables before you create the credit scorecard using the creditscorecard function from Financial Toolbox™. In addition, you can use Threshold Predictors from Risk Management Toolbox™ to interactively set credit scorecard predictor thresholds using the output from screenpredictors before you create the credit scorecard using the creditscorecard function.
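metric_table = screenpredictors(___,Name,Value) specifies options using one or more name-value arguments in addition to the input argument in the previous syntax.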
Examples
Reduce the number of predictor variables by screening predictors before you create a credit scorecard.
Use the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).  
load CreditCardData.mat
Define 'IDVar' and 'ResponseVar'.
idvar = 'CustID'; responsevar = 'status';
Use screenpredictors to calculate the predictor screening metrics. The function returns a table containing the metrics values. Each table row corresponds to a predictor from the input table data. 
metric_table = screenpredictors(data,'IDVar', idvar,'ResponseVar', responsevar)
metric_table=9×7 table
                   InfoValue    AccuracyRatio     AUROC     Entropy     Gini      Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________
    CustAge          0.18863       0.17095       0.58547    0.88729    0.42626    0.00074524          0       
    TmWBank          0.15719       0.13612       0.56806    0.89167    0.42864     0.0054591          0       
    CustIncome       0.15572       0.17758       0.58879      0.891    0.42731     0.0018428          0       
    TmAtAddress     0.094574      0.010421       0.50521    0.90089    0.43377         0.182          0       
    UtilRate        0.075086      0.035914       0.51796    0.90405    0.43575       0.45546          0       
    AMBalance        0.07159      0.087142       0.54357    0.90446    0.43592       0.48528          0       
    EmpStatus       0.048038       0.10886       0.55443    0.90814     0.4381    0.00037823          0       
    OtherCC         0.014301      0.044459       0.52223    0.91347    0.44132      0.047616          0       
    ResStatus      0.0097738       0.05039        0.5252    0.91422    0.44182       0.27875          0       
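In this output, each AccuracyRatio value equals 2*AUROC - 1 up to display rounding, which is the usual relationship between the accuracy ratio and the area under the ROC curve. The check below uses only the numbers printed above, copied by hand into vectors, so it is a consistency check on this particular output rather than a statement about how screenpredictors computes either column.

```matlab
% AUROC and AccuracyRatio columns copied from metric_table above, in row order
% CustAge, TmWBank, CustIncome, TmAtAddress, UtilRate, AMBalance, EmpStatus, OtherCC, ResStatus.
auroc    = [0.58547; 0.56806; 0.58879; 0.50521; 0.51796; ...
            0.54357; 0.55443; 0.52223; 0.5252];
accRatio = [0.17095; 0.13612; 0.17758; 0.010421; 0.035914; ...
            0.087142; 0.10886; 0.044459; 0.05039];

% If AccuracyRatio = 2*AUROC - 1, the remaining gap is only display rounding.
maxGap = max(abs(accRatio - (2*auroc - 1)))

% Both columns therefore rank the predictors identically.
[~, byAR]    = sort(accRatio, 'descend');
[~, byAUROC] = sort(auroc, 'descend');
sameOrder = isequal(byAR, byAUROC)
```

Because the two columns rank the predictors identically, sorting by AUROC instead of AccuracyRatio would give the same order as the sorted table.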
Sort the predictors by the AccuracyRatio metric in descending order.
metric_table = sortrows(metric_table,'AccuracyRatio','descend')
metric_table=9×7 table
                   InfoValue    AccuracyRatio     AUROC     Entropy     Gini      Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________
    CustIncome       0.15572       0.17758       0.58879      0.891    0.42731     0.0018428          0       
    CustAge          0.18863       0.17095       0.58547    0.88729    0.42626    0.00074524          0       
    TmWBank          0.15719       0.13612       0.56806    0.89167    0.42864     0.0054591          0       
    EmpStatus       0.048038       0.10886       0.55443    0.90814     0.4381    0.00037823          0       
    AMBalance        0.07159      0.087142       0.54357    0.90446    0.43592       0.48528          0       
    ResStatus      0.0097738       0.05039        0.5252    0.91422    0.44182       0.27875          0       
    OtherCC         0.014301      0.044459       0.52223    0.91347    0.44132      0.047616          0       
    UtilRate        0.075086      0.035914       0.51796    0.90405    0.43575       0.45546          0       
    TmAtAddress     0.094574      0.010421       0.50521    0.90089    0.43377         0.182          0       
Based on the AccuracyRatio metric, select the top predictors (here, those with an accuracy ratio greater than 0.09) to use when you create the creditscorecard object.
varlist = metric_table.Row(metric_table.AccuracyRatio > 0.09)
varlist = 4×1 cell
    {'CustIncome'}
    {'CustAge'   }
    {'TmWBank'   }
    {'EmpStatus' }
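You can also combine metrics when you screen. The sketch below rebuilds part of metric_table from the values printed above, so it runs without the toolbox, and keeps predictors whose InfoValue exceeds 0.1 and whose AccuracyRatio exceeds 0.09. Both thresholds are illustrative choices, not recommendations from this page; with real data, apply the same logical indexing directly to the metric_table that screenpredictors returns.

```matlab
% InfoValue and AccuracyRatio values copied from the metric_table output above.
names = {'CustAge'; 'TmWBank'; 'CustIncome'; 'TmAtAddress'; 'UtilRate'; ...
         'AMBalance'; 'EmpStatus'; 'OtherCC'; 'ResStatus'};
infoValue = [0.18863; 0.15719; 0.15572; 0.094574; 0.075086; ...
             0.07159; 0.048038; 0.014301; 0.0097738];
accRatio  = [0.17095; 0.13612; 0.17758; 0.010421; 0.035914; ...
             0.087142; 0.10886; 0.044459; 0.05039];
T = table(infoValue, accRatio, 'VariableNames', {'InfoValue', 'AccuracyRatio'}, ...
          'RowNames', names);

% Keep predictors that pass both (illustrative) thresholds.
keep = T.InfoValue > 0.1 & T.AccuracyRatio > 0.09;
varlist2 = T.Properties.RowNames(keep)
```

With these thresholds, EmpStatus drops out because its InfoValue (0.048038) is below 0.1, leaving CustAge, TmWBank, and CustIncome.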
Use creditscorecard to create a creditscorecard object based on only the "screened" predictors.
sc = creditscorecard(data,'IDVar', idvar,'ResponseVar', responsevar, 'PredictorVars', varlist)
sc = 
  creditscorecard with properties:
                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID'  'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'  'status'}
        NumericPredictors: {'CustAge'  'CustIncome'  'TmWBank'}
    CategoricalPredictors: {'EmpStatus'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'EmpStatus'  'CustIncome'  'TmWBank'}
                     Data: [1200×11 table]
Input Arguments
Data for the creditscorecard object, specified as a MATLAB table, tall table, or tall timetable, where each column of data can be any one of the following data types:
- Numeric 
- Logical 
- Cell array of character vectors 
- Character array 
- Categorical 
- String 
Data Types: table
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: metric_table = screenpredictors(data,'IDVar','CustID','ResponseVar','status','PredictorVars',{'CustAge','CustIncome'})
Name of identifier variable, specified as the comma-separated pair consisting of 'IDVar' and a case-sensitive character vector. The 'IDVar' data can be ordinal numbers or Social Security numbers. By specifying 'IDVar', you exclude the identifier variable from the predictor variables.
Data Types: char
Response variable name, specified as the comma-separated pair consisting of 'ResponseVar' and a case-sensitive character vector. The response variable data must be binary, indicating "Good" or "Bad".
If not specified, 'ResponseVar' defaults to the last column of the input data.
Data Types: char
Names of predictor variables, specified as the comma-separated pair consisting of 'PredictorVars' and a case-sensitive cell array of character vectors or string array. By default, when you create a creditscorecard object, all variables are predictors except for IDVar and ResponseVar. Any name you specify using 'PredictorVars' must differ from the IDVar and ResponseVar names.
Data Types: cell | string
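The default choice of response and predictors described for 'ResponseVar' and 'PredictorVars' can be sketched with base MATLAB set operations. This sketch assumes the VarNames list shown in the creditscorecard output above and ignores 'WeightsVar'; it only mimics the documented defaults and is not the function's own code.

```matlab
% Variable names copied from the VarNames property in the creditscorecard output above.
varNames = {'CustID', 'CustAge', 'TmAtAddress', 'ResStatus', 'EmpStatus', ...
            'CustIncome', 'TmWBank', 'OtherCC', 'AMBalance', 'UtilRate', 'status'};
idvar = 'CustID';

% Without 'ResponseVar', the last column is the response.
responsevar = varNames{end}

% Without 'PredictorVars', every variable except IDVar and ResponseVar is a predictor.
% ismember keeps the original column order (setdiff would sort the names).
isPredictor = ~ismember(varNames, {idvar, responsevar});
predictorVars = varNames(isPredictor)
```

For this data set, the result is the nine predictors that appear as rows of metric_table.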
Name of weights variable, specified as the comma-separated pair consisting of 'WeightsVar' and a case-sensitive character vector to indicate which column name in the data table contains the row weights.
If you do not specify 'WeightsVar' when you create a creditscorecard object, then the function uses unit weights as the observation weights.
Data Types: char
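One way to picture observation weights is as per-row contributions to the good/bad frequency table of each bin. The sketch below uses hypothetical bin indices and treats a response of 0 as "Good", matching GoodLabel: 0 in the creditscorecard output above. It illustrates unit versus non-unit weights under that assumption and is not taken from the screenpredictors implementation.

```matlab
% Hypothetical bin indices and response values; 0 = "Good", 1 = "Bad".
bin = [1; 1; 2; 2; 2; 3];
y   = [0; 1; 0; 0; 1; 1];

% Default: unit weights, so the weighted counts are plain counts.
w = ones(size(y));
goods = accumarray(bin, w .* (y == 0))   % [1; 2; 0]
bads  = accumarray(bin, w .* (y == 1))   % [1; 1; 1]

% With a weights column, each row contributes its weight instead of 1.
w2 = [2; 1; 1; 1; 0.5; 1];
weightedBads = accumarray(bin, w2 .* (y == 1))   % [1; 0.5; 1]
```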
Number of equal-frequency bins for numeric predictors, specified as the comma-separated pair consisting of 'NumBins' and a scalar numeric.
Data Types: double
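Equal-frequency binning puts roughly the same number of observations in each bin. A minimal rank-based sketch on a hypothetical predictor follows. Ties are broken by sort order here, and screenpredictors may place bin edges differently, so treat it only as an illustration of the idea behind 'NumBins'.

```matlab
x = [12; 7; 30; 5; 18; 22; 9; 15];   % hypothetical numeric predictor values
numBins = 4;                         % plays the role of 'NumBins'

% Rank each observation (1 = smallest); ties keep their sort order.
[~, order] = sort(x);
rankPos = zeros(size(x));
rankPos(order) = 1:numel(x);

% Map ranks 1..n onto bins 1..numBins with equal counts.
bin = ceil(rankPos * numBins / numel(x))

% Observations per bin
binCounts = accumarray(bin, 1)
```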
Small shift in frequency tables that contain zero entries, specified as the comma-separated pair consisting of 'FrequencyShift' and a scalar numeric with a value between 0 and 1.
If the frequency table of a predictor contains any "pure" bins (containing all goods or all bads) after you bin the data using autobinning, then the function adds the 'FrequencyShift' value to all bins in the table. To avoid any perturbation, set 'FrequencyShift' to 0.
Data Types: double
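To see why pure bins need a shift, consider the weight of evidence (WOE) of each bin, computed here as the log of the ratio of the "Goods" distribution to the "Bads" distribution. A bin with zero goods or zero bads produces an infinite WOE. The sketch below uses hypothetical counts and a shift of 0.5; it assumes this WOE definition together with the add-to-all-bins rule described above.

```matlab
% Hypothetical per-bin counts; bin 3 is "pure" (all bads, no goods).
goods = [120; 80; 0];
bads  = [ 30; 40; 10];
shift = 0.5;                     % plays the role of 'FrequencyShift' (between 0 and 1)

% WOE before the shift: the pure bin gives log(0) = -Inf
woeRaw = log((goods / sum(goods)) ./ (bads / sum(bads)))

% If any bin is pure, add the shift to every bin, then recompute
if any(goods == 0 | bads == 0)
    goods = goods + shift;
    bads  = bads + shift;
end
woe = log((goods / sum(goods)) ./ (bads / sum(bads)))
```

Setting shift to 0 leaves the counts unchanged, so nothing is perturbed, but the pure bin's WOE then stays infinite.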
Output Arguments
Calculated values for the predictor screening metrics, returned as a table. Each table row corresponds to a predictor from the input table data. The table columns contain calculated values for the following metrics:
- 'InfoValue': Information value. This metric measures the strength of a predictor in the fitting model by determining the deviation between the distributions of "Goods" and "Bads".
- 'AccuracyRatio': Accuracy ratio.
- 'AUROC': Area under the ROC curve.
- 'Entropy': Entropy. This metric measures the level of unpredictability in the bins. You can use the entropy metric to validate a risk model.
- 'Gini': Gini. This metric measures the statistical dispersion or inequality within a sample of data.
- 'Chi2PValue': Chi-square p-value. This metric is computed from the chi-square statistic and measures the statistical difference and independence between groups.
- 'PercentMissing': Percentage of missing values in the predictor, expressed in decimal form (for example, 0.05 means that 5% of the values are missing).
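The list above names the metrics without formulas. The following sketch computes textbook versions of several of them from a hypothetical bin-by-status frequency table: information value as the sum of (pG - pB).*log(pG./pB), entropy and Gini impurity averaged over bins with weights n/N, a Pearson chi-square test of independence, and the missing fraction. These definitions are assumptions; screenpredictors may bin, normalize, or shift the counts differently (for example, by applying 'FrequencyShift' first), so do not expect its numbers to match this sketch exactly.

```matlab
% Hypothetical frequency table for one binned predictor (not data from this page).
goods = [60; 50; 40];          % "Goods" per bin
bads  = [10; 20; 30];          % "Bads" per bin

n  = goods + bads;             % observations per bin
N  = sum(n);                   % total observations
pG = goods / sum(goods);       % distribution of "Goods" across bins
pB = bads / sum(bads);         % distribution of "Bads" across bins

% Information value: sum over bins of (pG - pB) times the bin's WOE, log(pG./pB)
infoValue = sum((pG - pB) .* log(pG ./ pB));

% Within-bin shares of goods and bads (no bin is pure here, so no log2(0))
g = goods ./ n;
b = bads ./ n;

% Entropy: per-bin binary entropy in bits, averaged with weights n/N
binEntropy = -(g .* log2(g) + b .* log2(b));
entropyMetric = sum((n / N) .* binEntropy);

% Gini: per-bin Gini impurity, averaged with weights n/N
binGini = 1 - g.^2 - b.^2;
giniMetric = sum((n / N) .* binGini);

% Chi-square p-value: Pearson test of independence between bin and Good/Bad status
observed = [goods, bads];
expected = n * [sum(goods), sum(bads)] / N;        % outer product: bins x 2 classes
chi2 = sum((observed(:) - expected(:)).^2 ./ expected(:));
df = numel(n) - 1;                                 % (bins - 1) * (2 classes - 1)
chi2PValue = gammainc(chi2 / 2, df / 2, 'upper');  % upper tail of the chi-square CDF

% PercentMissing: fraction (decimal form) of missing entries in a predictor column
x = [23; NaN; 41; 35];
percentMissing = mean(ismissing(x));

metrics = table(infoValue, entropyMetric, giniMetric, chi2PValue, percentMissing)
```

For these counts, the chi-square statistic is 14 with 2 degrees of freedom, so the p-value is exp(-7), about 9.1e-4, and the missing fraction is 0.25.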
Extended Capabilities
This function supports input data that is specified as a tall column vector, a tall table, or a tall timetable. The output for numeric predictors might be slightly different when you use a tall array. Categorical predictors return the same results for tables and tall arrays. For more information, see tall and Tall Arrays.
Version History
Introduced in R2019a