Compute R-square, RMSE, correlation, and sample mean error of predicted and observed LGDs

AccMeasure = modelAccuracy(lgdModel,data) computes the R-square, root mean square error (RMSE), correlation, and sample mean error of observed vs. predicted loss given default (LGD) data. modelAccuracy supports comparison against a reference model and also supports different correlation types. By default, modelAccuracy computes the metrics in the LGD scale. You can use the ModelLevel name-value pair argument to compute metrics using the underlying model's transformed scale.

[AccMeasure,AccData] = modelAccuracy(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntax.

This example shows how to use fitLGDModel to fit data with a Regression model and then use modelAccuracy to compute the R-Square, RMSE, correlation, and sample mean error of predicted and observed LGDs.

Load the loss given default data.

ans=8×4 table
LTV        Age         Type           LGD
_______    _______    ___________    _________

0.89101    0.39716    residential     0.032659
0.70176     2.0939    residential      0.43564
0.72078     2.7948    residential    0.0064766
0.37013      1.237    residential     0.007947
0.36492     2.5818    residential            0
  0.796     1.5957    residential      0.14572
0.60203     1.1599    residential     0.025688
0.92005    0.50253    investment      0.063182

Partition Data

Separate the data into training and test partitions.

rng('default'); % for reproducibility
NumObs = height(data);

c = cvpartition(NumObs,'HoldOut',0.4);
TrainingInd = training(c);
TestInd = test(c);

Create Regression LGD Model

Use fitLGDModel to create a Regression model using training data.

lgdModel = fitLGDModel(data(TrainingInd,:),'regression');
disp(lgdModel)
Regression with properties:

ResponseTransform: "logit"
BoundaryTolerance: 1.0000e-05
ModelID: "Regression"
Description: ""
UnderlyingModel: [1x1 classreg.regr.CompactLinearModel]
PredictorVars: ["LTV"    "Age"    "Type"]
ResponseVar: "LGD"

Display the underlying model.

disp(lgdModel.UnderlyingModel)
Compact linear regression model:
LGD_logit ~ 1 + LTV + Age + Type

Estimated Coefficients:
                   Estimate       SE        tStat       pValue
                   ________    ________    _______    __________

(Intercept)        -4.7549      0.36041    -13.193    3.0997e-38
LTV                 2.8565      0.41777     6.8377    1.0531e-11
Age                -1.5397     0.085716    -17.963    3.3172e-67
Type_investment     1.4358       0.2475     5.8012     7.587e-09

Number of observations: 2093, Error degrees of freedom: 2089
Root Mean Squared Error: 4.24
F-statistic vs. constant model: 181, p-value = 2.42e-104

Compute R-Square, RMSE, Correlation, and Sample Mean Error of Predicted and Observed LGDs

Use modelAccuracy to compute the RSquared, RMSE, Correlation, and SampleMeanError of the predicted and observed LGDs for the test data set.

[AccMeasure,AccData] = modelAccuracy(lgdModel,data(TestInd,:))
AccMeasure=1×4 table
              RSquared     RMSE      Correlation    SampleMeanError
              ________    _______    ___________    _______________

Regression    0.070867    0.25988      0.26621          0.10759

AccData=1394×3 table
Observed     Predicted_Regression    Residuals_Regression
_________    ____________________    ____________________

0.0064766         0.00091169               0.0055649
 0.007947          0.0036758               0.0042713
 0.063182            0.18774                -0.12456
        0          0.0010877              -0.0010877
  0.10904           0.011213                0.097823
        0           0.041992               -0.041992
  0.89463           0.052947                 0.84168
        0         3.7188e-06             -3.7188e-06
 0.072437          0.0090124                0.063425
 0.036006           0.023928                0.012078
        0          0.0034833              -0.0034833
  0.39549          0.0065253                 0.38896
 0.057675           0.071956               -0.014281
 0.014439          0.0061499                0.008289
        0          0.0012183              -0.0012183
        0          0.0019828              -0.0019828
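Generate a scatter plot of predicted and observed LGDs using modelAccuracyPlot. Because 'ModelLevel' is set to "underlying", the plot uses the underlying model's transformed scale rather than the LGD scale.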

modelAccuracyPlot(lgdModel,data(TestInd,:),'ModelLevel',"underlying")

This example shows how to use fitLGDModel to fit data with a Tobit model and then use modelAccuracy to compute R-Square, RMSE, correlation, and sample mean error of predicted and observed LGDs.

Load the loss given default data.

ans=8×4 table
LTV        Age         Type           LGD
_______    _______    ___________    _________

0.89101    0.39716    residential     0.032659
0.70176     2.0939    residential      0.43564
0.72078     2.7948    residential    0.0064766
0.37013      1.237    residential     0.007947
0.36492     2.5818    residential            0
  0.796     1.5957    residential      0.14572
0.60203     1.1599    residential     0.025688
0.92005    0.50253    investment      0.063182

Partition Data

Separate the data into training and test partitions.

rng('default'); % for reproducibility
NumObs = height(data);

c = cvpartition(NumObs,'HoldOut',0.4);
TrainingInd = training(c);
TestInd = test(c);

Create Tobit LGD Model

Use fitLGDModel to create a Tobit model using training data.

lgdModel = fitLGDModel(data(TrainingInd,:),'tobit');
disp(lgdModel)
Tobit with properties:

CensoringSide: "both"
LeftLimit: 0
RightLimit: 1
ModelID: "Tobit"
Description: ""
UnderlyingModel: [1x1 risk.internal.credit.TobitModel]
PredictorVars: ["LTV"    "Age"    "Type"]
ResponseVar: "LGD"

Display the underlying model.

disp(lgdModel.UnderlyingModel)
Tobit regression model:
LGD = max(0,min(Y*,1))
Y* ~ 1 + LTV + Age + Type

Estimated coefficients:
                   Estimate        SE         tStat       pValue
                   _________    _________    _______    __________

(Intercept)         0.058257      0.02728     2.1355      0.032833
LTV                  0.20126     0.031403     6.4088    1.8072e-10
Age                -0.095407    0.0072398    -13.178             0
Type_investment      0.10208     0.018048     5.6561     1.761e-08
(Sigma)              0.29288    0.0057086     51.304             0

Number of observations: 2093
Number of left-censored observations: 547
Number of uncensored observations: 1521
Number of right-censored observations: 25
Log-likelihood: -698.383

Compute R-Square, RMSE, Correlation, and Sample Mean Error of Predicted and Observed LGDs

Use modelAccuracy to compute RSquared, RMSE, Correlation, and SampleMeanError of predicted and observed LGDs for the test data set.

[AccMeasure,AccData] = modelAccuracy(lgdModel,data(TestInd,:),'CorrelationType',"kendall")
AccMeasure=1×4 table
         RSquared     RMSE      Correlation    SampleMeanError
         ________    _______    ___________    _______________

Tobit    0.08527     0.23712      0.29964         -0.034412

AccData=1394×3 table
Observed     Predicted_Tobit    Residuals_Tobit
_________    _______________    _______________

0.0064766       0.087889           -0.081412
 0.007947        0.12432            -0.11638
 0.063182        0.32043            -0.25724
        0       0.093354           -0.093354
  0.10904        0.16718           -0.058144
        0        0.22382            -0.22382
  0.89463        0.23695             0.65768
        0       0.010234           -0.010234
 0.072437         0.1592           -0.086761
 0.036006        0.19893            -0.16292
        0        0.12764            -0.12764
  0.39549        0.14568              0.2498
 0.057675        0.26181            -0.20413
 0.014439        0.14483            -0.13039
        0       0.094123           -0.094123
        0        0.10944            -0.10944
Generate a scatter plot of the predicted and observed LGDs using modelAccuracyPlot.

modelAccuracyPlot(lgdModel,data(TestInd,:))

Input Arguments

Loss given default model (lgdModel), specified as a Regression or Tobit object previously created using fitLGDModel.

Data Types: object

Data (data), specified as a NumRows-by-NumCols table with predictor and response values. The variable names and data types must be consistent with the underlying model.

Data Types: table

Name-Value Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: [AccMeasure,AccData] = modelAccuracy(lgdModel,data(TestInd,:),'DataID','Testing','CorrelationType','spearman')

Correlation type, specified as the comma-separated pair consisting of 'CorrelationType' and a character vector or string. Supported values include 'pearson', 'spearman', and 'kendall'.

Data Types: char | string
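
To see how the correlation types can differ, the following sketch compares them on a small made-up sample with a strictly monotonic but nonlinear relationship. It calls corr from Statistics and Machine Learning Toolbox directly. Whether modelAccuracy computes its Correlation column this way internally is an assumption, and the vectors are illustrative values, not LGD data.

```matlab
% Toy vectors (hypothetical values, chosen only for illustration).
obs  = [1; 4; 9; 16; 25];   % grows quadratically
pred = [1; 2; 3; 4; 5];     % grows linearly

% corr requires Statistics and Machine Learning Toolbox.
rhoPearson  = corr(obs,pred,'Type','Pearson');   % linear association
rhoSpearman = corr(obs,pred,'Type','Spearman');  % rank-based
rhoKendall  = corr(obs,pred,'Type','Kendall');   % concordance of pairs

fprintf('Pearson:  %.4f\n',rhoPearson);
fprintf('Spearman: %.4f\n',rhoSpearman);
fprintf('Kendall:  %.4f\n',rhoKendall);
```

Because the ranks of obs and pred agree exactly, the two rank-based measures equal 1, while the Pearson value is slightly below 1. A rank-based type can therefore be a reasonable choice when the ordering of predicted LGDs matters more than a linear fit.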

Data set identifier, specified as the comma-separated pair consisting of 'DataID' and a character vector or string. The DataID is included in the output for reporting purposes.

Data Types: char | string

Model level, specified as the comma-separated pair consisting of 'ModelLevel' and a character vector or string.

• 'top' — The accuracy metrics are computed in the LGD scale at the top model level.

• 'underlying' — For a Regression model only, the metrics are computed on the transformed LGD data, that is, in the underlying model's transformed scale.

Note

ModelLevel has no effect for a Tobit model because there is no response transformation.

Data Types: char | string
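
For intuition about the 'underlying' scale, recall that the Regression model in the example reports ResponseTransform "logit" and BoundaryTolerance 1.0000e-05. The following is a minimal sketch of such a transform, assuming that LGD values are first clipped to [tol, 1-tol] and then passed through the logit. The exact internal handling is an assumption, not something this page confirms, and the input values are illustrative.

```matlab
% Hypothetical observed LGD values, including the boundary values 0 and 1.
lgd = [0; 0.032659; 0.43564; 1];
tol = 1e-5;                       % same value as the BoundaryTolerance shown in the example

% Assumed step 1: move values away from 0 and 1 so that the logit is finite.
lgdClipped = min(max(lgd,tol),1-tol);

% Assumed step 2: logit transform, mapping (0,1) to the real line.
lgdLogit = log(lgdClipped./(1-lgdClipped));

% The inverse (logistic) transform recovers the clipped LGD values.
lgdBack = 1./(1+exp(-lgdLogit));

disp(table(lgd,lgdClipped,lgdLogit,lgdBack))
```

Values on the logit scale span a much wider range than LGD values in [0,1]. This is consistent with the example, where the underlying linear model reports a root mean squared error of 4.24 on the training data, while modelAccuracy reports an RMSE of 0.25988 in the LGD scale for the test data.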

LGD values predicted for data by the reference model, specified as the comma-separated pair consisting of 'ReferenceLGD' and a NumRows-by-1 numeric vector. The modelAccuracy output information is reported for both the lgdModel object and the reference model.

Data Types: double

Identifier for the reference model, specified as the comma-separated pair consisting of 'ReferenceID' and a character vector or string. 'ReferenceID' is used in the modelAccuracy output for reporting purposes.

Data Types: char | string
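
As a sketch of how 'ReferenceLGD' and 'ReferenceID' might be combined, the following assumes the workspace from the examples above (lgdModel, data, TrainingInd, and TestInd) and Risk Management Toolbox. The reference model here, a Tobit model fitted to the same training data, and the labels 'Testing' and 'TobitRef' are illustrative choices, not part of the original examples.

```matlab
% Assumes lgdModel, data, TrainingInd, and TestInd exist as in the examples above.
testData = data(TestInd,:);

% Hypothetical reference model: a Tobit model fitted on the same training data.
refModel = fitLGDModel(data(TrainingInd,:),'tobit');

% ReferenceLGD must be a NumRows-by-1 numeric vector of predicted LGDs.
refLGD = predict(refModel,testData);

[AccMeasure,AccData] = modelAccuracy(lgdModel,testData, ...
    'DataID','Testing', ...
    'ReferenceLGD',refLGD, ...
    'ReferenceID','TobitRef');

disp(AccMeasure)   % one row for lgdModel, one row for the reference model
```

Any other source of LGD predictions for the same rows, such as a vendor model or a lookup table, could be passed as refLGD in the same way.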

Output Arguments

Accuracy measure, returned as a table with columns 'RSquared', 'RMSE', 'Correlation', and 'SampleMeanError'. AccMeasure has one row if only the lgdModel accuracy is measured and it has two rows if reference model information is given. The row names of AccMeasure report the model ID and data ID (if provided).

Accuracy data (AccData), returned as a table with observed LGD values, predicted LGD values, and residuals (observed minus predicted). Additional columns for predicted and residual values are included for the reference model, if provided. The ModelID and ReferenceID labels are appended to the column names.

Model Accuracy

Model accuracy measures how closely the predicted LGD values match the observed LGD values, using different metrics.

• R-squared — To compute the R-squared metric, modelAccuracy fits a linear regression of the observed LGD values against the predicted LGD values:

$LGD_{obs} = a + b \cdot LGD_{pred} + \epsilon$

The R-square of this regression is reported. For more information, see Coefficient of Determination (R-Squared).

• RMSE — To compute the root mean square error (RMSE), modelAccuracy uses the following formula where N is the number of observations:

$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(LGD_{i}^{obs} - LGD_{i}^{pred}\right)^{2}}$

• Correlation — This is the correlation between the observed and predicted LGD:

$corr\left(LGD_{obs}, LGD_{pred}\right)$

• Sample mean error — This is the mean difference between the observed and predicted LGD:

$SampleMeanError = \frac{1}{N}\sum_{i=1}^{N}\left(LGD_{i}^{obs} - LGD_{i}^{pred}\right)$
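
The four metrics can be reproduced by hand with base MATLAB functions. The following is a minimal sketch that follows the definitions above. The observed and predicted vectors are made-up values, and the code is not the implementation used by modelAccuracy.

```matlab
% Hypothetical observed and predicted LGD values (illustrative only).
obs  = [0.10; 0.00; 0.45; 0.80; 0.05; 0.30];
pred = [0.15; 0.05; 0.30; 0.60; 0.10; 0.35];

% R-squared of the regression obs = a + b*pred + eps.
coef   = polyfit(pred,obs,1);        % coef(1) is b, coef(2) is a
fitted = polyval(coef,pred);
SSE = sum((obs - fitted).^2);        % residual sum of squares of the fit
SST = sum((obs - mean(obs)).^2);     % total sum of squares
RSquared = 1 - SSE/SST;

% RMSE and sample mean error use the raw residuals obs - pred.
res  = obs - pred;
RMSE = sqrt(mean(res.^2));
SampleMeanError = mean(res);

% Pearson correlation between observed and predicted values.
C = corrcoef(obs,pred);
Correlation = C(1,2);

disp(table(RSquared,RMSE,Correlation,SampleMeanError))
```

For a linear regression with an intercept and a single predictor, the R-squared equals the squared Pearson correlation. The Regression example above is consistent with this: 0.26621 squared is approximately 0.0709, in line with the reported RSquared of 0.070867. The identity does not carry over to the Tobit example, which reports a Kendall correlation.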