resubPredict

Classify observations using naive Bayes classifier

Syntax

`label = resubPredict(Mdl)`

`[label,Posterior,Cost] = resubPredict(Mdl)`

Description


`label = resubPredict(Mdl)` returns a vector of resubstitution predicted class labels (`label`) for the trained naive Bayes classifier `Mdl` using the predictor data `Mdl.X`.


`[label,Posterior,Cost] = resubPredict(Mdl)` also returns the posterior probabilities (`Posterior`) and predicted (expected) misclassification costs (`Cost`) corresponding to the observations (rows) in `Mdl.X`.

Examples


Load the `fisheriris` data set. Create `X` as a numeric matrix that contains four petal measurements for 150 irises. Create `Y` as a cell array of character vectors that contains the corresponding iris species.

```matlab
load fisheriris
X = meas;
Y = species;
rng('default') % for reproducibility
```

Train a naive Bayes classifier using the predictors `X` and class labels `Y`. A recommended practice is to specify the class names. `fitcnb` assumes that each predictor is conditionally normally distributed, given the class.

```matlab
Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'})
```

```
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods
```

`Mdl` is a trained `ClassificationNaiveBayes` classifier.

Predict the training sample labels.

`label = resubPredict(Mdl);`

Display the results for a random set of 10 observations.

```matlab
idx = randsample(size(X,1),10);
table(Y(idx),label(idx),'VariableNames', ...
    {'TrueLabel','PredictedLabel'})
```

```
ans=10×2 table
      TrueLabel       PredictedLabel
    ______________    ______________

    {'virginica' }    {'virginica' }
    {'setosa'    }    {'setosa'    }
    {'virginica' }    {'virginica' }
    {'versicolor'}    {'versicolor'}
    {'virginica' }    {'virginica' }
    {'versicolor'}    {'versicolor'}
    {'virginica' }    {'virginica' }
    {'setosa'    }    {'setosa'    }
    {'virginica' }    {'virginica' }
    {'setosa'    }    {'setosa'    }
```

Create a confusion chart from the true labels `Y` and the predicted labels `label`.

`cm = confusionchart(Y,label);`

Estimate in-sample posterior probabilities and misclassification costs using a naive Bayes classifier.

Load the `fisheriris` data set. Create `X` as a numeric matrix that contains four petal measurements for 150 irises. Create `Y` as a cell array of character vectors that contains the corresponding iris species.

```matlab
load fisheriris
X = meas;
Y = species;
rng('default') % for reproducibility
```

Train a naive Bayes classifier using the predictors `X` and class labels `Y`. A recommended practice is to specify the class names. `fitcnb` assumes that each predictor is conditionally normally distributed, given the class.

`Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'});`

`Mdl` is a trained `ClassificationNaiveBayes` classifier.

Estimate the posterior probabilities and expected misclassification costs for the training data.

```matlab
[label,Posterior,MisclassCost] = resubPredict(Mdl);
Mdl.ClassNames
```

```
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
```

Display the results for 10 randomly selected observations.

```matlab
idx = randsample(size(X,1),10);
table(Y(idx),label(idx),Posterior(idx,:),MisclassCost(idx,:),'VariableNames', ...
    {'TrueLabel','PredictedLabel','PosteriorProbability','MisclassificationCost'})
```

```
ans=10×4 table
      TrueLabel       PredictedLabel              PosteriorProbability                       MisclassificationCost
    ______________    ______________    _________________________________________    ______________________________________

    {'virginica' }    {'virginica' }    6.2514e-269     1.1709e-09             1              1              1    1.1709e-09
    {'setosa'    }    {'setosa'    }              1     5.5339e-19     2.485e-25     5.5339e-19              1             1
    {'virginica' }    {'virginica' }    7.4191e-249     1.4481e-10             1              1              1    1.4481e-10
    {'versicolor'}    {'versicolor'}     3.4472e-62        0.99997     3.362e-05              1      3.362e-05       0.99997
    {'virginica' }    {'virginica' }    3.4268e-229      6.597e-09             1              1              1     6.597e-09
    {'versicolor'}    {'versicolor'}     6.0941e-77         0.9998    0.00019663              1     0.00019663        0.9998
    {'virginica' }    {'virginica' }    1.3467e-167       0.002187       0.99781              1        0.99781      0.002187
    {'setosa'    }    {'setosa'    }              1     1.5776e-15    5.7172e-24     1.5776e-15              1             1
    {'virginica' }    {'virginica' }    2.0116e-232     2.6206e-10             1              1              1    2.6206e-10
    {'setosa'    }    {'setosa'    }              1     1.8085e-17    1.9639e-24     1.8085e-17              1             1
```

The order of the columns of `Posterior` and `MisclassCost` corresponds to the order of the classes in `Mdl.ClassNames`.

Input Arguments


`Mdl` — Full, trained naive Bayes classifier, specified as a `ClassificationNaiveBayes` model trained by `fitcnb`.

Output Arguments


Predicted class labels, returned as a categorical vector, character array, logical or numeric vector, or cell array of character vectors.

The predicted class labels have the following characteristics:

• Same data type as the observed class labels (`Mdl.Y`). (The software treats string arrays as cell arrays of character vectors.)

• Length equal to the number of rows of `Mdl.X`.

• Each label is the class that yields the lowest expected misclassification cost (`Cost`).

Class posterior probabilities, returned as a numeric matrix. `Posterior` has as many rows as `Mdl.X` and one column for each distinct class in the training data (`size(Mdl.ClassNames,1)` columns).

`Posterior(j,k)` is the predicted posterior probability of class `k` (that is, the class `Mdl.ClassNames(k)`) given the observation in row `j` of `Mdl.X`.

Expected misclassification costs, returned as a numeric matrix. `Cost` has as many rows as `Mdl.X` and one column for each distinct class in the training data (`size(Mdl.ClassNames,1)` columns).

`Cost(j,k)` is the expected misclassification cost incurred by classifying the observation in row `j` of `Mdl.X` into class `k` (that is, the class `Mdl.ClassNames(k)`).


Misclassification Cost

A misclassification cost is the relative severity of a classifier labeling an observation into the wrong class.

There are two types of misclassification costs: true and expected. Let K be the number of classes.

• True misclassification cost — A K-by-K matrix, where element (i,j) indicates the misclassification cost of predicting an observation into class j if its true class is i. The software stores the misclassification cost in the property `Mdl.Cost`, and uses it in computations. By default, `Mdl.Cost(i,j)` = 1 if `i` ≠ `j`, and `Mdl.Cost(i,j)` = 0 if `i` = `j`. In other words, the cost is `0` for correct classification and `1` for any incorrect classification.

• Expected misclassification cost — A K-dimensional vector, where element k is the weighted average misclassification cost of classifying an observation into class k, weighted by the class posterior probabilities:

$$c_k = \sum_{j=1}^{K} \hat{P}\left(Y = j \mid x_1, \ldots, x_P\right) \mathrm{Cost}_{jk}.$$

In other words, the software classifies each observation into the class with the lowest expected misclassification cost.
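The expected-cost computation above reduces to a matrix product between the posterior probabilities and the true cost matrix. A minimal sketch in Python (NumPy stands in for MATLAB here; the variable names are illustrative and not part of the `resubPredict` interface):

```python
import numpy as np

# True misclassification cost matrix for K = 3 classes:
# 0 on the diagonal, 1 elsewhere, matching the default Mdl.Cost.
cost = np.ones((3, 3)) - np.eye(3)

# Posterior probabilities for two hypothetical observations (rows sum to 1).
posterior = np.array([[0.90, 0.08, 0.02],
                      [0.10, 0.30, 0.60]])

# c_k = sum_j P(Y = j | x) * Cost(j, k), for every class k at once.
expected_cost = posterior @ cost

# With 0-1 costs, the expected cost of class k is simply 1 - P(Y = k | x).
print(expected_cost[0])            # [0.10, 0.92, 0.98] for the first row

# The predicted label is the class with the lowest expected cost.
labels = expected_cost.argmin(axis=1)
print(labels)                      # [0, 2]
```

With the default 0-1 cost matrix, minimizing expected cost is equivalent to picking the class with the highest posterior probability; a nondefault `Mdl.Cost` can shift the decision toward classes that are expensive to miss.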

Posterior Probability

The posterior probability is the probability that an observation belongs in a particular class, given the data.

For naive Bayes, the posterior probability that an observation belongs to class k, given the predictor values (x1,...,xP), is

$$\hat{P}\left(Y = k \mid x_1, \ldots, x_P\right) = \frac{P\left(X_1, \ldots, X_P \mid y = k\right)\,\pi\left(Y = k\right)}{P\left(X_1, \ldots, X_P\right)},$$

where:

• $P\left(X_1, \ldots, X_P \mid y = k\right)$ is the conditional joint density of the predictors given that they are in class k. `Mdl.DistributionNames` stores the distribution names of the predictors.

• π(Y = k) is the class prior probability distribution. `Mdl.Prior` stores the prior distribution.

• $P\left(X_1, \ldots, X_P\right)$ is the joint density of the predictors. Because the classes are discrete, $P\left(X_1, \ldots, X_P\right) = \sum_{k=1}^{K} P\left(X_1, \ldots, X_P \mid y = k\right)\,\pi\left(Y = k\right).$
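Under the default `'normal'` distribution assumption, the posterior above follows directly from per-class Gaussian densities and the class priors. The sketch below mirrors the formula rather than the `fitcnb` implementation; all parameter values are made up for illustration (they play the role of `Mdl.DistributionParameters` and `Mdl.Prior`):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Toy model: K = 2 classes, P = 2 predictors.
mu    = np.array([[0.0, 0.0],     # class 1 means
                  [3.0, 3.0]])    # class 2 means
sigma = np.array([[1.0, 1.0],
                  [1.0, 1.0]])
prior = np.array([0.5, 0.5])      # class prior distribution

x = np.array([0.5, 0.2])          # one observation

# Naive assumption: the conditional joint density factors into a
# product of per-predictor densities within each class.
likelihood = np.prod(gaussian_pdf(x, mu, sigma), axis=1)

# Bayes' rule: posterior = likelihood * prior / joint density,
# where the joint density is sum_k likelihood_k * prior_k.
joint = likelihood * prior
posterior = joint / joint.sum()

print(posterior)   # posterior[0] is close to 1: x lies near the class-1 mean
```

The normalization by the joint density is what makes each row of `Posterior` sum to 1, which is why the expected misclassification cost with 0-1 costs reduces to one minus the posterior of the predicted class.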

Prior Probability

The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.

Introduced in R2014b