Estimate the test sample weighted edge (the weighted average of the classification margins) of a naive Bayes classifier. The test sample edge is the average difference, over the test sample, between the estimated posterior probability for the true class and the maximum posterior probability among the other classes. The weighted test sample edge estimates this average when the software assigns a weight to each observation.

Load the `fisheriris` data set. Create `X` as a numeric matrix that contains four petal measurements for 150 irises. Create `Y` as a cell array of character vectors that contains the corresponding iris species.
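This step might look like the following sketch (the original listing is not shown here; `meas` and `species` are the variables that `fisheriris` provides):

```matlab
load fisheriris
X = meas;      % 150-by-4 numeric matrix of measurements
Y = species;   % 150-by-1 cell array of species names
```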

Suppose that some of the measurements are lower quality because they were measured with older technology. To simulate this effect, add noise to a random subset of 20 measurements.
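A sketch of the noise step, assuming the 20 measurements are chosen as individual matrix entries and that `noisyRows` records which observations were affected (the seed, noise scale, and variable names are assumptions):

```matlab
rng(1)                                  % for reproducibility (assumed seed)
idx = randsample(numel(X),20);          % 20 individual measurements to corrupt
X(idx) = X(idx) + 2*randn(20,1);        % add Gaussian noise
[noisyRows,~] = ind2sub(size(X),idx);   % observations containing noisy measurements
```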

Randomly partition observations into a training set and a test set with stratification, using the class information in `Y`. Specify a 30% holdout sample for testing.
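A sketch of the partition step; `cvpartition` stratifies by class when given the labels `Y` (the variable name `cvp` is an assumption):

```matlab
cvp = cvpartition(Y,'Holdout',0.30);   % stratified 70/30 train/test split
```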

Extract the training and test indices.
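Assuming the partition object is named `cvp`, this step might be:

```matlab
trainInds = training(cvp);   % logical indices of training observations
testInds  = test(cvp);       % logical indices of test observations
```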

Specify the training and test data sets.
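Assuming index vectors named `trainInds` and `testInds`, a sketch of this step:

```matlab
XTrain = X(trainInds,:);
YTrain = Y(trainInds);
XTest  = X(testInds,:);
YTest  = Y(testInds);
```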

Train a naive Bayes classifier using the predictors `XTrain` and class labels `YTrain`. A recommended practice is to specify the class names. `fitcnb` assumes that each predictor is conditionally normally distributed, given the class.
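The training call might look like this sketch, assuming training data named `XTrain` and `YTrain`:

```matlab
Mdl = fitcnb(XTrain,YTrain, ...
    'ClassNames',{'setosa','versicolor','virginica'});
```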

`Mdl` is a trained `ClassificationNaiveBayes` classifier.

Estimate the test sample edge.
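A sketch of the edge computation, assuming a model named `Mdl` and test data named `XTest` and `YTest`:

```matlab
e = edge(Mdl,XTest,YTest)   % test sample (unweighted) edge
```

The text reports a value of approximately 0.59, although the exact number depends on the random partition and noise.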

The average margin is approximately 0.59.

One way to reduce the effect of the noisy measurements is to assign them less weight than the other observations. Define a weight vector that gives the better quality observations twice the weight of the other observations.
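Assuming the noisy observations are recorded in a vector named `noisyRows` and the partition indices are `trainInds` and `testInds`, the weight vector might be defined as:

```matlab
n = size(X,1);
weights = 2*ones(n,1);       % better quality observations get weight 2
weights(noisyRows) = 1;      % noise-corrupted observations get weight 1
weightsTrain = weights(trainInds);
weightsTest  = weights(testInds);
```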

Train a naive Bayes classifier using the predictors `XTrain`, class labels `YTrain`, and weights `weightsTrain`.
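The weighted training call might look like this sketch, using the `Weights` name-value argument of `fitcnb`:

```matlab
Mdl_W = fitcnb(XTrain,YTrain,'Weights',weightsTrain, ...
    'ClassNames',{'setosa','versicolor','virginica'});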

`Mdl_W` is a trained `ClassificationNaiveBayes` classifier.

Estimate the test sample weighted edge using the weighting scheme.
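A sketch of the weighted edge computation, assuming test weights named `weightsTest`:

```matlab
eW = edge(Mdl_W,XTest,YTest,'Weights',weightsTest)   % weighted test sample edge
```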

The weighted average margin is approximately 0.69. This result indicates that, on average, the weighted classifier labels the test observations with greater confidence than the classifier trained without weights.