Using fitcnb to train a naive Bayes classifier when some predictors are dummy variables?

So I am trying to train my naive Bayes classifier, and some of my predictors are definitely not normally distributed, since they contain only 0s and 1s. When I use 'mn' it does not work at all, and when I use 'mvmn' for these predictors I get the following message: "Warning: You specified the 'mvmn' distribution for at least one predictor that does not appear in the 'CategoricalPredictors' list. 'CategoricalPredictors' will be updated to include all 'mvmn' predictors."
So I am clearly not understanding the trainer completely and don't know how to incorporate the fact that some predictors are dummy variables.
  2 Comments
Mahesh Taparia on 26 Mar 2020
Hi
What is the dimension of your input data? Can you upload your sample code?
Don Mathis on 2 Apr 2020
"'CategoricalPredictors' will be updated to include all 'mvmn' predictors."
It sounds like it's working fine in that case. Binary dummy variables are categorical variables.


Answers (1)

Purvaja on 26 Aug 2025
Let’s break down what that warning really means:
When you tell MATLAB to treat some features as 'mvmn' (multivariate multinomial), it expects them to be categorical or discrete predictors. MATLAB keeps track of which predictors are categorical using the 'CategoricalPredictors' property.
In your case, some features weren’t marked as categorical, so they weren’t in 'CategoricalPredictors'. But since you specified 'mvmn' as the distribution for them, MATLAB automatically adds every predictor set to 'mvmn' to the categorical list.
Here’s a small example:
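% First column is continuous (height), second is a 0/1 dummy variable.
% Each class has several distinct heights, so a normal fit is also possible.
X = [1.80 0;
     1.65 1;
     1.75 0;
     1.55 1;
     1.70 1;
     1.60 0];
Y = categorical({'fit'; 'fit'; 'unfit'; 'fit'; 'unfit'; 'unfit'});
% Forcing all predictors to 'mvmn' without setting 'CategoricalPredictors'
distNames = {'mvmn','mvmn'};
Mdl = fitcnb(X, Y, 'DistributionNames', distNames);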
This example produces the same warning you saw. Forcing a continuous predictor such as height to be categorical like this can reduce accuracy.
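To see what the warning actually changed, you can inspect the fitted model. This is only a minimal sketch with made-up data (the heights, dummy values, and labels are assumptions for illustration). It assigns 'mvmn' to the dummy column only and leaves 'CategoricalPredictors' unset, which is the situation described in the question:

```matlab
% Hypothetical data: column 1 is height, column 2 is a 0/1 dummy variable
X = [1.80 0; 1.65 1; 1.75 0; 1.55 1; 1.70 1; 1.60 0];
Y = categorical({'fit'; 'fit'; 'unfit'; 'fit'; 'unfit'; 'unfit'});

% 'mvmn' is requested for column 2 but 'CategoricalPredictors' is not given,
% so fitcnb is expected to issue the warning and update the list itself
Mdl = fitcnb(X, Y, 'DistributionNames', {'normal','mvmn'});

% Indices of the predictors the model treats as categorical
disp(Mdl.CategoricalPredictors)

% Distribution used for each predictor
disp(Mdl.DistributionNames)
```

If the dummy column shows up in Mdl.CategoricalPredictors, the model was trained the way you intended and the warning is only informational.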
A better approach is to tell MATLAB explicitly which predictors are categorical, especially for mixed datasets:
distNames = {'normal','mvmn'}; % height ~ normal, dummy ~ categorical
Mdl = fitcnb(X, Y, 'DistributionNames', distNames, 'CategoricalPredictors', 2);
Here, we explicitly mark the 2nd column as categorical. MATLAB will automatically treat it as discrete and apply 'mvmn', while other columns are assumed continuous.
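To check what the mixed model learned, you can look at the fitted parameters and classify a new observation. Again this is a sketch on made-up data, and the new observation [1.72 1] is hypothetical:

```matlab
% Hypothetical data: column 1 is height, column 2 is a 0/1 dummy variable
X = [1.80 0; 1.65 1; 1.75 0; 1.55 1; 1.70 1; 1.60 0];
Y = categorical({'fit'; 'fit'; 'unfit'; 'fit'; 'unfit'; 'unfit'});

% Height is modeled as normal, the dummy as 'mvmn', and column 2 is
% declared categorical up front
Mdl = fitcnb(X, Y, 'DistributionNames', {'normal','mvmn'}, ...
    'CategoricalPredictors', 2);

% DistributionParameters is a classes-by-predictors cell array
heightParams = Mdl.DistributionParameters{1,1}; % mean and standard deviation of height, first class
dummyProbs   = Mdl.DistributionParameters{1,2}; % estimated probability of each dummy level, first class

% Posterior class probabilities for a new observation
[label, posterior] = predict(Mdl, [1.72 1]);
disp(Mdl.ClassNames)
disp(posterior)
```

The columns of posterior follow the order of Mdl.ClassNames.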
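Now, about 'mn' (multinomial): it is meant for count data, where all predictors together form one multinomial (bag-of-tokens) model. According to the fitcnb documentation, 'mn' therefore has to be given as a single value for the whole model ('DistributionNames','mn') and cannot appear as one element of a cell array next to other distributions, and the predictor values must be nonnegative integers. That is most likely why 'mn' did not work for you. For individual 0/1 dummy variables, 'mvmn' is the appropriate choice.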
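For comparison, here is a rough sketch of 'mn' on count data. As far as the fitcnb documentation describes it, 'mn' is given as one value for all predictors rather than per column, and the predictors are nonnegative integer counts. The token counts and class labels below are made up for illustration:

```matlab
% Hypothetical token counts: rows are documents, columns are words
X = [3 0 1; 2 1 0; 0 4 2; 1 3 3; 4 0 0; 0 2 5];
Y = categorical({'a'; 'a'; 'b'; 'b'; 'a'; 'b'});

% 'mn' is passed as a single value for the whole model, not inside a cell array
Mdl = fitcnb(X, Y, 'DistributionNames', 'mn');

% Classify a new (hypothetical) count vector
label = predict(Mdl, [2 0 1]);
disp(label)
```

A set of separate 0/1 dummy columns does not fit this model well, because 'mn' treats the columns as counts drawn from one shared distribution.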
For more details, see the MATLAB documentation for fitcnb.
Hope this helps you!
