Why are NaN values found in the score from kfoldPredict?

Yean Lim on 17 Nov 2020
Answered: Shashank Gupta on 20 Nov 2020
Names = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'};
isCategoricalPredictor = [false, false, true, false, true, false, false, false];
% Use tree learner
template = templateTree('NumVariablesToSample', 'all', ... % to analyse predictor importance
    'Reproducible', true, 'Surrogate', 'on', ... % Surrogate on to obtain measure of association
    'MaxNumSplits', maxNumSplits, 'MinLeafSize', minLeafSize);
% optimizable variable does not accept
BestEnsembleMdl = fitcensemble(X_train, y_train, ...
    'Learners', template, ...
    'Method', method, ...
    'NumLearningCycles', numLearningCycles, ...
    'Holdout', 0.2, ...
    'LearnRate', learnRate, ...
    'ScoreTransform', 'logit', ... % transform scores to probabilistic estimates
    'CategoricalPredictors', isCategoricalPredictor, ...
    'PredictorNames', Names);
[~, score] = kfoldPredict(BestEnsembleMdl);
Hi, I tried to run kfoldPredict on the ClassificationPartitionedEnsemble produced by fitcensemble.
When I run kfoldPredict, the score variable it returns contains many NaN values. See the score variable in the attached mat file.
I expected the scores to be real values.
In the example above, I used the following values:
learnRate = 0.9702
maxNumSplits = 16826
method = 'LogitBoost'
numLearningCycles = 2
minLeafSize = 1
I have saved X_train & y_train variables in the attached mat file. I have reduced the number of rows in X_train & y_train to 10 rows as a demonstration.
1) Why are there NaN values in the score?
2) What should I do to ensure that there are no NaN values in the score?
Thank you

Answers (1)

Shashank Gupta on 20 Nov 2020
Hey Yean,
Yes, NaNs in the output score are expected here. Because the model was trained with 'Holdout', 0.2, only 20% of the observations are set aside as validation data. kfoldPredict returns scores only for those held-out observations; the rows for the remaining observations, which were used for training, are filled with NaN. You can check this by changing the 'Holdout' value and watching the number and position of the NaN rows change. One more suggestion: make sure the classes are well distributed across the training and validation sets.
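To make this concrete, here is a minimal sketch of how you might separate the scored rows from the NaN rows, and how you could avoid NaN rows altogether. It is not run on your data: it assumes the ionosphere sample data set that ships with the Statistics and Machine Learning Toolbox, and the variable names (cvHoldout, cvKFold, isHeldOut and so on) and the tree and ensemble settings are only illustrative. The first part uses the Partition property of the cross-validated model to find the held-out rows; the second part swaps 'Holdout' for 'KFold', which is one possible way to get a score for every row.

```matlab
% Sketch only: uses the ionosphere sample data, not the asker's
% X_train / y_train. All variable names and settings are illustrative.
rng(1);              % fix the random split so the run is repeatable
load ionosphere      % X: numeric predictors, Y: binary labels 'b' / 'g'

t = templateTree('MaxNumSplits', 10);

% Holdout validation: a single fold with 20% of the rows held out.
cvHoldout = fitcensemble(X, Y, 'Method', 'LogitBoost', ...
    'Learners', t, 'NumLearningCycles', 10, 'Holdout', 0.2);
[~, scoreHoldout] = kfoldPredict(cvHoldout);

% Logical mask of the held-out (validation) rows.
isHeldOut = test(cvHoldout.Partition);

% Only the held-out rows carry scores; keep just those.
validScores = scoreHoldout(isHeldOut, :);

% Rows with no score at all should be exactly the training rows.
nanRows = all(isnan(scoreHoldout), 2);
disp(isequal(nanRows, ~isHeldOut))

% K-fold validation: every row is held out exactly once,
% so every row receives a score.
cvKFold = fitcensemble(X, Y, 'Method', 'LogitBoost', ...
    'Learners', t, 'NumLearningCycles', 10, 'KFold', 5);
[~, scoreKFold] = kfoldPredict(cvKFold);
disp(any(isnan(scoreKFold), 'all'))
```

If you want to keep 'Holdout', indexing with the mask as in validScores is enough. If you need a score for every observation, 'KFold' is probably the simpler route, although with only 10 rows the folds will be very small.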
I hope this clears up some of the confusion and is enough for you to explore further.
Cheers

Version

R2020b
