How is predicted value calculated when using kfoldpredict with regression?

Question

When using kfoldPredict on a cross-validated model, what determines the predicted value?

I ran an experiment, and it seems that the value is selected randomly across the folds. Is this correct? I had assumed the result might be the average over all the folds.

For example, with K = 5, calling the regular predict function on each of the five trained models gives 17.25, 16.92, 15.5, 17.25, and 18, and the kfoldPredict result is 15.5.

For the next sample, the results are 13.88, 14.58, 14.67, 13.71, and 14.64, and the kfoldPredict result is 13.88.

Code example

clear;
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;
% Remove NaN values
X2 = X;
Y2 = Y;
X2(isnan(Y),:) = [];
Y2(isnan(Y)) = [];
rng('default') % For reproducibility
k = 5;
% Train on the cleaned data so that the rows of yfit line up with X2 and Y2
CVMdl = fitrtree(X2,Y2,"KFold", k);
% Predict
yfit = kfoldPredict(CVMdl);
mse = mean((yfit - CVMdl.Y).^2)
% Predict fold by fold (column i holds the predictions of the fold i model)
yhat_kfold = zeros(size(X2,1), k);
for i = 1:k
    yhat_kfold(:,i) = predict(CVMdl.Trained{i}, X2);
end
% Create table for analysis
T = table(yhat_kfold, yfit, Y2);

Answer 1

Kausthub on 6 Sep 2023

Hi Martti Ilvesmäki,

I understand that you would like to know how the predicted value is calculated when using kfoldPredict with regression ( https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedmodel.kfoldpredict.html ). You also ask whether the value is selected randomly and why kfoldPredict does not average the predictions across all the folds.

kfoldPredict does not pick a fold at random. For each observation, it returns the prediction of the one model that was trained without that observation in its training set, that is, the model for the fold in which the observation was held out.
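To make the rule concrete, here is a minimal sketch that rebuilds the kfoldPredict output by hand from the fold models and the stored partition. It reuses the carsmall setup from the question; the variable names keep, yManual, foldOf, and maxDiff are my own and not part of any API. I drop the rows with a NaN response before training so that the partition indices line up with the rows of X.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;
keep = ~isnan(Y);            % drop observations with a missing response
X = X(keep,:);
Y = Y(keep);

rng('default')               % for reproducibility
k = 5;
CVMdl = fitrtree(X, Y, "KFold", k);
yfit = kfoldPredict(CVMdl);

% Rebuild the out-of-fold predictions by hand
yManual = nan(size(Y));
foldOf  = zeros(size(Y));    % fold in which each observation was held out
for i = 1:k
    heldOut = test(CVMdl.Partition, i);   % logical index of fold i's held-out rows
    yManual(heldOut) = predict(CVMdl.Trained{i}, X(heldOut,:));
    foldOf(heldOut)  = i;
end

% Largest difference between the manual result and kfoldPredict
maxDiff = max(abs(yManual - yfit))
```

If the description above is right, maxDiff should be zero (or at floating-point noise level), and foldOf tells you which column of a fold-by-fold prediction table matches kfoldPredict for each row.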

In your first example, the five models predict 17.25, 16.92, 15.5, 17.25, and 18, and kfoldPredict returns 15.5. This is because that observation was held out of the training set in fold 3, so the prediction of the fold 3 model is the one that is used. Similarly, for the next sample the models predict 13.88, 14.58, 14.67, 13.71, and 14.64, and kfoldPredict returns 13.88 because that observation was not present in the training set of the fold 1 model.

The main purpose of k-fold cross-validation is to estimate how well a model generalizes so that you can choose the best model, so averaging the predicted values does not make much sense. For any given observation, k - 1 of the k models were trained on that observation, so averaging their predictions would mix in-sample fits into the estimate and bias it optimistically. In addition, a single model that predicts extreme values can distort the mean squared error (MSE). Averaging the per-fold errors (not the predicted values) gives a better picture of the model's overall performance.
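As a rough illustration of this point, the sketch below computes the loss of each fold with kfoldLoss and, for contrast, the MSE you would get by averaging the predictions of all k models for every observation. The setup again follows the question's carsmall example; perFoldMSE, cvMSE, yhatAll, and avgPredMSE are names I made up for the sketch. I have not listed numeric results, since they depend on the release and the random partition.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;
keep = ~isnan(Y);            % drop observations with a missing response
X = X(keep,:);
Y = Y(keep);

rng('default')               % for reproducibility
k = 5;
CVMdl = fitrtree(X, Y, "KFold", k);

% MSE of each fold model on the observations it did not see during training
perFoldMSE = kfoldLoss(CVMdl, "Mode", "individual");
% Single cross-validated MSE (the default Mode is "average")
cvMSE = kfoldLoss(CVMdl);

% For contrast: average the predictions of all k models for every observation.
% Most of these predictions are in-sample, so this is not a fair error estimate.
yhatAll = zeros(numel(Y), k);
for i = 1:k
    yhatAll(:,i) = predict(CVMdl.Trained{i}, X);
end
avgPredMSE = mean((mean(yhatAll, 2) - Y).^2);

disp(table(perFoldMSE))
fprintf('Cross-validated MSE:         %.3f\n', cvMSE);
fprintf('MSE of averaged predictions: %.3f\n', avgPredMSE);
```

I would expect the averaged-prediction MSE to look better than the cross-validated MSE, precisely because most of the averaged predictions come from models that already saw the observation.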

Here are some references that might be useful for you to better understand k-Fold cross-validation.

Hope this helps and clarifies your queries regarding kfoldPredict!
