resubPredict

Predict resubstitution response of tree

Description

Yfit = resubPredict(tree) returns the responses that tree predicts for the training data tree.X, that is, the data that fitrtree used to create tree.

Yfit = resubPredict(tree,Subtrees=subtrees) also prunes tree to the level specified by subtrees, before predicting responses.

Before R2021a, use the equivalent syntax Yfit = resubPredict(tree,"Subtrees",subtrees).

[Yfit,node] = resubPredict(___) also returns the node numbers of tree for the resubstituted data, using any of the input arguments in the previous syntaxes.

Examples

Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.

load carsmall
X = [Displacement Horsepower Weight];

Grow a regression tree using all observations.

Mdl = fitrtree(X,MPG);

Compute the resubstitution MSE.

Yfit = resubPredict(Mdl);
mean((Yfit - Mdl.Y).^2)
ans = 4.8952

You can get the same result using resubLoss.

resubLoss(Mdl)
ans = 4.8952

Load the carsmall data set. Consider Weight as a predictor of the response MPG.

load carsmall
idxNaN = isnan(MPG + Weight);
X = Weight(~idxNaN);
Y = MPG(~idxNaN);
n = numel(X);

Grow a regression tree using all observations.

Mdl = fitrtree(X,Y);

Compute resubstitution fitted values for the subtrees at several pruning levels.

m = max(Mdl.PruneList);
pruneLevels = 1:4:m; % Pruning levels to consider
z = numel(pruneLevels);
Yfit = resubPredict(Mdl,Subtrees=pruneLevels);

Yfit is an n-by-z matrix of fitted values in which each row corresponds to an observation and each column corresponds to a subtree.

Plot several columns of Yfit and Y against X.

sortDat = sortrows([X Y Yfit],1); % Sort all data with respect to X
plot(repmat(sortDat(:,1),1,size(Yfit,2)+1),sortDat(:,2:end)) % Vectorize for efficiency
lev = num2str((pruneLevels)',"Level %d MPG");
legend(["Observed MPG"; lev])
title("In-Sample Fitted Responses")
xlabel("Weight (lbs)")
ylabel("MPG")
h = findobj(gcf);
set(h(4:end),LineWidth=3) % Widen all lines

[Figure: In-Sample Fitted Responses. Line plot of MPG against Weight (lbs) showing Observed MPG and the fitted values labeled Level 1 MPG, Level 5 MPG, Level 9 MPG, and Level 13 MPG.]

The fitted values at lower pruning levels tend to follow the data more closely than those at higher levels. At higher pruning levels, the fitted values tend to be constant over large intervals of X.
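
The visual impression above can also be checked numerically. The following is a minimal sketch, not part of the original example, that recomputes the fitted values and then the resubstitution MSE for each pruning level. The variable name mseByLevel is introduced here for illustration only. Because each pruned subtree merges leaves of the previous one, the in-sample MSE should not decrease as the pruning level increases.

```matlab
load carsmall
idxNaN = isnan(MPG + Weight);
X = Weight(~idxNaN);
Y = MPG(~idxNaN);
Mdl = fitrtree(X,Y);

pruneLevels = 1:4:max(Mdl.PruneList);
Yfit = resubPredict(Mdl,Subtrees=pruneLevels);

% Resubstitution MSE per pruning level (one value per column of Yfit).
% Yfit - Y relies on implicit expansion (R2016b or later).
mseByLevel = mean((Yfit - Y).^2,1)
```

mseByLevel is a row vector with one entry per element of pruneLevels, so it can be plotted against pruneLevels to see how quickly the fit degrades.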

Input Arguments

tree: Regression tree, specified as a RegressionTree object created using the fitrtree function.

Subtrees: Pruning level, specified as a vector of nonnegative integers in ascending order or "all".

If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree and max(tree.PruneList) indicates the completely pruned tree (in other words, just the root node).

If you specify "all", then resubPredict operates on all subtrees (in other words, the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList).

resubPredict prunes tree to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments.

To use Subtrees, the PruneList and PruneAlpha properties of tree must be nonempty. In other words, grow tree by setting Prune="on", or prune tree afterward by using prune.
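
As a rough illustration of the two ways to specify Subtrees, the sketch below (an addition, reusing the carsmall data from the examples) checks that "all" and 0:max(tree.PruneList) produce the same fitted values. It assumes the tree is grown with the fitrtree default settings, so that the pruning sequence is available. The variable names levels, Yall, and Yseq are chosen here for illustration.

```matlab
load carsmall
idxNaN = isnan(MPG + Weight);
Mdl = fitrtree(Weight(~idxNaN),MPG(~idxNaN));

levels = 0:max(Mdl.PruneList);              % full pruning sequence
Yall = resubPredict(Mdl,Subtrees="all");
Yseq = resubPredict(Mdl,Subtrees=levels);

isequal(Yall,Yseq)              % the two specifications should agree
size(Yall,2) == numel(levels)   % one column per subtree
```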

Data Types: single | double | char | string

Output Arguments

Yfit: Predicted resubstitution response values for the training data, returned as a vector or a matrix. Yfit is of the same data type as the training response data tree.Y.

If the Subtrees name-value argument is a numeric scalar, then Yfit is returned as a column vector. Otherwise, Yfit is returned as a matrix with m columns, where m is the number of subtrees. Each column represents the predictions of the corresponding subtree.

node: Node numbers of tree where each data row resolves, returned as a numeric vector or a numeric matrix.

If the Subtrees name-value argument is a numeric scalar, then node is returned as an n-element column vector, where n is the number of rows of tree.X. Otherwise, node is returned as a matrix of size n-by-m, where m is the number of subtrees. Each column represents the node predictions of the corresponding subtree.
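
The output sizes described above can be inspected with a short sketch such as the following. It is an illustrative addition rather than part of the original page: it reuses the carsmall data from the examples, and the pruning levels [0 2 4] are arbitrary values assumed to lie within the pruning sequence of this tree. The last line compares each fitted value with the NodeMean property of the tree at the returned node number, on the assumption that a regression tree predicts the mean response stored for the node where an observation resolves.

```matlab
load carsmall
idxNaN = isnan(MPG + Weight);
Mdl = fitrtree(Weight(~idxNaN),MPG(~idxNaN));
n = size(Mdl.X,1);                 % number of training observations

% Scalar pruning level: both outputs are n-by-1 column vectors.
[Yfit0,node0] = resubPredict(Mdl,Subtrees=0);
size(Yfit0)
size(node0)

% Vector of pruning levels: both outputs are n-by-m matrices.
levels = [0 2 4];                  % illustrative levels, assumed valid for this tree
[YfitM,nodeM] = resubPredict(Mdl,Subtrees=levels);
size(YfitM)
size(nodeM)

% Largest gap between a fitted value and the mean stored for its node.
max(abs(Yfit0 - Mdl.NodeMean(node0)))
```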

Version History

Introduced in R2011a