Question on Regression Learner App
Hi guys,
I trained a model using the Regression Learner app in MATLAB R2021b. When training my model, I got some "Training results" (such as RMSE, R-squared, etc.) and a scatter plot of Predicted Response vs. True Response. Then I saved the model via "Export Model for Deployment". Now I'd like to obtain the predicted response values from this training instance, the ones that produced those training results. Do you know how I can get them?
thanks a lot for your answer!
best,
Laura
0 Comments
Accepted Answer
Drew
on 5 Jan 2023
To get the RMSE results on validation data, a set of k-fold cross-validation models is needed. In the example provided, 50-fold cross-validation was used in Regression Learner. When running this model training in Regression Learner, 51 models were trained: 1 model for each cross-validation fold, plus a final model trained on all of the training data. When a model is exported from Regression Learner in R2021b, only the final model is exported. This is highlighted in a note at the top of this page: https://www.mathworks.com/help/stats/export-regression-model-to-predict-new-data.html
At a high level, the two approaches are:
(1) Use the "Export Model" option from the Regression Learner, then write code to calculate the validation RMSE
(2) Use the "Generate Function" option of Regression Learner. This generates a MATLAB function which trains the final model and calculates the validation RMSE.
(1) Use the "Export Model" option from the Regression Learner, then write code to calculate the validation RMSE
For approach (1): After exporting the final model from the Regression Learner app as "trainedModel", one can get the validation RMSE with the code shown below.
% Do 50-fold cross-validation.
CVMdl = crossval(trainedModel.RegressionGP, 'KFold', 50);
% Do prediction on the validation data using the set of 50 cross-validation models.
Y_validation = kfoldPredict(CVMdl);
% Calculate RMSE on the validation data.
rmse_on_validation_data = sqrt(mean((Y_validation - tbl_training.Y).^2));
Note that the "crossval" function will do 50-fold cross-validation, since we specified 'KFold' of 50. This means that 50 models will be trained and stored in the resulting data structure. The "crossval" function will randomly partition the training data into 50 parts, then train 50 models, one for each fold. For example, the first model could be trained on folds 2-50, so it can be tested on fold 1; the second model could be trained on folds 1 and 3-50, so it can be tested on fold 2; and so on. The crossval function accesses the original training data from inside the trainedModel.RegressionGP data structure. For more info, see https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedmodel-class.html
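As a side note, the random 50-fold split can be made reproducible by seeding the random number generator and/or passing an explicit cvpartition object to crossval. A minimal sketch (one way, not the only way), assuming the exported struct trainedModel and the training table tbl_training from above:
rng(0); % seed the random number generator so the partition is reproducible
cvp = cvpartition(height(tbl_training), 'KFold', 50); % explicit 50-fold partition
CVMdl = crossval(trainedModel.RegressionGP, 'CVPartition', cvp);
numel(CVMdl.Trained) % displays 50: one compact model per fold
Y_validation = kfoldPredict(CVMdl);
rmse_on_validation_data = sqrt(mean((Y_validation - tbl_training.Y).^2));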
Here is some code to plot the validation predictions versus the True response:
% Plot predicted vs. actual for the validation data. (Note: scatter has no
% 'Color' property; pass the RGB triplet as the fourth argument instead.)
scatter(tbl_training.Y, Y_validation, 15, [0 0.4470 0.7410], 'filled');
line([-1.75, 1.75], [-1.75, 1.75], 'Color', 'k');
axis([-1.75 1.75 -1.75 1.75]);
xlabel('True response'); ylabel('Predicted response using kfold validation models');
title('On validation data, Predicted response vs True Response');
subtitle(sprintf('RMSE of kfold validation models on validation data: %0.5f', rmse_on_validation_data));
legend("Observations", "Perfect prediction", "Location", "southeast");
This leads to the following figure:
So, the above plot is what you are looking for. A few notes:
(1) The RMSE on validation data (0.29623) is slightly different from what you see in Regression Learner (0.29645) because the data was randomly re-partitioned into 50 folds at the command line by the crossval function, so the 50 cross-validation models are slightly different from those used inside Regression Learner.
(2) The RMSE on the training data is much lower (0.18386) because testing the final model on the training data is "cheating": the model has already seen the data it is predicting. That is, in this case, the same data is used for both training and testing. A similar calculation and plot can be done using the final model on the training data:
% Do prediction on the training set, using the final model.
Y_training = trainedModel.predictFcn(tbl_training);
% Calculate RMSE on the training set using the final model (no semicolon, so the value is displayed).
rmse_on_training_data = sqrt(mean((Y_training - tbl_training.Y).^2))
% Plot predicted vs. actual for the training data using the final model.
scatter(tbl_training.Y, Y_training, 15, [0 0.4470 0.7410], 'filled');
line([-1.75, 1.75], [-1.75, 1.75], 'Color', 'k');
axis([-1.75 1.75 -1.75 1.75]);
xlabel('True response'); ylabel('Predicted response using final model');
title('On training data, Predicted response vs True Response');
subtitle(sprintf('RMSE of final model on training data: %0.5f', rmse_on_training_data));
legend("Observations", "Perfect prediction", "Location", "southeast");
This leads to the following plot for the training data:
(2) Use the "Generate Function" option of Regression Learner. This generates a MATLAB function which trains the final model and calculates the validation RMSE.
Another way to reproduce the validation RMSE result is to use the "Generate Function" option from the Regression Learner app. The data tip indicates that this option will "Generate MATLAB code for training the currently selected model in the Models pane, including validation predictions."
So, just select the "Generate Function" option in the export area:
This outputs the following code. Notice that the last 3 lines of code calculate the validationRMSE in a way similar to that provided in the first part of this answer. For more info, see https://www.mathworks.com/help/stats/export-regression-model-to-predict-new-data.html#bvi2d8a-49. (Note, if you use PCA or feature selection in the Regression Learner app, then the generated code for calculating the validation RMSE will be much longer, and so in that case it is especially helpful to have this code auto-generated by the Regression Learner app.)
function [trainedModel, validationRMSE] = trainRegressionModel(trainingData)
% [trainedModel, validationRMSE] = trainRegressionModel(trainingData)
% Returns a trained regression model and its RMSE. This code recreates the
% model trained in Regression Learner app. Use the generated code to
% automate training the same model with new data, or to learn how to
% programmatically train models.
%
% Input:
% trainingData: A table containing the same predictor and response
% columns as those imported into the app.
%
% Output:
% trainedModel: A struct containing the trained regression model. The
% struct contains various fields with information about the trained
% model.
%
% trainedModel.predictFcn: A function to make predictions on new data.
%
% validationRMSE: A double containing the RMSE. In the app, the Models
% pane displays the RMSE for each model.
%
% Use the code to train the model with new data. To retrain your model,
% call the function from the command line with your original data or new
% data as the input argument trainingData.
%
% For example, to retrain a regression model trained with the original data
% set T, enter:
% [trainedModel, validationRMSE] = trainRegressionModel(T)
%
% To make predictions with the returned 'trainedModel' on new data T2, use
% yfit = trainedModel.predictFcn(T2)
%
% T2 must be a table containing at least the same predictor columns as used
% during training. For details, enter:
% trainedModel.HowToPredict
% Auto-generated by MATLAB on 04-Jan-2023 17:51:50
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'X1', 'X2', 'X3', 'X4', 'X5', 'X6'};
predictors = inputTable(:, predictorNames);
response = inputTable.Y;
isCategoricalPredictor = [false, false, false, false, false, false];
% Train a regression model
% This code specifies all the model options and trains the model.
regressionGP = fitrgp(...
    predictors, ...
    response, ...
    'BasisFunction', 'constant', ...
    'KernelFunction', 'exponential', ...
    'Standardize', true);
% Create the result struct with predict function
predictorExtractionFcn = @(t) t(:, predictorNames);
gpPredictFcn = @(x) predict(regressionGP, x);
trainedModel.predictFcn = @(x) gpPredictFcn(predictorExtractionFcn(x));
% Add additional fields to the result struct
trainedModel.RequiredVariables = {'X1', 'X2', 'X3', 'X4', 'X5', 'X6'};
trainedModel.RegressionGP = regressionGP;
trainedModel.About = 'This struct is a trained model exported from Regression Learner R2021b.';
trainedModel.HowToPredict = sprintf('To make predictions on a new table, T, use: \n yfit = c.predictFcn(T) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nThe table, T, must contain the variables returned by: \n c.RequiredVariables \nVariable formats (e.g. matrix/vector, datatype) must match the original training data. \nAdditional variables are ignored. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appregression_exportmodeltoworkspace'')">How to predict using an exported model</a>.');
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'X1', 'X2', 'X3', 'X4', 'X5', 'X6'};
predictors = inputTable(:, predictorNames);
response = inputTable.Y;
isCategoricalPredictor = [false, false, false, false, false, false];
% Perform cross-validation
partitionedModel = crossval(trainedModel.RegressionGP, 'KFold', 50);
% Compute validation predictions
validationPredictions = kfoldPredict(partitionedModel);
% Compute validation RMSE
validationRMSE = sqrt(kfoldLoss(partitionedModel, 'LossFun', 'mse'));
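Once the generated function is saved on the MATLAB path (for example as trainRegressionModel.m), you can call it with the original training table to reproduce the app's result. A minimal usage sketch, assuming the training table tbl_training from above:
[trainedModel, validationRMSE] = trainRegressionModel(tbl_training);
validationRMSE % no semicolon, so the value is displayed; it should be close to the RMSE shown in the app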
3 Comments
Drew
on 9 Jan 2023
If my latest answer has been helpful to you, it would be great if you could accept the answer. I thought I would mention this, since it looks like you are new to MATLAB Answers.
Your latest comment asks about wanting to "show statistics on model calibration". If this means you want to show statistics about the predicted versus actual response of your Gaussian Process regression model, then the answer I have given enables you to do exactly that on validation data, training data, or new test data.
So, to recap, after training a regression model, here are some common actions:
(1) Use the final model to get regression (prediction) results on new data. This data could be thought of as "new test data". https://www.mathworks.com/help/stats/export-regression-model-to-predict-new-data.html
(2) Use k-fold cross-validation models to get regression (prediction) results on the validation data, and calculate the RMSE (or other metric) on the validation data. If you want to estimate the expected RMSE on future new test data, then the RMSE on the validation data can be used for that purpose.
(3) Use the final model to get regression (prediction) results on the training data, and calculate the RMSE (or other metric) on the training data. Note that this RMSE on the training data, using the final model, is not a good predictor of RMSE on future new test data, because the RMSE on training data will be lower due to the overlap between training and test data.
I think the above three options are probably all you need; a short sketch of action (1) on new test data is shown below.
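Here is the promised sketch of action (1), applying the final model to new test data. The table name tbl_test is hypothetical; it must contain the same predictor columns (X1-X6) as the training data, plus the true response Y if you also want the test RMSE:
% Predict on new test data with the final model (tbl_test is a hypothetical table).
Y_test = trainedModel.predictFcn(tbl_test);
% RMSE on the new test data (requires the true response in tbl_test.Y).
rmse_on_test_data = sqrt(mean((Y_test - tbl_test.Y).^2))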
Less common actions: In your comment posted Jan 4, 2023 14:23, you used the fitlm command to train a linear regression model on the (x,y) data points formed from the (true, predicted) values of your Gaussian Process regression model. That creates a best-fit line through the "predicted vs true" data points. That is a second regression, performed after the Gaussian Process regression, so it gives different results with a different error rate (a different RMSE, etc.), since it is a completely different regression. Based on your questions and comments, it looked like you wanted to reproduce what you were seeing in the Regression Learner app, so I indicated how to do that. The Regression Learner app does not train a linear regression model on the "predicted vs true" data points from the Gaussian Process Regression model. If you want to look at that second regression, built with fitlm, here is how it looks:
% Do prediction on the training set, using the final model.
Y_training = trainedModel.predictFcn(tbl_training);
% Calculate RMSE on the training set using the final model (no semicolon, so the value is displayed).
rmse_on_training_data = sqrt(mean((Y_training - tbl_training.Y).^2))
% Plot predicted vs. actual for the training data using the final model.
scatter(tbl_training.Y, Y_training, 15, [0 0.4470 0.7410], 'filled');
line([-1.75, 1.75], [-1.75, 1.75], 'Color', 'k');
axis([-1.75 1.75 -1.75 1.75]);
xlabel('True response'); ylabel('Predicted response using final GPR model');
title('On training data, Predicted response vs True Response');
subtitle(sprintf('RMSE of final GPR model on training data: %0.5f\nRed line shows best linear fit through (true, predicted) datapoints', rmse_on_training_data));
hold on;
% Do another regression: fit a best-fit line through the (true, predicted) points from the GPR model.
Mdl = fitlm(tbl_training.Y, Y_training) % no semicolon, so the model summary is displayed
x = linspace(-2, 2, 100);
y = Mdl.Coefficients{1,1} + x*Mdl.Coefficients{2,1};
plot(x, y, 'r');
legend("Observations", "Perfect GPR prediction if blue points were on this line", "Best fit line through (true, predicted) points from GPR", "Location", "southeast");
legend('Position', [0.30476, 0.13651, 0.61786, 0.11905]);
hold off;
That yields this plot:
The simple linear model also has a built-in plot method that yields a similar plot (without the black diagonal line) using much less code. We will set the axis limits to be the same, so the plots look more similar:
plot(Mdl)
axis([-1.75 1.75 -1.75 1.75]);
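If the goal is to report calibration statistics for that best-fit line, the LinearModel object returned by fitlm exposes them directly. A small sketch using the Mdl from above (no semicolons, so the values are displayed):
slope = Mdl.Coefficients{2,1} % slope of the best-fit line (1 would be perfect calibration)
intercept = Mdl.Coefficients{1,1} % intercept (0 would mean no systematic offset)
r_squared = Mdl.Rsquared.Ordinary % R-squared of the predicted-vs-true fit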
More Answers (1)
Drew
on 3 Jan 2023
To work with a model from the Regression Learner app at the MATLAB command line, it is recommended to use the "Export Model" or "Export Compact Model" option, rather than "Export Model for Deployment". For example, if your model was exported to the MATLAB workspace as trainedModel using the "Export Model" or "Export Compact Model" option, and if X contains the training data, then perform regression on the training data with:
trainedModel.predictFcn(X)
If "Export Model" is selected rather than "Export Compact Model", then the training data is inside the model object in the trainedModel structure. You can see the model object type by examining the tranedModel structure. For example, if the trained model is a RegressionTree, then perform regression on the training data with:
trainedModel.predictFcn(trainedModel.RegressionTree.X)
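For the Gaussian Process regression model in this thread, the same idea applies. As a sketch (assuming "Export Model" was used, so the full RegressionGP, including its stored training data, is inside the struct), resubPredict returns the final model's predictions on the stored training data:
Y_training = resubPredict(trainedModel.RegressionGP); % predictions on the stored training data
rmse_on_training_data = sqrt(mean((Y_training - trainedModel.RegressionGP.Y).^2))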
2 Comments