Calculate the test set mean squared error (MSE) of a direct forecasting model.
Load the sample file TemperatureData.csv
, which contains average daily temperatures from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.
Year Month Day TemperatureF
____ ___________ ___ ____________
2015 {'January'} 1 23
2015 {'January'} 2 31
2015 {'January'} 3 25
2015 {'January'} 4 39
2015 {'January'} 5 29
2015 {'January'} 6 12
2015 {'January'} 7 10
2015 {'January'} 8 4
For this example, use a subset of the temperature data that omits the first 100 observations.
Create a datetime
variable t
that contains the year, month, and day information for each observation in Tbl
. Then, use t
to convert Tbl
into a timetable.
Plot the temperature values in Tbl
over time.
Partition the temperature data into training and test sets by using tspartition
. Reserve 20% of the observations for testing.
Create a full direct forecasting model by using the data in trainingTbl
. Train the model using a decision tree learner. All three of the predictors (Year
, Month
, and Day
) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags.
Mdl =
DirectForecaster
Horizon: 1
ResponseLags: [1 2 3 4 5 6 7]
LeadingPredictors: [1 2 3]
LeadingPredictorLags: {[0 1] [0 1] [0 1 2 3 4 5 6 7]}
ResponseName: 'TemperatureF'
PredictorNames: {'Year' 'Month' 'Day'}
CategoricalPredictors: 2
Learners: {[1x1 classreg.learning.regr.CompactRegressionTree]}
MaxLag: 7
NumObservations: 372
Mdl
is a DirectForecaster
model object. By default, the horizon is one step ahead. That is, Mdl
predicts a value that is one step into the future.
Calculate the test set MSE. Smaller MSE values indicate better performance.