Presample Data for Conditional Mean Model Estimation

Presample data comes from time points before the beginning of the observation period. In Econometrics Toolbox™, you can specify your own presample data or use generated presample data.

Figure: Time series plot showing the presample period and the observation period on the x-axis.

In a conditional mean model, the distribution of $\varepsilon_t$ is conditional on historical information. Historical information includes past responses, $y_1, y_2, \ldots, y_{t-1}$, past innovations, $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{t-1}$, and, if you include them in the model, past and present exogenous covariates, $x_1, x_2, \ldots, x_{t-1}, x_t$.

The number of past responses and innovations that a current innovation depends on is determined by the degree of the AR or MA operators, and any differencing. For example, in an AR(2) model, each innovation depends on the two previous responses,

$$\varepsilon_t = y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2}.$$
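As a rough illustration of the AR(2) innovation equation above, the following sketch infers innovations from a short response series. It is plain Python rather than Econometrics Toolbox code, and the function name `ar_innovations`, its arguments, and the numeric values are all hypothetical, chosen only for this example. It shows that the first two innovations cannot be computed unless two presample responses are supplied.

```python
def ar_innovations(y, y0, c, phi):
    """Infer AR(p) innovations: e_t = y_t - c - sum_i phi[i] * y_{t-1-i}.

    y   : observed responses y_1, ..., y_T
    y0  : presample responses, oldest first; the last len(phi) values are used
    c   : model constant
    phi : AR coefficients [phi_1, ..., phi_p]
    """
    p = len(phi)
    if len(y0) < p:
        raise ValueError(f"need at least {p} presample responses, got {len(y0)}")
    # Prepend the presample so every observed response has p predecessors.
    history = list(y0[len(y0) - p:]) + list(y)
    innovations = []
    for t in range(p, len(history)):
        ar_part = sum(phi[i] * history[t - 1 - i] for i in range(p))
        innovations.append(history[t] - c - ar_part)
    return innovations


# Hypothetical data: two presample responses, then four observed responses.
y = [1.0, 1.5, 1.25, 1.75]
eps = ar_innovations(y, y0=[0.5, 0.75], c=0.5, phi=[0.5, 0.25])
print(eps)
```

Passing fewer than two presample responses raises an error here, which mirrors the point of this topic: without presample data, the earliest innovations are undefined.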

In ARIMAX models, the current innovation also depends on the current value of the exogenous covariate (unlike distributed lag models). For example, in an ARX(2) model with one exogenous covariate, each innovation depends on the previous two responses and the current value of the covariate,

$$\varepsilon_t = y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} + x_t.$$

In general, the likelihood contribution of the first few innovations is conditional on historical information that might not be observable. How do you estimate the parameters without all the data? In the ARX(2) example, $\varepsilon_2$ explicitly depends on $y_1$, $y_0$, and $x_2$, and $\varepsilon_1$ explicitly depends on $y_0$, $y_{-1}$, and $x_1$. Implicitly, $\varepsilon_2$ depends on $x_1$ and $x_0$, and $\varepsilon_1$ depends on $x_0$ and $x_{-1}$. However, you cannot observe $y_0$, $y_{-1}$, $x_0$, and $x_{-1}$.

The amount of presample data that you need to initialize a model depends on the degree of the model. The property P of an arima model specifies the number of presample responses and exogenous data that you need to initialize the AR portion of a conditional mean model. For example, P = 2 in an ARX(2) model. Therefore, you need two responses and two data points from each exogenous covariate series to initialize the model.

One option is to use the first P observations from the response and exogenous covariate series as your presample, and then fit your model to the remaining data. This reduces the effective sample size by P observations. If you plan to compare multiple candidate models, be aware that likelihood-based measures of fit (including the likelihood ratio test and information criteria) are comparable only across models fit to the same data, and therefore the same sample size. If you specify your own presample data, then you must use the largest required number of presample responses across all models that you want to compare.
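The splitting approach described above can be sketched as follows. This is a minimal plain-Python illustration, not Econometrics Toolbox code; `split_presample`, the candidate degrees, and the data are hypothetical. The key point is that the presample length is the largest P among the candidate models, so every candidate is fit to the same estimation sample.

```python
def split_presample(series, num_presample):
    """Split a series into (presample, estimation sample).

    The first num_presample observations become the presample; the
    remainder is the sample used for fitting.
    """
    if not 0 <= num_presample < len(series):
        raise ValueError("presample length must be in [0, len(series))")
    return series[:num_presample], series[num_presample:]


# Hypothetical AR degrees (the P property) of the models being compared.
candidate_p = [1, 2, 3]
max_p = max(candidate_p)

# Hypothetical response series.
y = [0.5, 0.75, 1.0, 1.5, 1.25, 1.75, 2.0, 1.5]

# Reserve max_p observations once, then reuse the same split for every model.
presample, estimation_sample = split_presample(y, max_p)
print(len(presample), len(estimation_sample))
```

A model with a smaller P would use only the most recent P values of `presample`, while still being fit to the same `estimation_sample`.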

The property Q of an arima model specifies the number of presample innovations needed to initialize the MA portion of a conditional mean model. You can get presample innovations by dividing your data into two parts. Fit a model to the first part, and infer the innovations. Then, use the inferred innovations as presample innovations for estimating the second part of the data.
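The two-part procedure above can be sketched for an MA(1) model, under the assumption that the model is written as y_t = c + ε_t + θ_1 ε_{t-1}. This is plain Python, not Econometrics Toolbox code. The name `ma_innovations` and all numeric values are hypothetical, and the parameters are taken as given rather than estimated, so the sketch shows only how inferred innovations from the first part carry over as presample innovations for the second part.

```python
def ma_innovations(y, e0, c, theta):
    """Infer MA(q) innovations: e_t = y_t - c - sum_j theta[j] * e_{t-1-j}.

    y     : observed responses
    e0    : presample innovations, oldest first; the last len(theta) are used
    c     : model constant
    theta : MA coefficients [theta_1, ..., theta_q], with q >= 1
    """
    q = len(theta)
    if q < 1:
        raise ValueError("need at least one MA coefficient")
    if len(e0) < q:
        raise ValueError(f"need at least {q} presample innovations, got {len(e0)}")
    history = list(e0[len(e0) - q:])
    for obs in y:
        ma_part = sum(theta[j] * history[-1 - j] for j in range(q))
        history.append(obs - c - ma_part)
    return history[q:]  # drop the presample values


# Hypothetical data and (assumed, not estimated) parameters.
y = [1.0, 2.0, 0.5, 1.5, 1.0, 2.5]
c, theta = 1.0, [0.5]
part1, part2 = y[:3], y[3:]

# Part 1: no earlier information is available, so start from zero innovations.
e_part1 = ma_innovations(part1, e0=[0.0], c=c, theta=theta)

# Part 2: use the last Q inferred innovations from part 1 as the presample.
e_part2 = ma_innovations(part2, e0=e_part1[-len(theta):], c=c, theta=theta)
print(e_part2)
```

In practice, the first part would be used to fit a model before inferring its innovations; fixing the parameters here keeps the example short.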

For a model with both an autoregressive and moving average component, you can specify both presample responses and innovations, one or the other, or neither.

By default, estimate automatically generates presample response and innovation data. The software:

  • Generates presample responses by backward forecasting.

  • Sets presample innovations to zero.

  • Does not generate presample exogenous data. One option is to backward forecast each exogenous series to generate a presample during data preprocessing.
