Estimating Parameters in Linear Mixed-Effects Models

A linear mixed-effects model is of the form

$y = \underbrace{X β}_{\text{fixed}} + \underbrace{Z b}_{\text{random}} + \underbrace{ε}_{\text{error}},$

where

y is the n-by-1 response vector, and n is the number of observations.
X is an n-by-p fixed-effects design matrix.
β is a p-by-1 fixed-effects vector.
Z is an n-by-q random-effects design matrix.
b is a q-by-1 random-effects vector.
ε is the n-by-1 observation error vector.

The random-effects vector, b, and the error vector, ε, are assumed to have the following independent prior distributions:

$\begin{array}{l} b \sim N (0, σ^{2} D (θ)), \\ ε \sim N (0, σ^{2} I), \end{array}$

where D is a symmetric and positive semidefinite matrix, parameterized by a variance component vector θ, I is an n-by-n identity matrix, and σ² is the error variance.
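
As an illustration, data can be simulated directly from these definitions. The following is a minimal sketch in Python with NumPy, assuming a random-intercept structure with D(θ) = θI. The group sizes, coefficient values, and variable names are hypothetical and chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_groups, per_group = 5, 4           # hypothetical sizes
n = n_groups * per_group             # number of observations
p, q = 2, n_groups                   # numbers of fixed and random effects

# Fixed-effects design X: an intercept column plus one covariate.
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Random-effects design Z: one indicator column per group (random intercepts).
groups = np.repeat(np.arange(n_groups), per_group)
Z = np.eye(n_groups)[groups]

beta = np.array([1.0, 2.0])          # assumed fixed-effects coefficients
sigma2 = 0.5                         # assumed error variance
theta = 2.0                          # assumed variance component
D = theta * np.eye(q)                # D(theta), here a scaled identity

# b ~ N(0, sigma^2 D(theta)) and eps ~ N(0, sigma^2 I), drawn independently.
b = rng.multivariate_normal(np.zeros(q), sigma2 * D)
eps = rng.normal(scale=np.sqrt(sigma2), size=n)

y = X @ beta + Z @ b + eps           # fixed + random + error
```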

In this model, the parameters to estimate are the fixed-effects coefficients β, and the variance components θ and σ². The two most commonly used approaches to parameter estimation in linear mixed-effects models are maximum likelihood and restricted maximum likelihood methods.

Maximum Likelihood (ML)

Maximum likelihood estimation includes both the regression coefficients and the variance components, that is, both the fixed-effects and the random-effects terms, in the likelihood function.

For the linear mixed-effects model defined above, the conditional distribution of the response variable y given β, b, θ, and σ² is

$y | b, β, θ, σ^{2} \sim N (X β + Z b, σ^{2} I_{n}) .$

The likelihood of y given β, θ, and σ² is

$P (y | β, θ, σ^{2}) = \int P (y | b, β, θ, σ^{2}) P (b | θ, σ^{2}) d b,$

where

$\begin{array}{l} P (b | θ, σ^{2}) = \frac{1}{{(2 π σ^{2})}^{\frac{q}{2}}} \frac{1}{{| D (θ) |}^{\frac{1}{2}}} \exp \left\{ - \frac{1}{2 σ^{2}} b^{T} D {(θ)}^{- 1} b \right\} \text{ and} \\ P (y | b, β, θ, σ^{2}) = \frac{1}{{(2 π σ^{2})}^{\frac{n}{2}}} \exp \left\{ - \frac{1}{2 σ^{2}} {(y - X β - Z b)}^{T} (y - X β - Z b) \right\} . \end{array}$

Suppose Λ(θ) is the lower triangular Cholesky factor of D(θ) and Δ(θ) is the inverse of Λ(θ). Then,

$D {(θ)}^{- 1} = Δ {(θ)}^{T} Δ (θ) .$
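
The factorization can be checked numerically. The sketch below assumes a small, hypothetical positive definite D(θ) (a random intercept and slope with some covariance). The matrix values and variable names are illustrative only.

```python
import numpy as np

# Hypothetical 2-by-2 D(theta); it must be positive definite for the
# Cholesky factor to be invertible.
D = np.array([[2.0, 0.6],
              [0.6, 1.0]])

Lam = np.linalg.cholesky(D)      # lower triangular, with D = Lam @ Lam.T
Delta = np.linalg.inv(Lam)       # Delta(theta) = Lam(theta)^{-1}

# D^{-1} rebuilt from the factor: Delta^T Delta.
D_inv_from_factor = Delta.T @ Delta
```

Because D = ΛΛᵀ, inverting both sides gives D⁻¹ = Λ⁻ᵀΛ⁻¹ = ΔᵀΔ, which is what `D_inv_from_factor` holds.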

Define

$r^{2} (β, b, θ) = b^{T} Δ {(θ)}^{T} Δ (θ) b + {(y - X β - Z b)}^{T} (y - X β - Z b),$

and suppose $b^{*}$ is the value of b that satisfies

$\left. \frac{\partial r^{2} (β, b, θ)}{\partial b} \right|_{b = b^{*}} = 0$

for given β and θ. Then, the likelihood function is

$P (y | β, θ, σ^{2}) = {(2 π σ^{2})}^{- \frac{n}{2}} {| D (θ) |}^{- \frac{1}{2}} \exp \left\{ - \frac{1}{2 σ^{2}} r^{2} (β, b^{*} (β), θ) \right\} \frac{1}{{| Δ^{T} Δ + Z^{T} Z |}^{\frac{1}{2}}} .$
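
Setting the derivative of r² with respect to b to zero gives the linear system (ΔᵀΔ + ZᵀZ) b* = Zᵀ(y − Xβ), so b* can be computed with a single solve. The sketch below evaluates the log of the likelihood above in that way. As a sanity check, it compares the result with the log density of the marginal distribution y ~ N(Xβ, σ²(Z D Zᵀ + I)), which follows from the model assumptions. The function names and test data are hypothetical, and D is assumed positive definite.

```python
import numpy as np

def loglik_via_bstar(y, X, Z, beta, D, sigma2):
    """Log-likelihood computed from b* and r^2, as in the formula above."""
    n = y.size
    Delta = np.linalg.inv(np.linalg.cholesky(D))
    A = Delta.T @ Delta + Z.T @ Z
    resid = y - X @ beta
    b_star = np.linalg.solve(A, Z.T @ resid)      # stationary point of r^2 in b
    e = resid - Z @ b_star
    r2 = b_star @ (Delta.T @ Delta) @ b_star + e @ e
    _, logdet_D = np.linalg.slogdet(D)
    _, logdet_A = np.linalg.slogdet(A)
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            - 0.5 * logdet_D - r2 / (2 * sigma2) - 0.5 * logdet_A)

def loglik_marginal(y, X, Z, beta, D, sigma2):
    """Same quantity from the marginal y ~ N(X beta, sigma2 * (Z D Z^T + I))."""
    n = y.size
    V = Z @ D @ Z.T + np.eye(n)
    resid = y - X @ beta
    _, logdet_V = np.linalg.slogdet(V)
    quad = resid @ np.linalg.solve(V, resid)
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            - 0.5 * logdet_V - quad / (2 * sigma2))

# Hypothetical test data: 3 groups of 4 observations each.
rng = np.random.default_rng(1)
n, q = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.eye(q)[np.repeat(np.arange(q), n // q)]
y = rng.normal(size=n)
beta = np.array([0.3, -0.7])
D = np.array([[1.5, 0.2, 0.0],
              [0.2, 1.0, 0.1],
              [0.0, 0.1, 0.8]])
sigma2 = 0.9

ll_a = loglik_via_bstar(y, X, Z, beta, D, sigma2)
ll_b = loglik_marginal(y, X, Z, beta, D, sigma2)
```

The two values agree up to floating-point error. The b* form only requires factorizations of q-by-q matrices, whereas the marginal form works with the n-by-n matrix V.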

P(y|β,θ,σ²) is first maximized with respect to β and σ² for a given θ. The resulting solutions $\hat{β} (θ)$ and ${\hat{σ}}^{2} (θ)$ are therefore functions of θ. Substituting these solutions into the likelihood function produces $P (y | \hat{β} (θ), θ, {\hat{σ}}^{2} (θ))$. This expression is called a profiled likelihood, where β and σ² have been profiled out. Because $P (y | \hat{β} (θ), θ, {\hat{σ}}^{2} (θ))$ is a function of θ only, the algorithm then optimizes it with respect to θ. Once it finds the optimal estimate $\hat{θ}$, the estimates of β and σ² are given by $\hat{β} (\hat{θ})$ and ${\hat{σ}}^{2} (\hat{θ})$.
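
The profiling steps can be sketched as follows, assuming the simplest structure D(θ) = θI with a scalar θ > 0. For a given θ, maximizing over β is equivalent to minimizing r² jointly over β and b, which is a penalized least-squares problem, and maximizing over σ² then gives σ̂²(θ) = r²/n. The grid search over θ is only a stand-in for whatever optimizer an actual implementation uses. All function names, data, and grid limits are hypothetical.

```python
import numpy as np

def profiled_deviance(theta, y, X, Z):
    """-2 * profiled log-likelihood for D(theta) = theta * I (assumed form)."""
    n, q = Z.shape
    p = X.shape[1]
    Delta = np.eye(q) / np.sqrt(theta)     # inverse Cholesky factor of theta * I
    # Penalized least squares: minimize ||Delta b||^2 + ||y - X beta - Z b||^2
    # over (b, beta) by stacking the penalty rows under the data rows.
    M = np.block([[Z, X], [Delta, np.zeros((q, p))]])
    rhs = np.concatenate([y, np.zeros(q)])
    coef, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    beta_hat = coef[q:]
    r2 = np.sum((rhs - M @ coef) ** 2)     # r^2 at (beta_hat, b*)
    sigma2_hat = r2 / n                    # ML estimate of sigma^2 given theta
    _, logdet_A = np.linalg.slogdet(Delta.T @ Delta + Z.T @ Z)
    logdet_D = q * np.log(theta)
    dev = n * np.log(2 * np.pi * sigma2_hat) + n + logdet_D + logdet_A
    return dev, beta_hat, sigma2_hat

# Hypothetical random-intercept data: 8 groups of 5 observations.
rng = np.random.default_rng(2)
n_groups, per_group = 8, 5
groups = np.repeat(np.arange(n_groups), per_group)
n = groups.size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.eye(n_groups)[groups]
b = rng.normal(scale=np.sqrt(0.5 * 2.0), size=n_groups)
y = X @ np.array([1.0, 2.0]) + Z @ b + rng.normal(scale=np.sqrt(0.5), size=n)

# Crude one-dimensional search over theta (illustrative only).
grid = np.linspace(0.05, 10.0, 200)
devs = np.array([profiled_deviance(t, y, X, Z)[0] for t in grid])
theta_hat = grid[np.argmin(devs)]
dev_hat, beta_hat, sigma2_hat = profiled_deviance(theta_hat, y, X, Z)
```

Minimizing the deviance is the same as maximizing the profiled likelihood. The final call recovers β̂ and σ̂² at the selected θ, mirroring the last step described above.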

The ML method treats β as fixed but unknown quantities when the variance components are estimated, but does not take into account the degrees of freedom lost by estimating the fixed effects. As a result, ML estimates of the variance components are biased toward smaller values. However, one advantage of ML over REML is that you can compare two models that differ in their fixed-effects terms as well as in their random-effects terms. If you use REML to estimate the parameters, you can only compare two models that have the same fixed-effects design and are nested in their random-effects terms.

Restricted Maximum Likelihood (REML)

Restricted maximum likelihood estimation includes only the variance components, that is, the parameters that parameterize the random-effects terms in the linear mixed-effects model. β is estimated in a second step. Assuming a uniform improper prior distribution for β and integrating the likelihood P(y|β,θ,σ²) with respect to β results in the restricted likelihood P(y|θ,σ²). That is,

$P (y | θ, σ^{2}) = \int P (y | β, θ, σ^{2}) P (β) d β = \int P (y | β, θ, σ^{2}) d β .$

The algorithm first profiles out σ² and maximizes the remaining objective function with respect to θ to find ${\hat{θ}}_{R}$. It then maximizes the restricted likelihood with respect to σ² to find ${\hat{σ}}_{R}^{2}$. Finally, it estimates β by finding its expected value with respect to the posterior distribution

$P (β | y, {\hat{θ}}_{R}, {\hat{σ}}_{R}^{2}) .$

REML accounts for the degrees of freedom lost by estimating the fixed effects, and produces less biased estimates of the random-effects variances. The estimates of θ and σ² are invariant to the value of β and are less sensitive to outliers in the data than ML estimates. However, if you use REML to estimate the parameters, you can only compare two models that have identical fixed-effects design matrices and are nested in their random-effects terms.
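
The REML steps can be sketched in the same spirit, again assuming D(θ) = θI with a scalar θ > 0. Writing V(θ) = Z D(θ) Zᵀ + I, integrating β out of the Gaussian likelihood leaves a restricted likelihood in which σ² is profiled as r²/(n − p) rather than r²/n, and the posterior mean of β under the flat prior is the generalized least-squares estimate. The grid search, function names, and data below are hypothetical and are not the optimizer of any particular software.

```python
import numpy as np

def restricted_deviance(theta, y, X, Z):
    """-2 * profiled restricted log-likelihood for D(theta) = theta * I."""
    n, p = X.shape
    V = theta * (Z @ Z.T) + np.eye(n)           # covariance of y up to sigma^2
    Vinv_X = np.linalg.solve(V, X)
    XtVinvX = X.T @ Vinv_X
    # Posterior mean of beta under a flat prior: the GLS estimate.
    beta_hat = np.linalg.solve(XtVinvX, Vinv_X.T @ y)
    resid = y - X @ beta_hat
    r2 = resid @ np.linalg.solve(V, resid)
    sigma2_R = r2 / (n - p)                     # REML divisor is n - p
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(XtVinvX)
    dev = ((n - p) * np.log(2 * np.pi * sigma2_R) + (n - p)
           + logdet_V + logdet_XVX)
    return dev, beta_hat, sigma2_R, r2

# Hypothetical random-intercept data: 8 groups of 5 observations.
rng = np.random.default_rng(3)
n_groups, per_group = 8, 5
groups = np.repeat(np.arange(n_groups), per_group)
n = groups.size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.eye(n_groups)[groups]
b = rng.normal(scale=1.0, size=n_groups)
y = X @ np.array([1.0, 2.0]) + Z @ b + rng.normal(scale=np.sqrt(0.5), size=n)

# Crude one-dimensional search over theta (illustrative only).
grid = np.linspace(0.05, 10.0, 200)
devs = np.array([restricted_deviance(t, y, X, Z)[0] for t in grid])
theta_R = grid[np.argmin(devs)]
_, beta_R, sigma2_R, r2 = restricted_deviance(theta_R, y, X, Z)

# For comparison: the ML-style divisor at the same theta.
sigma2_ml_at_theta = r2 / n
```

At any fixed θ, the REML estimate of σ² exceeds the ML-style value by the factor n/(n − p), which is the degrees-of-freedom correction described above.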
