# compare

Class: LinearMixedModel

Compare linear mixed-effects models

## Syntax

• `results = compare(lme,altlme)`
• `results = compare(___,Name,Value)`
• `[results,siminfo] = compare(lme,altlme,'NSim',nsim)`
• `[results,siminfo] = compare(___,Name,Value)`

## Description

`results = compare(lme,altlme)` returns the results of a likelihood ratio test that compares the linear mixed-effects models `lme` and `altlme`. Both models must be fit to the same response vector, and `lme` must be nested in `altlme` for a valid theoretical likelihood ratio test. Always input the smaller model first and the larger model second.

`compare` tests the following null and alternative hypotheses:

H0: The observed response vector is generated by `lme`.

H1: The observed response vector is generated by `altlme`.

It is recommended that you fit `lme` and `altlme` using the maximum likelihood (ML) method prior to model comparison. If you use the restricted maximum likelihood (REML) method, then both models must have the same fixed-effects design matrix.

To test for fixed effects, use `compare` with the simulated likelihood ratio test when `lme` and `altlme` are fit using ML, or use the `fixedEffects`, `anova`, or `coefTest` methods.
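The following minimal sketch shows a typical workflow, assuming a table `tbl` with a response `y`, a predictor `x`, and a grouping variable `g` (hypothetical names): fit a nested pair of models by ML, then compare them.

```
% Hypothetical data and variable names; both models use the ML fitting method.
lme    = fitlme(tbl,'y ~ 1 + x + (1|g)','FitMethod','ML');      % smaller (null) model
altlme = fitlme(tbl,'y ~ 1 + x + (1 + x|g)','FitMethod','ML');  % larger (alternative) model
results = compare(lme,altlme)   % theoretical likelihood ratio test
```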


`results = compare(___,Name,Value)` also returns the results of a likelihood ratio test that compares the linear mixed-effects models `lme` and `altlme`, with additional options specified by one or more `Name,Value` pair arguments. For example, you can check whether the first input model is nested in the second input model.


`[results,siminfo] = compare(lme,altlme,'NSim',nsim)` returns the results of a simulated likelihood ratio test that compares the linear mixed-effects models `lme` and `altlme`. You can fit `lme` and `altlme` using ML or REML, and `lme` does not have to be nested in `altlme`. If you use the restricted maximum likelihood (REML) method to fit the models, then both models must have the same fixed-effects design matrix.


`[results,siminfo] = compare(___,Name,Value)` also returns the results of a simulated likelihood ratio test that compares the linear mixed-effects models `lme` and `altlme`, with additional options specified by one or more `Name,Value` pair arguments. For example, you can change the options for performing the simulated likelihood ratio test, or change the confidence level of the confidence interval for the p-value.

## Input Arguments


### `lme` — Linear mixed-effects model
`LinearMixedModel` object

Linear mixed-effects model, specified as a `LinearMixedModel` object.

For properties and methods of this object, see `LinearMixedModel`.

### `altlme` — Alternative linear mixed-effects model
`LinearMixedModel` object

Alternative linear mixed-effects model fit to the same response vector but with different model specifications, specified as a `LinearMixedModel` object. `lme` must be nested in `altlme`, that is, `lme` should be obtained from `altlme` by setting some parameters to fixed values, such as 0. You can create a linear mixed-effects object using `fitlme` or `fitlmematrix`.

### `nsim` — Number of replications for simulations
positive integer number

Number of replications for simulations in the simulated likelihood ratio test, specified as a positive integer number. You must specify `nsim` to do a simulated likelihood ratio test.

Example: `'NSim',1000`

Data Types: `double` | `single`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

### `'Alpha'` — Confidence level
0.05 (default) | scalar value in the range 0 to 1

Confidence level, specified as the comma-separated pair consisting of `'Alpha'` and a scalar value in the range 0 to 1. For a value α, the confidence level is 100*(1–α)%.

For example, for 99% confidence intervals, you can specify the confidence level as follows.

Example: `'Alpha',0.01`

Data Types: `single` | `double`

### `'Options'` — Options for performing simulated likelihood ratio test
structure

Options for performing the simulated likelihood ratio test, specified as the comma-separated pair consisting of `'Options'`, and a structure created by `statset('LinearMixedModel')`.

`compare` uses the following fields.

| Field | Description |
| --- | --- |
| `'UseParallel'` | `false` for serial computation (default). `true` for parallel computation. |
| `'UseSubstreams'` | `false` for not using a separate substream of the random number generator for each iteration (default). `true` for using a separate substream of the random number generator for each iteration. You can use this option only with random stream types that support substreams. |
| `'Streams'` | If `'UseSubstreams'` is `true`, then `'Streams'` must be a single random number stream, or a scalar cell array containing a single stream. If `'UseSubstreams'` is `false` and `'UseParallel'` is `false`, then `'Streams'` must be a single random number stream, or a scalar cell array containing a single stream. If `'UseSubstreams'` is `false` and `'UseParallel'` is `true`, then `'Streams'` must be equal in length to the number of processors used. If a parallel pool is open, then `'Streams'` must be the same length as the size of the parallel pool. If `'UseParallel'` is `true`, a parallel pool might open for you, but because `'Streams'` must match the number of processors used, it is best to open a pool explicitly using the `parpool` command before calling `compare` with the `'UseParallel',true` option. |

For information on parallel statistical computing at the command line, enter

`help parallelstats`

Data Types: `struct`
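For instance, here is a minimal sketch of setting these fields for a reproducible parallel simulated test, assuming the Parallel Computing Toolbox is available and `lme` and `altlme` are already fitted models:

```
% A sketch, not a definitive recipe: pick a stream type that supports substreams
% so each simulation replicate uses its own substream.
opt = statset('LinearMixedModel');
opt.UseParallel   = true;
opt.UseSubstreams = true;
opt.Streams       = RandStream('mlfg6331_64');   % generator that supports substreams
parpool;                                         % open the parallel pool explicitly
[results,siminfo] = compare(lme,altlme,'NSim',1000,'Options',opt);
```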

### `'CheckNesting'` — Indicator to check nesting between two models
`false` (default) | `true`

Indicator to check nesting between two models, specified as the comma-separated pair consisting of `'CheckNesting'` and one of the following.

| Value | Description |
| --- | --- |
| `false` | Default. No checks. |
| `true` | `compare` checks whether the smaller model `lme` is nested in the bigger model `altlme`. |

`lme` must be nested in the alternate model `altlme` for a valid theoretical likelihood ratio test. `compare` returns an error message if the nesting requirements are not satisfied.

Although valid for both tests, the nesting requirements are weaker for the simulated likelihood ratio test.

Example: `'CheckNesting',true`

Data Types: `single` | `double`

## Output Arguments


### `results` — Results of likelihood ratio test or simulated likelihood ratio test
dataset array

Results of the likelihood ratio test or simulated likelihood ratio test, returned as a dataset array with two rows. The first row is for `lme`, and the second row is for `altlme`. The columns of `results` depend on whether the test is a likelihood ratio or a simulated likelihood ratio test.

• If you use the likelihood ratio test, then `results` contains the following columns.

| Column | Description |
| --- | --- |
| `Model` | Name of the model |
| `DF` | Degrees of freedom, that is, the number of free parameters in the model |
| `AIC` | Akaike information criterion for the model |
| `BIC` | Bayesian information criterion for the model |
| `LogLik` | Maximized log likelihood for the model |
| `LRStat` | Likelihood ratio test statistic for comparing `altlme` versus `lme` |
| `deltaDF` | `DF` for `altlme` minus `DF` for `lme` |
| `pValue` | p-value for the likelihood ratio test |

• If you use the simulated likelihood ratio test, then `results` contains the following columns.

| Column | Description |
| --- | --- |
| `Model` | Name of the model |
| `DF` | Degrees of freedom, that is, the number of free parameters in the model |
| `LogLik` | Maximized log likelihood for the model |
| `LRStat` | Likelihood ratio test statistic for comparing `altlme` versus `lme` |
| `pValue` | p-value for the likelihood ratio test |
| `Lower` | Lower limit of the confidence interval for `pValue` |
| `Upper` | Upper limit of the confidence interval for `pValue` |

### `siminfo` — Simulation output
structure

Simulation output, returned as a structure with the following fields.

| Field | Description |
| --- | --- |
| `nsim` | Value set for `nsim` |
| `alpha` | Value set for `'Alpha'` |
| `pValueSim` | Simulation-based p-value |
| `pValueSimCI` | Confidence interval for `pValueSim`. The first element of the vector is the lower limit and the second element is the upper limit. |
| `deltaDF` | The number of free parameters in `altlme` minus the number of free parameters in `lme`, that is, `DF` for `altlme` minus `DF` for `lme` |
| `TH0` | Vector of simulated likelihood ratio test statistics under the null hypothesis that the model `lme` generated the observed response vector `y` |

## Examples


### Test for Random Effects

`load flu`

The `flu` dataset array has a `Date` variable, and 10 variables containing estimated influenza rates (in 9 different regions, estimated from Google® searches, plus a nationwide estimate from the CDC).

To fit a linear mixed-effects model, your data must be in a properly formatted dataset array. To fit a linear mixed-effects model with the influenza rates as the responses and region as the predictor variable, stack the nine columns corresponding to the regions into a single response variable. The new dataset array, `flu2`, must have the response variable, `FluRate`, the nominal variable, `Region`, that shows which region each estimate is from, and the grouping variable `Date`.

```
flu2 = stack(flu,2:10,'NewDataVarName','FluRate',...
    'IndVarName','Region');
flu2.Date = nominal(flu2.Date);
```

Fit a linear mixed-effects model, with a varying intercept and varying slope for each region, grouped by `Date`.

`altlme = fitlme(flu2,'FluRate ~ 1 + Region + (1 + Region|Date)');`

Fit a linear mixed-effects model with fixed effects for the region and a random intercept that varies by `Date`.

`lme = fitlme(flu2,'FluRate ~ 1 + Region + (1|Date)');`

Compare the two models. Also check whether `lme` is nested in `altlme`.

`compare(lme,altlme,'CheckNesting',true)`
```
ans = 

    Theoretical Likelihood Ratio Test

    Model     DF    AIC        BIC        LogLik     LRStat    deltaDF    pValue
    lme       11    318.71     364.35     -148.36                               
    altlme    55    -305.51    -77.346     207.76    712.22    44         0     
```

The small p-value of 0 indicates that model `altlme` is significantly better than the simpler model `lme`.

### Test for Fixed and Random Effects

Navigate to a folder containing sample data.

```
cd(matlabroot)
cd('help/toolbox/stats/examples')
```

`load fertilizer`

The dataset array includes data from a split-plot experiment, where soil is divided into three blocks based on the soil type: sandy, silty, and loamy. Each block is divided into five plots, where five different types of tomato plants (cherry, heirloom, grape, vine, and plum) are randomly assigned to these plots. The tomato plants in the plots are then divided into subplots, where each subplot is treated by one of four fertilizers. This is simulated data.

Store the data in a dataset array called `ds`, for practical purposes, and define `Tomato`, `Soil`, and `Fertilizer` as categorical variables.

```
ds = fertilizer;
ds.Tomato = nominal(ds.Tomato);
ds.Soil = nominal(ds.Soil);
ds.Fertilizer = nominal(ds.Fertilizer);
```

Fit a linear mixed-effects model, where `Fertilizer` and `Tomato` are the fixed-effects variables, and the mean yield varies by the block (soil type) and the plots within blocks (tomato types within soil types) independently.

`lme = fitlme(ds,'Yield ~ Fertilizer * Tomato + (1|Soil) + (1|Soil:Tomato)');`

Refit the model after removing the interaction term `Tomato:Fertilizer` and the random-effects term `(1 | Soil)`.

`lme2 = fitlme(ds,'Yield ~ Fertilizer + Tomato + (1|Soil:Tomato)');`

Compare the two models using the simulated likelihood ratio test with 1000 replications. You must use the simulated test because you are testing for both fixed-effects and random-effects terms. Both models are fit using the default fitting method, maximum likelihood (ML), so there is no restriction on the fixed-effects design matrices. If you instead use the restricted maximum likelihood (REML) method, both models must have identical fixed-effects design matrices.

`[table,siminfo] = compare(lme2,lme,'NSim',1000)`
```
table = 

    Simulated Likelihood Ratio Test: Nsim = 1000, Alpha = 0.05

    Model    DF    AIC       BIC       LogLik     LRStat    pValue     Lower      Upper 
    lme2     10    511.06    532       -245.53                                          
    lme      23    522.57    570.74    -238.29    14.491    0.54845    0.51702    0.5796

siminfo = 

           nsim: 1000
          alpha: 0.0500
      pvalueSim: 0.5485
    pvalueSimCI: [0.5170 0.5796]
        deltaDF: 13
            TH0: [1000x1 double]
```

The high p-value of 0.5485 suggests that the larger model, `lme`, is not significantly better than the smaller model, `lme2`. The smaller values of AIC and BIC for `lme2` also support this.

### Models with Correlated and Uncorrelated Random Effects

`load carbig`

Fit a linear mixed-effects model for miles per gallon (MPG), with fixed effects for acceleration and horsepower, and potentially correlated random effects for intercept and acceleration grouped by model year.

First, prepare the design matrices.

```
X = [ones(406,1) Acceleration Horsepower];
Z = [ones(406,1) Acceleration];
Model_Year = nominal(Model_Year);
G = Model_Year;
```

Now, fit the model using `fitlmematrix` with the defined design matrices and grouping variables.

```
lme = fitlmematrix(X,MPG,Z,G,'FixedEffectPredictors',...
    {'Intercept','Acceleration','Horsepower'},'RandomEffectPredictors',...
    {{'Intercept','Acceleration'}},'RandomEffectGroups',{'Model_Year'});
```

Refit the model with uncorrelated random effects for intercept and acceleration. First prepare the random effects design and the random effects grouping variables.

```
Z = {ones(406,1),Acceleration};
G = {Model_Year,Model_Year};
lme2 = fitlmematrix(X,MPG,Z,G,'FixedEffectPredictors',...
    {'Intercept','Acceleration','Horsepower'},'RandomEffectPredictors',...
    {{'Intercept'},{'Acceleration'}},'RandomEffectGroups',...
    {'Model_Year','Model_Year'});
```

Compare `lme` and `lme2` using the simulated likelihood ratio test.

`compare(lme2,lme,'CheckNesting',true,'NSim',1000)`
```
ans = 

    Simulated Likelihood Ratio Test: Nsim = 1000, Alpha = 0.05

    Model    DF    AIC       BIC       LogLik     LRStat    pValue      Lower       Upper  
    lme2     6     2194.5    2218.3    -1091.3                                             
    lme      7     2193.5    2221.3    -1089.7    3.0323    0.095904    0.078373    0.11585
```

The high p-value of 0.095904 indicates that the model with correlated random effects, `lme`, is not a significantly better fit than the model with uncorrelated random effects, `lme2`.

### Simulated Likelihood Ratio Test Using Parallel Computing

Navigate to a folder containing sample data.

```
cd(matlabroot)
cd('help/toolbox/stats/examples')
```

`load fertilizer`

The dataset array includes data from a split-plot experiment, where soil is divided into three blocks based on the soil type: sandy, silty, and loamy. Each block is divided into five plots, where five different types of tomato plants (cherry, heirloom, grape, vine, and plum) are randomly assigned to these plots. The tomato plants in the plots are then divided into subplots, where each subplot is treated by one of four fertilizers. This is simulated data.

Store the data in a dataset array called `ds`, for practical purposes, and define `Tomato`, `Soil`, and `Fertilizer` as categorical variables.

```
ds = fertilizer;
ds.Tomato = nominal(ds.Tomato);
ds.Soil = nominal(ds.Soil);
ds.Fertilizer = nominal(ds.Fertilizer);
```

Fit a linear mixed-effects model, where `Fertilizer` and `Tomato` are the fixed-effects variables, and the mean yield varies by the block (soil type), and the plots within blocks (tomato types within soil types) independently.

`lme = fitlme(ds,'Yield ~ Fertilizer * Tomato + (1|Soil) + (1|Soil:Tomato)');`

Refit the model after removing the interaction term `Tomato:Fertilizer` and the random-effects term `(1|Soil)`.

`lme2 = fitlme(ds,'Yield ~ Fertilizer + Tomato + (1|Soil:Tomato)');`

Create the options structure for `LinearMixedModel`.

`opt = statset('LinearMixedModel')`
```
opt = 

          Display: 'off'
      MaxFunEvals: []
          MaxIter: 10000
           TolBnd: []
           TolFun: 1.0000e-06
       TolTypeFun: []
             TolX: 1.0000e-12
         TolTypeX: []
          GradObj: []
         Jacobian: []
        DerivStep: []
      FunValCheck: []
           Robust: []
     RobustWgtFun: []
           WgtFun: []
             Tune: []
      UseParallel: []
    UseSubstreams: []
          Streams: {}
        OutputFcn: []
```

Change the options for parallel testing.

`opt.UseParallel = true;`

Start a parallel environment.

`mypool = parpool();`
```
Starting parpool using the 'local' profile ... connected to 2 workers.

mypool = 

 Pool with properties:

    AttachedFiles: {0x1 cell}
       NumWorkers: 2
          Cluster: [1x1 parallel.cluster.Local]
      SpmdEnabled: 1
```

Compare `lme2` and `lme` using the simulated likelihood ratio test with 1000 replications and parallel computing.

`compare(lme2,lme,'nsim',1000,'Options',opt)`
```
ans = 

    Simulated Likelihood Ratio Test: Nsim = 1000, Alpha = 0.05

    Model    DF    AIC       BIC       LogLik     LRStat    pValue     Lower      Upper 
    lme2     10    511.06    532       -245.53                                          
    lme      23    522.57    570.74    -238.29    14.491    0.54845    0.51702    0.5796
```

The high p-value of 0.5485 suggests that the larger model, `lme`, is not significantly better than the smaller model, `lme2`. The smaller values of `AIC` and `BIC` for `lme2` also support this.

## Definitions

### Likelihood Ratio Test

Under the null hypothesis H0, the observed likelihood ratio test statistic has an approximate chi-squared reference distribution with degrees of freedom `deltaDF`. When comparing two models, `compare` computes the p-value for the likelihood ratio test by comparing the observed likelihood ratio test statistic with this chi-squared reference distribution.

The p-values obtained using the likelihood ratio test can be conservative when testing for the presence or absence of random-effects terms and anticonservative when testing for the presence or absence of fixed-effects terms. Hence, use the `fixedEffects`, `anova`, or `coefTest` method or the simulated likelihood ratio test while testing for fixed effects.
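The following minimal sketch, assuming `lme` and `altlme` are nested `LinearMixedModel` objects fit by ML, shows how the theoretical p-value relates to the chi-squared reference distribution:

```
% A sketch: reproduce the theoretical likelihood ratio test p-value by hand.
results = compare(lme,altlme);                           % theoretical test
LRStat  = 2*(altlme.LogLikelihood - lme.LogLikelihood);  % observed test statistic
deltaDF = results.DF(2) - results.DF(1);                 % difference in free parameters
pValue  = 1 - chi2cdf(LRStat,deltaDF);                   % should match results.pValue(2)
```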

### Simulated Likelihood Ratio Test

To perform the simulated likelihood ratio test, `compare` first generates the reference distribution of the likelihood ratio test statistic under the null hypothesis. Then, it assesses the statistical significance of the alternate model by comparing the observed likelihood ratio test statistic to this reference distribution.

`compare` produces the simulated reference distribution of the likelihood ratio test statistic under the null hypothesis as follows:

1. Generate random data `ysim` from the fitted model `lme`.

2. Fit the model specified in `lme` and the alternative model `altlme` to the simulated data `ysim`.

3. Calculate the likelihood ratio test statistic using the results from step 2 and store the value.

4. Repeat steps 1 through 3 `nsim` times.

Then, `compare` computes the p-value for the simulated likelihood ratio test by comparing the observed likelihood ratio test statistic with the simulated reference distribution. The p-value estimate is the ratio of the number of times the simulated likelihood ratio test statistic is equal to or exceeds the observed value plus one, to the number of replications plus one.

Suppose the observed likelihood ratio statistic is T, and the simulated reference distribution is stored in vector TH0. Then,

$p\text{-value}=\frac{\left[\sum_{j=1}^{nsim}I\left(T_{H_0}(j)\ge T\right)\right]+1}{nsim+1}.$

To account for the uncertainty in the simulated reference distribution, `compare` computes a 100*(1 – α)% confidence interval for the true p-value.

You can use the simulated likelihood ratio test to compare arbitrary linear mixed-effects models. That is, when you are using the simulated likelihood ratio test, `lme` does not have to be nested within `altlme`, and you can fit `lme` and `altlme` using either maximum likelihood (ML) or restricted maximum likelihood (REML) methods. If you use the restricted maximum likelihood (REML) method to fit the models, then both models must have the same fixed-effects design matrix.
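A minimal sketch of this procedure, assuming `lme` and `altlme` are nested `LinearMixedModel` objects fit by ML (the built-in test additionally supports non-nested models, parallel computation, and a confidence interval for the p-value):

```
% A sketch of the simulated reference distribution; not the library implementation.
nsim = 1000;
TH0  = zeros(nsim,1);
for j = 1:nsim
    ysim   = random(lme);                 % step 1: simulate a response from the null model
    lme0   = refit(lme,ysim);             % step 2: refit both models to the simulated data
    lme1   = refit(altlme,ysim);
    TH0(j) = 2*(lme1.LogLikelihood - lme0.LogLikelihood);  % step 3: store the LR statistic
end                                       % step 4: repeat nsim times
T = 2*(altlme.LogLikelihood - lme.LogLikelihood);           % observed LR statistic
pValueSim = (sum(TH0 >= T) + 1)/(nsim + 1);
```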

### Nesting Requirements

The `'CheckNesting',true` name-value pair argument checks the following requirements.

For a simulated likelihood ratio test:

• You must use the same method to fit both models (`lme` and `altlme`). `compare` cannot compare a model fit using ML to a model fit using REML.

• You must fit both models to the same response vector.

• If you use REML to fit `lme` and `altlme`, then both models must have the same fixed-effects design matrix.

• The maximized log likelihood or restricted log likelihood of the bigger model (`altlme`) must be greater than or equal to that of the smaller model (`lme`).

For a theoretical test, `'CheckNesting',true` checks all the requirements listed for a simulated likelihood ratio test and the following:

• Weight vectors you use to fit `lme` and `altlme` must be identical.

• If you use ML to fit `lme` and `altlme`, the fixed-effects design matrix of the bigger model (`altlme`) must contain that of the smaller model (`lme`).

• The random-effects design matrix of the bigger model (`altlme`) must contain that of the smaller model (`lme`).

### Akaike and Bayesian Information Criteria

Akaike information criterion (AIC) is AIC = –2*logLM + 2*(nc + p + 1), where logLM is the maximized log likelihood (or maximized restricted log likelihood) of the model, and nc + p + 1 is the number of parameters estimated in the model. p is the number of fixed-effects coefficients, and nc is the total number of parameters in the random-effects covariance excluding the residual variance.

Bayesian information criterion (BIC) is BIC = –2*logLM + ln(neff)*(nc + p + 1), where logLM is the maximized log likelihood (or maximized restricted log likelihood) of the model, neff is the effective number of observations, and (nc + p + 1) is the number of parameters estimated in the model.

• If the fitting method is maximum likelihood (ML), then neff = n, where n is the number of observations.

• If the fitting method is restricted maximum likelihood (REML), then neff = n – p, where p is the number of fixed-effects coefficients.

A lower value of deviance indicates a better fit. As the value of deviance decreases, both AIC and BIC tend to decrease. Both AIC and BIC also include penalty terms based on the number of estimated parameters, nc + p + 1, so as the number of parameters increases, the values of AIC and BIC tend to increase as well. When comparing different models, the model with the lowest AIC or BIC value is considered the best-fitting model.
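As a minimal sketch of these formulas, assuming a random-intercept model `'y ~ 1 + x + (1|g)'` fit by ML to a hypothetical table `tbl` (so nc = 1, the intercept variance):

```
% Hypothetical model and data; nc = 1 because the only random-effects
% covariance parameter is the random-intercept variance.
lme    = fitlme(tbl,'y ~ 1 + x + (1|g)','FitMethod','ML');
logLM  = lme.LogLikelihood;               % maximized log likelihood
p      = length(fixedEffects(lme));       % number of fixed-effects coefficients
nc     = 1;
nparam = nc + p + 1;                      % +1 for the residual variance
n      = lme.NumObservations;
AIC    = -2*logLM + 2*nparam;             % should match lme.ModelCriterion.AIC
BIC    = -2*logLM + log(n)*nparam;        % should match lme.ModelCriterion.BIC (neff = n for ML)
```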

### Deviance

`LinearMixedModel` computes the deviance of model M as minus two times the log likelihood of that model. Let LM denote the maximum value of the likelihood function for model M. Then, the deviance of model M is

$-2*\mathrm{log}{L}_{M}.$

A lower value of deviance indicates a better fit. Suppose M1 and M2 are two different models, where M1 is nested in M2. Then, the fit of the models can be assessed by comparing the deviances Dev1 and Dev2 of these models. The difference of the deviances is

$\mathrm{Dev}=\mathrm{Dev}_{1}-\mathrm{Dev}_{2}=2\left(\log L_{M_{2}}-\log L_{M_{1}}\right).$

Usually, the asymptotic distribution of this difference is a chi-square distribution with degrees of freedom v equal to the number of parameters that are estimated in one model but fixed (typically at 0) in the other, that is, the difference in the number of parameters estimated in M1 and M2. You can get the p-value for this test using `1 - chi2cdf(Dev,v)`, where Dev = Dev1 – Dev2.

However, in mixed-effects models, when some variance components fall on the boundary of the parameter space, the asymptotic distribution of this difference is more complicated. For example, consider the hypotheses

H0: $D=\left(\begin{array}{cc}{D}_{11}& 0\\ 0& 0\end{array}\right),$ where D11 is a q-by-q symmetric positive semidefinite matrix.

H1: D is a (q+1)-by-(q+1) symmetric positive semidefinite matrix.

That is, H1 states that the last row and column of D are different from zero. Here, the bigger model M2 has q + 1 parameters and the smaller model M1 has q parameters. In this case, Dev has a 50:50 mixture of $\chi^2_q$ and $\chi^2_{q+1}$ distributions (Stram and Lee, 1994).
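Under these hypotheses, a p-value based on the 50:50 mixture reference distribution could be sketched as follows, assuming `Dev` is the observed difference in deviances and `q` is the size of D11:

```
% A sketch: average the upper tail probabilities of the two chi-square components.
pValue = 0.5*(1 - chi2cdf(Dev,q)) + 0.5*(1 - chi2cdf(Dev,q+1));
```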

## References

[1] Hox, J. Multilevel Analysis: Techniques and Applications. Lawrence Erlbaum Associates, Inc., 2002.

[2] Stram, D. O., and J. W. Lee. "Variance components testing in the longitudinal mixed-effects model." Biometrics, Vol. 50, No. 4, 1994, pp. 1171–1177.