Chapter 8 Linear Mixed Models

Example 8.1 (Dependent Samples on the Mean) Consider inference on a population’s mean. Supposedly, more observations imply more information on the mean. This, however, is not the case if samples are completely dependent: additional observations do not add any new information. From this example one may think that dependence is a bad thing. This is a false intuition: negative correlations imply oscillations about the mean, so they are actually more informative on the mean than independent observations.
Example 8.2 (Repeated Measures) Consider a prospective study, i.e., data that originates from selecting a set of subjects and making measurements on them over time. Also assume that some subjects received some treatment, and others did not. When we want to infer on the population from which these subjects have been sampled, we need to recall that some series of observations came from the same subject. If we were to ignore the subject of origin, and treat each observation as an independent sample point, we would think we have more information in our data than we actually do. For a rough intuition, think of a case where observations within subject are perfectly dependent.
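
The intuition in Examples 8.1 and 8.2 can be made concrete with a short derivation. As a sketch, assume (this is a simplification of ours, not part of the examples) that the \(n\) observations share a common variance \(\sigma^2\) and a common pairwise correlation \(\rho\). Then

\[\begin{align} Var[\bar{y}] = \frac{1}{n^2}\sum_{i,j} Cov[y_i,y_j] = \frac{\sigma^2}{n}\left(1+(n-1)\rho\right). \end{align}\]

With \(\rho=0\) we recover the familiar \(\sigma^2/n\). With \(\rho=1\) the variance is \(\sigma^2\) regardless of \(n\), so extra observations add nothing. With \(\rho<0\) (which requires \(\rho \geq -1/(n-1)\) for a valid covariance) the variance falls below \(\sigma^2/n\).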

The sources of variability, i.e. noise, are known in the statistical literature as “random effects”. Specifying these sources determines the correlation structure in our measurements. In the simplest linear models of Chapter 6, we thought of the variability as a measurement error, independent of anything else. This, however, is rarely the case when time or space are involved.

The variability in our data is rarely the object of interest. It is merely the source of uncertainty in our measurements. The effects we want to infer on are assumed to be non-random, and are thus known as “fixed-effects”. A model which has several sources of variability, i.e. random-effects, and several deterministic effects to study, i.e. fixed-effects, is known as a “mixed effects” model. If the model is also linear, it is known as a linear mixed model (LMM). Here are some examples of such models.

Example 8.3 (Fixed and Random Machine Effect) Consider the problem of testing for a change in the distribution of diameters of manufactured bottle caps. We want to study the (fixed) effect of time: before versus after. Bottle caps are produced by several machines. Clearly there is variability in the diameters within-machine and between-machines. Given many measurements on many bottle caps from many machines, we could standardize measurements by removing each machine’s average. This implies the within-machine variability is the only source of variability we care about, because the subtraction of the machine effect removed the information on the between-machine variability.
Alternatively, we could treat the between-machine variability as another source of noise/uncertainty when inferring on the temporal fixed effect.
Example 8.4 (Fixed and Random Subject Effect) Consider an experimental design where each subject is given 2 types of diets, and their health condition is recorded. We could standardize over subjects by removing the subject-wise average, before comparing diets. This is what a paired t-test does. This also implies the within-subject variability is the only source of variability we care about. Alternatively, for inference on the population of “all subjects” we need to address the between-subject variability, and not only the within-subject variability.

The unifying theme of the above examples is that the variability in our data has several sources. Which are the sources of variability that need to concern us? This is a delicate matter which depends on your goals. As a rule of thumb, we suggest the following view: if information on an effect will be available at the time of prediction, treat it as a fixed effect. If it will not, treat it as a random effect.

LMMs are so fundamental, that they have earned many names:

  • Mixed Effects: Because we may have both fixed effects we want to estimate and remove, and random effects which contribute to the variability to infer against.

  • Variance Components: Because as the examples show, variance has more than a single source (unlike in the Linear Models of Chapter 6).

  • Hierarchical Models: Because as Example 8.4 demonstrates, we can think of the sampling as hierarchical: first sample a subject, and then sample its response.

  • Multilevel Analysis: For the same reasons it is also known as Hierarchical Models.

  • Repeated Measures: Because we make several measurements from each unit, like in Example 8.4.

  • Longitudinal Data: Because we follow units over time, like in Example 8.4.

  • Panel Data: Is the term typically used in econometrics for such longitudinal data.

  • MANOVA: Many of the problems that may be solved with a multivariate analysis of variance (MANOVA), may be solved with an LMM for reasons we detail in 9.

  • Structured Prediction: In the machine learning literature, predicting outcomes with structure, such as correlated vectors, is known as Structured Learning. Because LMMs merely specify correlations, using a LMM for making predictions may be thought of as an instance of structured prediction.

Whether we are aiming to infer on a generative model’s parameters, or to make predictions, there is no “right” nor “wrong” approach. Instead, there is always some implied measure of error, and an algorithm may be good, or bad, with respect to this measure (think of false and true positives, for instance). This is why we care about dependencies in the data: ignoring the dependence structure will probably yield inefficient algorithms. Put differently, if we ignore the statistical dependence in the data we will probably be making more errors than possible/optimal.

We now emphasize:

  1. Like in previous chapters, by “model” we refer to the assumed generative distribution, i.e., the sampling distribution.

  2. LMMs are a way to infer against the right level of variability. Using a naive linear model (which assumes a single source of variability) instead of a mixed effects model, probably means your inference is overly anti-conservative. Put differently, the uncertainty in your estimates is higher than the linear model from Chapter 6 may suggest.

  3. In a LMM we will specify the dependence structure via the hierarchy in the sampling scheme (e.g. caps within machine, students within class, etc.). Not all dependency models can be specified in this way. Dependency structures that are not hierarchical include temporal dependencies (AR, ARIMA, ARCH and GARCH), spatial, Markov Chains, and more. To specify dependency structures that are not hierarchical, see Chapter 8 in (the excellent) Weiss (2005).

  4. If you are using the model merely for predictions, and not for inference on the fixed effects or variance components, then stating the generative distribution may be useful, but is not necessary. See the Supervised Learning Chapter 10 for more on prediction problems. Also recall that machine learning from non-independent observations (such as LMMs) is a delicate matter that is rarely treated in the literature.

8.1 Problem Setup

\[\begin{align} y|x,u = x'\beta + z'u + \varepsilon \tag{8.1} \end{align}\]

where \(x\) are the factors with fixed effects, \(\beta\), which we may want to study. The factors \(z\), with effects \(u\), are the random effects which contribute to variability. In our repeated measures example (8.2) the treatment is a fixed effect, and the subject is a random effect. In our bottle-caps example (8.3) the time (before vs. after) is a fixed effect, and the machines may be either a fixed or a random effect (depending on the purpose of inference). In our diet example (8.4) the diet is the fixed effect and the subject is a random effect.

Notice that we state \(y|x,u\) merely as a convenient way to do inference on \(y|x\), instead of directly specifying \(Var[y|x]\). This is exactly the power of LMMs: we specify the covariance not via the matrix \(Var[y|x]\), but rather via the sampling hierarchy.
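
To see which covariance the hierarchy implies, it may help to write model (8.1) for the whole sample. The following is a sketch under assumptions that are standard but not stated above: \(Z\) stacks the \(z_i'\) as rows, \(u \sim N(0,G)\), \(\varepsilon \sim N(0,\sigma^2 I)\), and \(u\) is independent of \(\varepsilon\). The symbols \(Z\), \(G\), \(\sigma^2\) and \(\sigma_u^2\) are our own notation.

\[\begin{align} Var[y|x] = Z G Z' + \sigma^2 I. \end{align}\]

For a single random intercept per group, with variance \(\sigma_u^2\), this is a block-diagonal matrix: \(\sigma_u^2+\sigma^2\) on the diagonal, \(\sigma_u^2\) between two observations in the same group, and \(0\) between observations in different groups.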

Given a sample of \(n\) observations \((y_i,x_i,z_i)\) from model (8.1), we will want to estimate \((\beta,u)\). Under some assumptions on the distribution of \(\varepsilon\) and \(u\), we can use maximum likelihood (ML). In the context of LMMs, however, ML is typically replaced with restricted maximum likelihood (ReML), because ML estimates of the variance components are biased (typically downward), whereas ReML accounts for the degrees of freedom spent on estimating \(\beta\) and thus reduces this bias.
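
To get a feel for the difference between ML and ReML, one can fit the same model under both criteria and compare the estimated variance components. The sketch below assumes the lme4 package is installed and uses its Dyestuff data, which is analyzed later in this chapter. The object names (fit.reml, fit.ml, vc.reml, vc.ml) are ours.

```r
library(lme4)
data(Dyestuff, package = "lme4")

# Same model, two estimation criteria
fit.reml <- lmer(Yield ~ 1 + (1 | Batch), data = Dyestuff, REML = TRUE)
fit.ml   <- lmer(Yield ~ 1 + (1 | Batch), data = Dyestuff, REML = FALSE)

# Extract the estimated variance components as plain data frames
vc.reml <- as.data.frame(VarCorr(fit.reml))
vc.ml   <- as.data.frame(VarCorr(fit.ml))

# Side-by-side comparison; rows are Batch and Residual
data.frame(component = vc.reml$grp, REML = vc.reml$vcov, ML = vc.ml$vcov)
```

In this data the ML estimate of the between-batch variance comes out smaller than the ReML estimate, which is in line with the downward bias discussed above.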

8.1.1 Non-Linear Mixed Models

The idea of random-effects can also be implemented for non-linear mean models. Formally, this means that \(y|x,z=f(x,z,\varepsilon)\) for some non-linear \(f\). These are known as non-linear mixed models, and will not be discussed in this text.

8.1.2 Generalized Linear Mixed Models (GLMM)

You can marry the ideas of random effects, with non-linear link functions, and non-Gaussian distribution of the response. These are known as Generalized Linear Mixed Models. Wikidot has a nice comparison of several software suites for GLMMs. Also consider the mcglm R package (Bonat 2018).

8.2 Mixed Models with R

We will fit mixed models with the lmer function from the lme4 package, written by the mixed-models Guru Douglas Bates. We start with a small simulation demonstrating the importance of acknowledging your sources of variability. Our demonstration consists of fitting a linear model that assumes independence, when data is clearly dependent.

# Simulation parameters
n.groups <- 4 # number of groups
n.repeats <- 2 # sample per group
groups <- as.factor(rep(1:n.groups, each=n.repeats)) # group labels (no pipe, so magrittr is not needed)
n <- length(groups)
z0 <- rnorm(n.groups,0,10) # generate group effects
(z <- z0[as.numeric(groups)]) # generate and inspect random group effects
## [1]   8.901364   8.901364  -4.318889  -4.318889   9.708611   9.708611
## [7] -10.693773 -10.693773
epsilon <- rnorm(n,0,1) # generate measurement error

# Generate data
beta0 <- 2 # set global mean
y <- beta0 + z + epsilon # generate synthetic sample

We can now fit the linear and mixed models.

lm.5 <- lm(y~1)  # fit a linear model assuming independence
library(lme4)
## Loading required package: Matrix
lme.5 <- lmer(y~1|groups) # fit a mixed-model that deals with the group dependence

The summary of the linear model

summary.lm.5 <- summary(lm.5)
summary.lm.5
## 
## Call:
## lm(formula = y ~ 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.949  -7.275   1.629   8.668  10.005 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    3.317      3.500   0.948    0.375
## 
## Residual standard error: 9.898 on 7 degrees of freedom

The summary of the mixed-model

summary.lme.5 <- summary(lme.5)
summary.lme.5
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ 1 | groups
## 
## REML criterion at convergence: 41
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.15395 -0.50048  0.04306  0.55891  0.99797 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  groups   (Intercept) 111.962  10.581  
##  Residual               2.012   1.418  
## Number of obs: 8, groups:  groups, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    3.317      5.314   0.624

Look at the standard error of the global mean, i.e., the intercept: for lm it is 3.4996374, and for lme it is 5.3143284. Why this difference? Because lm ignores the group effect (a.k.a. the cluster effect) and treats all observations as independent, while the mixed model treats the group effect as a source of noise/uncertainty. Clearly, inference using lm underestimates our uncertainty in the estimated population mean (\(\beta_0\)).
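
A single simulated data set is only suggestive. The following Monte Carlo sketch repeats the simulation many times, and compares the average standard error reported by lm with the actual spread of the estimated mean over replications. It uses base R only. The number of replications (2000), the seed, and the helper one.draw are our own choices.

```r
set.seed(1)
n.groups <- 4
n.repeats <- 2
groups <- rep(1:n.groups, each = n.repeats)

# One replication: simulate data as above, return the estimated mean
# and the standard error that lm reports for it
one.draw <- function() {
  z <- rnorm(n.groups, 0, 10)[groups]          # random group effects
  y <- 2 + z + rnorm(length(groups), 0, 1)     # global mean 2 plus noise
  c(mean = mean(y),
    se.lm = summary(lm(y ~ 1))$coefficients[1, "Std. Error"])
}

draws <- replicate(2000, one.draw())   # 2 x 2000 matrix
true.sd <- sd(draws["mean", ])         # actual variability of the estimate
avg.lm.se <- mean(draws["se.lm", ])    # what lm claims, on average
c(true.sd = true.sd, avg.lm.se = avg.lm.se)
```

Under this setup the standard error reported by lm is, on average, smaller than the true standard deviation of the estimated mean, which is the anti-conservative behavior described above.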

Now let’s adopt the paired t-test view, which removes the group mean, so that it implicitly ignores the between-group variability. Which is the model compatible with this view?

diffs <- tapply(y, groups, diff) 
diffs # Q:what is this estimating? A: epsilon+epsilon.
##         1         2         3         4 
## -1.411024 -1.598983 -1.493730  3.052394
sd(diffs) # 
## [1] 2.278119

So we see that a paired t-test infers only against the within-group variability. Q: Is this a good thing? A: depends…

8.2.1 A Single Random Effect

We will use the Dyestuff data from the lme4 package, which encodes the yield, in grams, of a coloring solution (dyestuff), produced in 6 batches using 5 different preparations.

data(Dyestuff, package='lme4')
attach(Dyestuff)
head(Dyestuff)
##   Batch Yield
## 1     A  1545
## 2     A  1440
## 3     A  1440
## 4     A  1520
## 5     A  1580
## 6     B  1540

And visually

lattice::dotplot(Yield~Batch)

If we want to do inference on the (global) mean yield, we need to account for the two sources of variability: the within-batch variability, and the between-batch variability. We thus fit a mixed model, with an intercept and random batch effect.

lme.1<- lmer( Yield ~ 1  | Batch  , Dyestuff )
summary(lme.1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Yield ~ 1 | Batch
##    Data: Dyestuff
## 
## REML criterion at convergence: 319.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4117 -0.7634  0.1418  0.7792  1.8296 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Batch    (Intercept) 1764     42.00   
##  Residual             2451     49.51   
## Number of obs: 30, groups:  Batch, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  1527.50      19.38    78.8

Things to note:

  • The syntax Yield ~ 1 | Batch tells R to fit a model with a global intercept (1) and a random Batch effect (|Batch). A more explicit way to write the same model is Yield ~ 1 + (1 | Batch). More on that later.
  • As usual, summary is content aware and has a different behavior for the objects returned by lmer.
  • The output distinguishes between random effects (\(u\)), a source of variability, and fixed effect (\(\beta\)), which we want to study. The mean of the random effect is not reported because it is assumed to be 0.
  • Were we not interested in the variance components, and only in the coefficients or predictions, an (almost) equivalent lm formulation is lm(Yield ~ Batch).

Some utility functions let us query the lme object. The function coef will work, but will return a cumbersome output. Better use fixef to extract the fixed effects, and ranef to extract the random effects. The model matrix (of the fixed effects alone), can be extracted with model.matrix, and predictions made with predict. Note, however, that predictions with mixed-effect models are better treated as prediction problems as in the Supervised Learning Chapter 10, but are a very delicate matter.
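
As an illustration of these utility functions, here is a minimal sketch on the Dyestuff model. It refits the model so that the block is self-contained. The names vc, icc, pred.batch and pred.global are ours, and the intraclass correlation is added only as an example of what can be computed from the variance components.

```r
library(lme4)
data(Dyestuff, package = "lme4")
lme.1 <- lmer(Yield ~ 1 + (1 | Batch), data = Dyestuff)

fixef(lme.1)                          # fixed effects: the global intercept
ranef(lme.1)$Batch                    # predicted batch effects, one row per batch
vc <- as.data.frame(VarCorr(lme.1))   # variance components as a data frame

# Share of the total variance due to batches (intraclass correlation)
icc <- vc$vcov[vc$grp == "Batch"] / sum(vc$vcov)

# Predictions with and without the predicted batch effects
pred.batch  <- predict(lme.1)                # conditional on each batch
pred.global <- predict(lme.1, re.form = NA)  # fixed effects only
```

Setting re.form = NA in predict drops the random effects, so every observation gets the global intercept as its prediction.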

detach(Dyestuff)

8.2.2 Multiple Random Effects

Let’s make things more interesting by allowing more than one random effect. One-way ANOVA can be thought of as the fixed-effects counterpart of the single random effect.

In the Penicillin data, we measured the diameter of spread of an organism, along with the plate used (a to x), and penicillin type (A to F). We will now try to infer on the diameter of a typical organism, and compute its variability over plates and penicillin types.

head(Penicillin)
##   diameter plate sample
## 1       27     a      A
## 2       23     a      B
## 3       26     a      C
## 4       23     a      D
## 5       23     a      E
## 6       21     a      F

One sample per combination:

attach(Penicillin)
table(sample, plate) # how many observations per plate & type?
##       plate
## sample a b c d e f g h i j k l m n o p q r s t u v w x
##      A 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      C 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      D 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      E 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      F 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

And visually:

Let’s fit a mixed-effects model with a random plate effect, and a random sample effect:

lme.2 <- lmer ( diameter ~  1  + (1|plate )+(1|sample) , Penicillin )
fixef(lme.2) # Fixed effects
## (Intercept) 
##    22.97222
ranef(lme.2) # Random effects
## $plate
##   (Intercept)
## a  0.80454389
## b  0.80454389
## c  0.18167120
## d  0.33738937
## e  0.02595303
## f -0.44120149
## g -1.37551052
## h  0.80454389
## i -0.75263783
## j -0.75263783
## k  0.96026206
## l  0.49310755
## m  1.42741658
## n  0.49310755
## o  0.96026206
## p  0.02595303
## q -0.28548332
## r -0.28548332
## s -1.37551052
## t  0.96026206
## u -0.90835601
## v -0.28548332
## w -0.59691966
## x -1.21979235
## 
## $sample
##   (Intercept)
## A  2.18705819
## B -1.01047625
## C  1.93789966
## D -0.09689498
## E -0.01384214
## F -3.00374447
## 
## with conditional variances for "plate" "sample"

Things to note:

  • The syntax 1+ (1| plate ) + (1| sample ) fits a global intercept (mean), a random plate effect, and a random sample effect.
  • Were we not interested in the variance components, an (almost) equivalent lm formulation is lm(diameter ~ plate + sample).
  • The output of ranef is somewhat controversial. Think about it: Why would we want to plot the estimates of a random variable?
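
Besides the predicted random effects, we will usually want the variance components themselves, and the standard error of the global mean. A minimal sketch, refitting the same model so that the block is self-contained (vc2 and se.mean are our names):

```r
library(lme4)
data(Penicillin, package = "lme4")
lme.2 <- lmer(diameter ~ 1 + (1 | plate) + (1 | sample), data = Penicillin)

# Variance attributed to plates, to samples, and to residual noise
vc2 <- as.data.frame(VarCorr(lme.2))
vc2[, c("grp", "vcov", "sdcor")]

# Standard error of the global mean; it accounts for both random effects
se.mean <- sqrt(diag(as.matrix(vcov(lme.2))))
se.mean
```

In this data the sample (penicillin type) contributes more variance than the plate.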

Since we have two random effects, we may compute the variability of the global mean (the only fixed effect) as we did before. Perhaps more interestingly, we can compute the variability in the response, for a particular plate or sample type.

random.effect.lme2 <- ranef(lme.2, condVar = TRUE) 
qrr2 <- lattice::dotplot(random.effect.lme2, strip = FALSE)

Variability in response for each plate, over various sample types:

print(qrr2[[1]]) 

Variability in response for each sample type, over the various plates:

print(qrr2[[2]])  

Things to note:

  • The condVar argument of the ranef function tells R to also return the conditional variances of the predicted random effects, which the plot uses to draw an interval around each prediction.
  • The dotplot function, from the lattice package, is only there for the fancy plotting.

We used the penicillin example to demonstrate the incorporation of two random-effects. We could have, however, compared between penicillin types. For this matter, penicillin types are fixed effects to infer on, and not part of the uncertainty in the mean diameter. The appropriate model is the following:

lme.2.2 <- lmer( diameter ~  1  + sample + (1|plate) , Penicillin )

I may now ask myself: does the sample, i.e. penicillin, have any effect? This is what the ANOVA table typically gives us. The next table can be thought of as a “repeated measures ANOVA”:

anova(lme.2.2)
## Analysis of Variance Table
##        Df Sum Sq Mean Sq F value
## sample  5 449.22  89.844  297.09

Ugh! No p-values. Why is this? Because Doug Bates, the author of lme4, makes a strong argument against current methods of computing p-values in mixed models. If you insist on a p-value, you may resort to other packages that provide one, at your own risk:

car::Anova(lme.2.2) 
## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: diameter
##         Chisq Df Pr(>Chisq)    
## sample 1485.5  5  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

… and yes; the penicillin type has a significant effect on the diameter.
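
Another common route to a p-value is a likelihood ratio test between the model with the sample effect and a model without it. The sketch below uses the anova method that lme4 provides for comparing two fitted models. The names full, null, lrt and p.value are ours, and the resulting p-value relies on a chi-square approximation, so the same caution applies.

```r
library(lme4)
data(Penicillin, package = "lme4")
full <- lmer(diameter ~ 1 + sample + (1 | plate), data = Penicillin)
null <- lmer(diameter ~ 1 + (1 | plate), data = Penicillin)

# anova() on two lmer fits refits both with ML before comparing, because
# REML criteria are not comparable across different fixed effects
lrt <- anova(null, full)
lrt
p.value <- lrt[["Pr(>Chisq)"]][2]
```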

8.2.3 A Full Mixed-Model

In the sleepstudy data, we recorded the reaction times to a series of tests (Reaction), after various subjects (Subject) underwent various amounts of sleep deprivation (Days).

We now want to estimate the (fixed) effect of the days of sleep deprivation on response time, while allowing each subject to have their own effect. Put differently, we want to estimate a random slope for the effect of day. The fixed Days effect can be thought of as the average slope over subjects.

lme.3 <- lmer ( Reaction ~ Days + ( Days | Subject ) , data= sleepstudy )

Things to note:

  • ~Days specifies the fixed effect.
  • We used the Days|Subject syntax to tell R we want to fit the model ~Days within each subject.
  • Were we fitting the model for purposes of prediction only, an (almost) equivalent lm formulation is lm(Reaction~Days*Subject).

The fixed day effect is:

fixef(lme.3)
## (Intercept)        Days 
##   251.40510    10.46729

The variability in the average response (intercept) and day effect is

ranef(lme.3)
## $Subject
##     (Intercept)        Days
## 308   2.2575329   9.1992737
## 309 -40.3942719  -8.6205161
## 310 -38.9563542  -5.4495796
## 330  23.6888704  -4.8141448
## 331  22.2585409  -3.0696766
## 332   9.0387625  -0.2720535
## 333  16.8389833  -0.2233978
## 334  -7.2320462   1.0745075
## 335  -0.3326901 -10.7524799
## 337  34.8865253   8.6290208
## 349 -25.2080191   1.1730997
## 350 -13.0694180   6.6142185
## 351   4.5777099  -3.0152825
## 352  20.8614523   3.5364062
## 369   3.2750882   0.8722876
## 370 -25.6110745   4.8222518
## 371   0.8070591  -0.9881730
## 372  12.3133491   1.2842380
## 
## with conditional variances for "Subject"

Did we really need the whole lme machinery to fit a within-subject linear regression and then average over subjects? The answer is yes. The assumptions on the distribution of the random effects, namely, that they are normally distributed, allow us to pool information from one subject to another. In the words of John Tukey: “we borrow strength over subjects”. Is this a good thing? If the normality assumption is true, it certainly is. If, on the other hand, you have a lot of samples per subject, and you don’t need to “borrow strength” from one subject to another, you can simply fit within-subject linear models without the mixed-models machinery.

To demonstrate the “strength borrowing”, here is a comparison of the lme, versus the effects of fitting a linear model to each subject separately.

Here is a comparison of the random-day effect from lme versus a subject-wise linear model. They are not the same.
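
One possible way to produce such a comparison is sketched below. It fits a separate lm to each subject using base R, and compares the resulting slopes with the subject-level slopes from the mixed model (fixed effect plus predicted random effect). The names ols.slopes and lmm.slopes are ours.

```r
library(lme4)
data(sleepstudy, package = "lme4")
lme.3 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Subject-wise least squares slopes: one separate lm per subject
ols.slopes <- sapply(split(sleepstudy, sleepstudy$Subject),
                     function(d) coef(lm(Reaction ~ Days, data = d))[["Days"]])

# Subject-level slopes from the mixed model
lmm.slopes <- coef(lme.3)$Subject[, "Days"]

# Side by side, and their spread over subjects
cbind(ols = ols.slopes, lmm = lmm.slopes)
c(var.ols = var(ols.slopes), var.lmm = var(lmm.slopes))
```

The mixed-model slopes are pulled towards the average slope, so they vary less over subjects than the subject-wise estimates. This shrinkage is the “borrowing of strength”.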

detach(Penicillin)

8.3 Serial Correlations

As previously stated, a hierarchical model is a very convenient way to state correlations. The hierarchical sampling scheme will always yield correlations in blocks. What if the correlation does not have a block structure? Like a smooth temporal decay for time-series, or a smooth spatial decay for geospatial data?

One way to go about it is to find a dedicated package, for instance, in the Spatio-Temporal Data task view, or the Ecological and Environmental task view. Fans of vector-auto-regression should have a look at the vars package.

Instead, we will show how to solve this matter using the nlme package. This is because nlme allows us to specify both a block-covariance structure using the mixed-models framework, and the smooth parametric covariances we find in temporal and spatial data.

The nlme::Ovary data is panel data of the number of ovarian follicles in different mares (female horses), at various times.

To fit a model with an AR(1) temporal correlation, alongside random-effects, we take an example from the help of nlme::corAR1.

library(nlme)
head(nlme::Ovary)
## Grouped Data: follicles ~ Time | Mare
##   Mare        Time follicles
## 1    1 -0.13636360        20
## 2    1 -0.09090910        15
## 3    1 -0.04545455        19
## 4    1  0.00000000        16
## 5    1  0.04545455        13
## 6    1  0.09090910        10
fm1Ovar.lme <- nlme::lme(fixed=follicles ~ sin(2*pi*Time) + cos(2*pi*Time), 
                   data = Ovary, 
                   random = pdDiag(~sin(2*pi*Time)), 
                   correlation=corAR1() )
summary(fm1Ovar.lme)
## Linear mixed-effects model fit by REML
##  Data: Ovary 
##        AIC     BIC   logLik
##   1563.448 1589.49 -774.724
## 
## Random effects:
##  Formula: ~sin(2 * pi * Time) | Mare
##  Structure: Diagonal
##         (Intercept) sin(2 * pi * Time) Residual
## StdDev:    2.858385           1.257977 3.507053
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | Mare 
##  Parameter estimate(s):
##       Phi 
## 0.5721866 
## Fixed effects: follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time) 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)        12.188089 0.9436602 295 12.915760  0.0000
## sin(2 * pi * Time) -2.985297 0.6055968 295 -4.929513  0.0000
## cos(2 * pi * Time) -0.877762 0.4777821 295 -1.837159  0.0672
##  Correlation: 
##                    (Intr) s(*p*T
## sin(2 * pi * Time)  0.000       
## cos(2 * pi * Time) -0.123  0.000
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.34910093 -0.58969626 -0.04577893  0.52931186  3.37167486 
## 
## Number of Observations: 308
## Number of Groups: 11

Things to note:

  • The fitting is done with the nlme::lme function, and not lme4::lmer (which does not allow for non blocked covariance models).
  • sin(2*pi*Time) + cos(2*pi*Time) is a fixed effect that captures seasonality.
  • The temporal covariance is specified using the correlation= argument.
  • AR(1) was assumed by calling correlation=corAR1(). See nlme::corClasses for a list of supported correlation structures.
  • From the summary, we see that a Mare random effect has also been added. Where is it specified? It is implied by the random= argument. Read ?lme for further details.
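
To check whether the AR(1) correlation is worth having, one may fit the same model without it and compare the two fits. This is a sketch using the anova method that nlme provides for lme objects. The names fm0, fm1 and phi are ours.

```r
library(nlme)

# Same fixed and random effects, without serial correlation
fm0 <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time),
           data = Ovary,
           random = pdDiag(~sin(2*pi*Time)))

# Add an AR(1) correlation within each mare
fm1 <- update(fm0, correlation = corAR1())

# Both are REML fits with identical fixed effects, so they are comparable
anova(fm0, fm1)

# Estimated AR(1) parameter, on its natural scale
phi <- coef(fm1$modelStruct$corStruct, unconstrained = FALSE)
phi
```

A lower AIC for fm1 suggests that the serial correlation improves the fit, beyond what the random effects already capture.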

We can now inspect the covariance implied by our model’s specification:

the.cov <- mgcv::extract.lme.cov(fm1Ovar.lme, data = Ovary) 
lattice::levelplot(the.cov)

8.4 Extensions

8.4.1 Cluster Robust Standard Errors

As previously stated, random effects are nothing more than a convenient way to specify dependencies within a level of a random effect, i.e., within a group/cluster. This is also the motivation underlying cluster robust inference, which is immensely popular with econometricians, but less so elsewhere. What is the difference between the two?

The mixed models framework is a bona-fide generalization of cluster robust inference. This author thus recommends using the lme4 and nlme packages for mixed models to deal with correlations within cluster.

For a longer comparison between the two approaches, see Michael Clark’s guide.

8.4.2 Linear Models for Panel Data

nlme and lme4 will probably provide you with all the functionality you need for panel data. If, however, you are trained as an econometrician, prefer the econometric parlance, and are not using non-linear models, then the plm package is just for you. In particular, it allows for cluster-robust covariance estimates, and the Durbin–Wu–Hausman test for random effects. The plm package vignette also has a comparison to the nlme package.

8.4.3 Testing Hypotheses on Correlations

After working so hard to model the correlations in our observations, we may want to test if it was all required. Douglas Bates, the author of nlme and lme4, wrote a famous cautionary note, found here, on hypothesis testing in mixed models. Many practitioners, however, do not adopt Doug’s view. Many of the popular tests, particularly the ones in the econometric literature, can be found in the plm package (see Section 6 in the package vignette). These include tests for poolability, the Hausman test, tests for serial correlations, tests for cross-sectional dependence, and unit root tests.

8.5 Relation to Other Estimators

8.5.1 Fixed Effects in the Econometric Literature

Fixed effects in the statistical literature, as discussed herein, are different than those in the econometric literature. See Section 7 of the plm package vignette for a comparison.

8.5.2 Relation to Generalized Least Squares (GLS)

GLS is the solution to a decorrelated least squares problem: \[\hat{\beta}_{GLS}:=argmin_\beta\{(y-X\beta)'\Sigma^{-1}(y-X\beta)\}.\] This estimator can be viewed as a least squares estimator that accounts for correlations in the data. It is also a maximum likelihood estimator under a Gaussian error assumption. Viewed as the latter, a linear mixed model under a Gaussian error assumption (and known variance components) collapses to a GLS estimator, with \(\Sigma\) being the covariance implied by the random effects.
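
The relation can be illustrated numerically. A random intercept per group implies an equal correlation between all observations in the same group, which is the compound symmetry structure that nlme calls corCompSymm. The sketch below fits both formulations to the Dyestuff data. It assumes nlme and lme4 are installed (lme4 only for the data), and the names fit.lme and fit.gls are ours.

```r
library(nlme)
data(Dyestuff, package = "lme4")

# Random-intercept LMM
fit.lme <- lme(Yield ~ 1, random = ~ 1 | Batch, data = Dyestuff)

# GLS with an equal within-batch correlation (compound symmetry)
fit.gls <- gls(Yield ~ 1, data = Dyestuff,
               correlation = corCompSymm(form = ~ 1 | Batch))

# The estimated global mean under both formulations
c(lme = fixef(fit.lme)[[1]], gls = coef(fit.gls)[[1]])

# The REML log-likelihoods, which are expected to agree closely
c(lme = as.numeric(logLik(fit.lme)), gls = as.numeric(logLik(fit.gls)))
```

The two formulations differ in what they report: lme returns variance components and predicted batch effects, whereas gls returns a within-batch correlation.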

8.5.3 Relation to Conditional Gaussian Fields

In the geo-spatial literature, geo-located measurements are typically assumed to be sampled from a Gaussian Random Field. All the models discussed in this chapter can be stated in terms of these random fields. In the random field nomenclature, the fixed effects are known as the drift, or the mean field, and the covariance in errors is known as the correlation function. In other fields of literature the correlation function is known as a characteristic function, radial basis function, or kernel. Assuming stationarity, these simplify to the power spectrum via the Wiener–Khinchin theorem. The predictions of such models may be found under the names of linear projection operators, best linear unbiased prediction, Kriging, and radial basis function interpolators.

8.5.4 Relation to Empirical Risk Minimization (ERM)

ERM is more general than mixed-models estimation since it allows loss functions that are not the (log) likelihood. ERM is less general than LMM, in that ERM (typically) does not account for correlations in the data.

8.5.5 Relation to M-Estimation

M-estimation is the term in the statistical literature for ERM.

8.5.6 Relation to Generalized Estimating Equations (GEE)

The first order condition of the LMM problem returns a set of (non-linear) estimating equations. In this sense, GEE can be seen as more general than LMM in that the GEE need not be the derivative of the (log) likelihood.

8.5.7 Relation to MANOVA

Multivariate analysis of variance (MANOVA) deals with the estimation of effects on vector valued outcomes. Put differently: in ANOVA the response, \(y\), is univariate. In MANOVA, the outcome is multivariate. MANOVA is useful when there are correlations among the entries of \(y\). Otherwise, one may simply solve many ANOVA problems, instead of a single MANOVA.

Now assume that the outcome of a MANOVA is measurements of an individual at several time periods. The measurements are clearly correlated, so that MANOVA may be useful. But one may also treat the subject as a random effect, with a univariate response. We thus see that this seemingly MANOVA problem can be solved with the mixed models framework.

What MANOVA problems cannot be solved with mixed models? There may be cases where the covariance of the multivariate outcome, \(y\), is very complicated. If the covariance in \(y\) may not be stated using a combination of random and fixed effects, then the covariance has to be stated explicitly. It is also possible to consider mixed-models with multivariate outcomes, i.e., a mixed MANOVA, or hierarchical MANOVA. The R functions we present herein permit this.

8.5.8 Relation to Seemingly Unrelated Regressions (SUR)

SUR is the econometric term for MANOVA.

8.6 Bibliographic Notes

Most of the examples in this chapter are from the documentation of the lme4 package (Bates et al. 2015). For a general and very applied treatment, see Pinheiro and Bates (2000). As usual, a hands on view can be found in Venables and Ripley (2013), and also in an excellent blog post by Kristoffer Magnusson. For a more theoretical view see Weiss (2005) or Searle, Casella, and McCulloch (2009). Sometimes it is unclear if an effect is random or fixed; on the difference between the two types of inference see the classics: Eisenhart (1947), Kempthorne (1975), and the more recent Rosset and Tibshirani (2018). For more on predictions in linear mixed models see Robinson (1991), Rabinowicz and Rosset (2018), and references therein. See Michael Clark’s guide for various ways of dealing with correlations within groups. For the geo-spatial view and terminology of correlated data, see Christakos (2000), Diggle, Tawn, and Moyeed (1998), Allard (2013), and Cressie and Wikle (2015).

8.7 Practice Yourself

  1. Compute the variance of the sample mean given dependent observations. How does it depend on the covariance between observations? When is the sample most informative on the population mean?

  2. Return to the Penicillin data set. Instead of fitting an LME model, fit an LM model with lm. I.e., treat all random effects as fixed.
    1. Compare the effect estimates.
    2. Compare the standard errors.
    3. Compare the predictions of the two models.
  3. [Very Advanced!] Return to the Penicillin data and use the gls function to fit a generalized least squares model, equivalent to the LME model in our text.
  4. Read about the “oats” dataset using ?MASS::oats. Inspect the dependency of the yield (Y) on the Varieties (V) and the Nitrogen treatment (N).
    1. Fit a linear model. Is the effect of the treatment significant? Is the interaction between the Varieties and Nitrogen significant?
    2. An expert told you that there could be variability between the different blocks (B), which can bias the analysis. Fit an LMM for the data.
    3. Do you think the blocks should be taken into account as “random effect” or “fixed effect”?
  5. Return to the temporal correlation in Section 8.3, and replace the AR(1) covariance, with an ARMA covariance. Visualize the data’s covariance matrix, and compare the fitted values.

See DataCamp’s Hierarchical and Mixed Effects Models for more self practice.

References

Weiss, Robert E. 2005. Modeling Longitudinal Data. Springer Science & Business Media.

Bonat, Wagner Hugo. 2018. “Multiple Response Variables Regression Models in R: The Mcglm Package.” Journal of Statistical Software 84 (1): 1–30.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. doi:10.18637/jss.v067.i01.

Pinheiro, José, and Douglas Bates. 2000. “Mixed-Effects Models in S and S-Plus (Statistics and Computing).” Springer, New York.

Venables, William N, and Brian D Ripley. 2013. Modern Applied Statistics with S-Plus. Springer Science & Business Media.

Searle, Shayle R, George Casella, and Charles E McCulloch. 2009. Variance Components. Vol. 391. John Wiley & Sons.

Eisenhart, Churchill. 1947. “The Assumptions Underlying the Analysis of Variance.” Biometrics 3 (1). JSTOR: 1–21.

Kempthorne, Oscar. 1975. “Fixed and Mixed Models in the Analysis of Variance.” Biometrics. JSTOR, 473–86.

Rosset, Saharon, and Ryan J Tibshirani. 2018. “From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation.” Journal of the American Statistical Association, no. just-accepted. Taylor & Francis.

Robinson, George K. 1991. “That Blup Is a Good Thing: The Estimation of Random Effects.” Statistical Science. JSTOR, 15–32.

Rabinowicz, Assaf, and Saharon Rosset. 2018. “Assessing Prediction Error at Interpolation and Extrapolation Points.” arXiv Preprint arXiv:1802.00996.

Christakos, George. 2000. Modern Spatiotemporal Geostatistics. Vol. 6. Oxford University Press.

Diggle, Peter J, JA Tawn, and RA Moyeed. 1998. “Model-Based Geostatistics.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 47 (3). Wiley Online Library: 299–350.

Allard, Denis. 2013. “J.-P. Chiles, P. Delfiner: Geostatistics: Modeling Spatial Uncertainty.” Springer.

Cressie, Noel, and Christopher K Wikle. 2015. Statistics for Spatio-Temporal Data. John Wiley & Sons.


  1. A.k.a. the cluster effect.