-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathGettingStarted.Rmd
242 lines (181 loc) · 9.68 KB
/
GettingStarted.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
---
title: "Getting Started with Multilevel Modeling in R"
author: "Jared E. Knowles"
date: "November, 25, 2013"
output:
html_document:
keep_md: yes
---
## Jared E. Knowles
# Introduction
Analysts dealing with grouped data and complex hierarchical structures in their data ranging from
measurements nested within participants, to counties nested within states or students nested within
classrooms often find themselves in need of modeling tools to reflect this structure of their data.
In R there are two predominant ways to fit multilevel models that account for such structure in the
data. These tutorials will show the user how to use both the `lme4` package in R to fit linear and
nonlinear mixed effect models, and to use `rstan` to fit fully Bayesian multilevel models. The focus
here will be on how to fit the models in R and not the theory behind the models. For background on
multilevel modeling, see the references. [1]
This tutorial will cover getting set up and running a few basic models using `lme4` in R. Future
tutorials will cover:
* constructing varying intercept, varying slope, and varying slope and intercept models in R
* generating predictions and interpreting parameters from mixed-effect models
* generalized and non-linear multilevel models
* fully Bayesian multilevel models fit with `rstan` or other MCMC methods
# Setting up your enviRonment
Getting started with multilevel modeling in R is simple. `lme4` is the canonical package for
implementing multilevel models in R, though there are a number of packages that depend on and
enhance its feature set, including Bayesian extensions. `lme4` has been recently rewritten to
improve speed and to incorporate a C++ codebase, and as such the features of the package are
somewhat in flux. Be sure to update the package frequently.
To install `lme4`, we just run:
```{r eval=FALSE, results='hide', echo=TRUE}
# Main version
install.packages("lme4")
# Or to install the dev version
library(devtools)
install_github("lme4",user="lme4")
```
```{r setup, echo=FALSE, error=FALSE, message=FALSE, eval=TRUE, results='hide'}
library(knitr)
knitr::opts_chunk$set(dev='png', fig.width=8, fig.height=6, echo=TRUE,
message=FALSE, error=FALSE, warning=FALSE)
options(width = 100)
```
# Read in the data
Multilevel models are appropriate for a particular kind of data structure where units are nested
within groups (generally 5+ groups) and where we want to model the group structure of the data. For
our introductory example we will start with a simple example from the `lme4` documentation and
explain what the model is doing. We will use data from Jon Starkweather at the [University of North
Texas](http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/). Visit the excellent tutorial
[available here for
more.](http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/Benchmarks/LinearMixedModels_JDS_Dec2010.pdf)
```{r loadandviewdata}
library(lme4) # load library
library(arm) # convenience functions for regression in R
lmm.data <- read.table("http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module9/lmm.data.txt",
header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
#summary(lmm.data)
head(lmm.data)
```
Here we have data on the extroversion of subjects nested within classes and within schools.
# Fit the Non-Multilevel Models
Let's start by fitting a simple OLS regression of measures of openness, agreeableness, and
socialability on extroversion. Here we use the `display` function in the excellent `arm` package for
abbreviated output. Other options include `stargazer` for LaTeX typeset tables, `xtable`, or the
`ascii` package for more flexible plain text output options.
```{r nonlmermodels}
OLSexamp <- lm(extro ~ open + agree + social, data = lmm.data)
display(OLSexamp)
```
So far this model does not fit very well at all. The R model interface is quite a simple one with
the dependent variable being specified first, followed by the `~` symbol. The righ hand side,
predictor variables, are each named. Addition signs indicate that these are modeled as additive
effects. Finally, we specify that datframe on which to calculate the model. Here we use the `lm`
function to perform OLS regression, but there are many other options in R.
If we want to extract measures such as the AIC, we may prefer to fit a generalized linear model with
`glm` which produces a model fit through maximum likelihood estimation. Note that the model formula
specification is the same.
```{r nonlmerglm}
MLexamp <- glm(extro ~ open + agree + social, data=lmm.data)
display(MLexamp)
AIC(MLexamp)
```
This results in a poor model fit. Let's look at a simple varying intercept model now.
# Fit a varying intercept model
Depending on disciplinary norms, our next step might be to fit a varying intercept model using a
grouping variable such as school or classes. Using the `glm` function and the familiar formula
interface, such a fit is easy:
```{r nonlmerfixedeffect}
MLexamp.2 <- glm(extro ~ open + agree + social + class, data=lmm.data )
display(MLexamp.2)
AIC(MLexamp.2)
anova(MLexamp, MLexamp.2, test="F")
```
This is called a fixed-effects specification often. This is simply the case of fitting a separate
dummy variable as a predictor for each class. We can see this does not provide much additional model
fit. Let's see if school performs any better.
```{r nonlmerfixedeffect2}
MLexamp.3 <- glm(extro ~ open + agree + social + school, data=lmm.data )
display(MLexamp.3)
AIC(MLexamp.3)
anova(MLexamp, MLexamp.3, test="F")
```
The school effect greatly improves our model fit. However, how do we interpret these effects?
```{r effectbreakdown}
table(lmm.data$school, lmm.data$class)
```
Here we can see we have a perfectly balanced design with fifty observations in each combination of
class and school (if only data were always so nice!).
Let's try to model each of these unique cells. To do this, we fit a model and use the `:` operator
to specify the interaction between `school` and `class`.
```{r interaction}
MLexamp.4 <- glm(extro ~ open + agree + social + school:class, data=lmm.data )
display(MLexamp.4)
AIC(MLexamp.4)
```
This is very useful, but what if we want to understand both the effect of the school and the effect
of the class, as well as the effect of the schools and classes? Unfortunately, this is not easily
done with the standard `glm`.
```{r interaction2}
MLexamp.5 <- glm(extro ~ open + agree + social + school*class - 1, data=lmm.data )
display(MLexamp.5)
AIC(MLexamp.5)
```
# Exploring Random Slopes
Another alternative is to fit a separate model for each of the school and class combinations. If we
believe the relationsihp between our variables may be highly dependent on the school and class
combination, we can simply fit a series of models and explore the parameter variation among them:
```{r}
require(plyr)
modellist <- dlply(lmm.data, .(school, class), function(x)
glm(extro~ open + agree + social, data=x))
display(modellist[[1]])
display(modellist[[2]])
```
We will discuss this strategy in more depth in future tutorials including how to performan inference
on the list of models produced in this command.
# Fit a varying intercept model with lmer
Enter `lme4`. While all of the above techniques are valid approaches to this problem, they are not
necessarily the best approach when we are interested explicitly in variation among and by groups.
This is where a mixed-effect modeling framework is useful. Now we use the `lmer` function with the
familiar formula interface, but now group level variables are specified using a special syntax:
`(1|school)` tells `lmer` to fit a linear model with a varying-intercept group effect using the
variable `school`.
```{r lmer1}
MLexamp.6 <- lmer(extro ~ open + agree + social + (1|school), data=lmm.data)
display(MLexamp.6)
```
We can fit multiple group effects with multiple group effect terms.
```{r lmer2}
MLexamp.7 <- lmer(extro ~ open + agree + social + (1|school) + (1|class), data=lmm.data)
display(MLexamp.7)
```
And finally, we can fit nested group effect terms through the following syntax:
```{r lmer3}
MLexamp.8 <- lmer(extro ~ open + agree + social + (1|school/class), data=lmm.data)
display(MLexamp.8)
```
Here the `(1|school/class)` says that we want to fit a mixed effect term for varying intercepts `1|`
by schools, and for classes that are nested within schools.
# Fit a varying slope model with lmer
But, what if we want to explore the effect of different student level indicators as they vary across
classrooms. Instead of fitting unique models by school (or school/class) we can fit a varying slope
model. Here we modify our random effect term to include variables before the grouping terms:
`(1 +open|school/class)` tells R to fit a varying slope and varying intercept model for schools and
classes nested within schools, and to allow the slope of the `open` variable to vary by school.
```{r lmer4}
MLexamp.9 <- lmer(extro ~ open + agree + social + (1+open|school/class), data=lmm.data)
display(MLexamp.9)
```
# Conclusion
Fitting mixed effect models and exploring group level variation is very easy within the R language
and ecosystem. In future tutorials we will explore comparing across models, doing inference with
mixed-effect models, and creating graphical representations of mixed effect models to understand
their effects.
# Appendix
```{r sessioninfo}
print(sessionInfo(),locale=FALSE)
```
[1] Examples include [Gelman and Hill](http://stat.columbia.edu/~gelman/arm),
[Gelman et al. 2013](http://stat.columbia.edu/~gelman/book/), etc.