R code needed to fit HLM using R is discussed and illustrated here. The code is essentially the same as in the txt file “R_hsb_code_in_notes.txt”.
The major packages that we use are lme4 and lmerTest. We will use other later in the semester, but these two are used for almost all models fit this semester.
The data used here is the High School and beyond data.
First we’ll go ahead and set our working directory (i.e., this is where the data live on your computer)
setwd("D:/Dropbox/edps587/lectures/SAS_MIXED_R")
We can also specify the packages that we’ll use. The lme4 package contains command lmer that we’ll use to fit the models to data. The output does not include tests of the fixed effects, but usng lmerTest will provide these.
library(lme4)
library(lmerTest)
library(texreg)
The data are in two files, a level 1 and a level 2. The following command reads in the level one data file.
hsb1 <- read.table(file="HSB1data.txt", header=TRUE)
# Check file
head(hsb1)
## id minority female ses mathach
## 1 1224 0 1 -1.528 5.876
## 2 1224 0 1 -0.588 19.708
## 3 1224 0 0 -0.528 20.349
## 4 1224 0 0 -0.668 8.781
## 5 1224 0 0 -0.158 17.898
## 6 1224 0 0 0.022 4.583
#
# Read in school level data
#
hsb2 <- read.table(file="HSB2data.txt", header=TRUE)
# Check file
head(hsb2)
## id size sector pracad disclim himinty meanses
## 1 1224 842 0 0.35 1.597 0 -0.428
## 2 1288 1855 0 0.27 0.174 0 0.128
## 3 1296 1719 0 0.32 -0.137 1 -0.420
## 4 1308 716 1 0.96 -0.622 0 0.534
## 5 1317 455 1 0.95 -1.694 1 0.351
## 6 1358 1430 0 0.25 1.535 0 -0.014
We need to merge the level 1 (student) and level 2 (school) data frames. We need to match schools and students using the school id. The following does this.
hsb <- merge(hsb1,hsb2, by=c('id'))
names(hsb)
## [1] "id" "minority" "female" "ses" "mathach" "size"
## [7] "sector" "pracad" "disclim" "himinty" "meanses"
head(hsb)
## id minority female ses mathach size sector pracad disclim himinty
## 1 1224 0 1 -1.528 5.876 842 0 0.35 1.597 0
## 2 1224 0 1 -0.588 19.708 842 0 0.35 1.597 0
## 3 1224 0 0 -0.528 20.349 842 0 0.35 1.597 0
## 4 1224 0 0 -0.668 8.781 842 0 0.35 1.597 0
## 5 1224 0 0 -0.158 17.898 842 0 0.35 1.597 0
## 6 1224 0 0 0.022 4.583 842 0 0.35 1.597 0
## meanses
## 1 -0.428
## 2 -0.428
## 3 -0.428
## 4 -0.428
## 5 -0.428
## 6 -0.428
There are some variables that we would like to use, school means for ses and female (mean=proportion), as well as use these to create centered variables
###########################################################
# Create and add mean meanminority & meanfemale to data #
##########################################################
meanfemale <- aggregate(female ~ id , data=hsb,
FUN="mean")
names(meanfemale) <- c('id', 'meanfemale')
hsb <- merge(hsb,meanfemale, by =c('id'))
# Since meanses is already in data file
hsb$ses.centered <- hsb$ses - hsb$meanses
head(hsb)
## id minority female ses mathach size sector pracad disclim himinty
## 1 1224 0 1 -1.528 5.876 842 0 0.35 1.597 0
## 2 1224 0 1 -0.588 19.708 842 0 0.35 1.597 0
## 3 1224 0 0 -0.528 20.349 842 0 0.35 1.597 0
## 4 1224 0 0 -0.668 8.781 842 0 0.35 1.597 0
## 5 1224 0 0 -0.158 17.898 842 0 0.35 1.597 0
## 6 1224 0 0 0.022 4.583 842 0 0.35 1.597 0
## meanses meanfemale ses.centered
## 1 -0.428 0.5957447 -1.10
## 2 -0.428 0.5957447 -0.16
## 3 -0.428 0.5957447 -0.10
## 4 -0.428 0.5957447 -0.24
## 5 -0.428 0.5957447 0.27
## 6 -0.428 0.5957447 0.45
If you will reuse these data and don’t want to go through all the data setup, you can save the data using
write.table(hsb, 'hsb.txt', row.names=F, na='.')
This should show up in the working directory that you set above. There are other formats that you can save data, but I like plain text.
We can start with a simple random intercept model; that is our empty HLM, which is the null model which is equivalent to a random effects ANOVA.
model.null <- lmer(mathach ~ 1 + (1 | id), data=hsb, REML=FALSE)
summary(model.null)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: mathach ~ 1 + (1 | id)
## Data: hsb
##
## AIC BIC logLik deviance df.resid
## 47121.8 47142.4 -23557.9 47115.8 7182
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.06262 -0.75365 0.02676 0.76070 2.74184
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 8.553 2.925
## Residual 39.148 6.257
## Number of obs: 7185, groups: id, 160
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 12.6371 0.2436 157.6209 51.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We will talk though what’s in the out put in lecture.
For models with random intercepts we can compute an interclass correlation (or residual ICC) which interpretable as both (i) the correlation between observations within a cluster or group and (ii) the proportion of variance due to differences between groups.
There are at least 3 ways to do this 1. Brute force
# Cut and paste from model summary
(icc <- 8.553/(8.553+39.148))
## [1] 0.1793044
vars <- as.data.frame(VarCorr(model.null))[4]
total <- sum(vars)
tau00 <- vars[1,1]
(icc <- tau00/total)
## [1] 0.1793109
icc.lmer <- function(modl){
vars <- as.data.frame(VarCorr(modl))[4]
total <- sum(vars)
tau00 <- vars[1,1]
icc <- tau00/total
return(icc)
}
# example of how to use function
(icc.lmer(model.null))
## [1] 0.1793109
Note that they all give the same result.
You don’t really need all those decimal places, so let’s go with
round(icc,digits=3)
## [1] 0.179
require(lmerTest)
summary(model.simple <- lmer(mathach ~ 1 + ses + (1 | id), data=hsb, REML=FALSE))
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: mathach ~ 1 + ses + (1 | id)
## Data: hsb
##
## AIC BIC logLik deviance df.resid
## 46649.0 46676.5 -23320.5 46641.0 7181
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1263 -0.7277 0.0220 0.7578 2.9186
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 4.729 2.175
## Residual 37.030 6.085
## Number of obs: 7185, groups: id, 160
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 12.6576 0.1873 149.1760 67.57 <2e-16 ***
## ses 2.3915 0.1057 6837.3052 22.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## ses 0.003
round(icc.lmer(model.simple),digits=3)
## [1] 0.113
summary(model.complex <- lmer(mathach ~ 1 + ses.centered + minority + female +
meanses + himinty + pracad + disclim + sector + size + (1 | id), data=hsb, REML=FALSE))
## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: mathach ~ 1 + ses.centered + minority + female + meanses + himinty +
## pracad + disclim + sector + size + (1 | id)
## Data: hsb
##
## AIC BIC logLik deviance df.resid
## 46289.5 46372.1 -23132.8 46265.5 7173
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2734 -0.7212 0.0386 0.7570 2.9654
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 1.292 1.137
## Residual 35.880 5.990
## Number of obs: 7185, groups: id, 160
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.131e+01 5.016e-01 1.768e+02 22.547 < 2e-16 ***
## ses.centered 1.904e+00 1.085e-01 7.053e+03 17.540 < 2e-16 ***
## minority -2.989e+00 2.111e-01 6.363e+03 -14.158 < 2e-16 ***
## female -1.261e+00 1.580e-01 4.997e+03 -7.979 1.82e-15 ***
## meanses 3.082e+00 4.219e-01 1.527e+02 7.305 1.44e-11 ***
## himinty 2.437e-01 3.182e-01 1.898e+02 0.766 0.444721
## pracad 3.003e+00 7.992e-01 1.586e+02 3.757 0.000241 ***
## disclim -3.868e-01 1.795e-01 1.678e+02 -2.155 0.032583 *
## sector 8.391e-01 3.863e-01 1.516e+02 2.172 0.031391 *
## size 7.430e-04 2.139e-04 1.604e+02 3.474 0.000659 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ss.cnt minrty female meanss himnty pracad disclm sector
## ses.centerd -0.011
## minority -0.009 0.161
## female -0.227 0.051 0.019
## meanses 0.472 0.015 0.092 0.005
## himinty 0.139 -0.055 -0.331 -0.033 0.413
## pracad -0.726 -0.002 -0.027 0.056 -0.587 -0.171
## disclim -0.322 0.000 -0.026 0.091 -0.032 -0.052 0.221
## sector -0.161 -0.006 -0.046 0.027 0.008 -0.132 -0.342 0.487
## size -0.608 -0.004 -0.030 0.024 -0.158 -0.183 0.097 -0.031 0.274
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
round(icc.lmer(model.complex),digits=3)
## [1] 0.035
screenreg(list(model.null,model.simple, model.complex),
custom.model.names=c("Null", "One Fixed", "More Complex"),
single.row=TRUE,
custom.header=list("Our First HLM models in R"=1:3)
)
##
## =====================================================================================
## Our First HLM models in R
## ----------------------------------------------------------------
## Null One Fixed More Complex
## -------------------------------------------------------------------------------------
## (Intercept) 12.64 (0.24) *** 12.66 (0.19) *** 11.31 (0.50) ***
## ses 2.39 (0.11) ***
## ses.centered 1.90 (0.11) ***
## minority -2.99 (0.21) ***
## female -1.26 (0.16) ***
## meanses 3.08 (0.42) ***
## himinty 0.24 (0.32)
## pracad 3.00 (0.80) ***
## disclim -0.39 (0.18) *
## sector 0.84 (0.39) *
## size 0.00 (0.00) ***
## -------------------------------------------------------------------------------------
## AIC 47121.81 46649.00 46289.51
## BIC 47142.45 46676.52 46372.07
## Log Likelihood -23557.91 -23320.50 -23132.76
## Num. obs. 7185 7185 7185
## Num. groups: id 160 160 160
## Var: id (Intercept) 8.55 4.73 1.29
## Var: Residual 39.15 37.03 35.88
## =====================================================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
The models are nested; that is, the null is a special case of the simple model, and the simple model is a special case of the more complex
anova(model.null,model.simple, model.complex)
## Data: hsb
## Models:
## model.null: mathach ~ 1 + (1 | id)
## model.simple: mathach ~ 1 + ses + (1 | id)
## model.complex: mathach ~ 1 + ses.centered + minority + female + meanses + himinty + pracad + disclim + sector + size + (1 | id)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model.null 3 47122 47142 -23558 47116
## model.simple 4 46649 46677 -23321 46641 474.81 1 < 2.2e-16 ***
## model.complex 12 46290 46372 -23133 46266 375.49 8 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If you only include one model, you will get F tests for each coefficient in the model.
Warnings from lmer can sometimes be ignored such as “Warning: Some predictor variables are on very different scales: rescaling”. It is generally good to follow this advise.
Things you should not ignore are errors message such as
boundary (singular) fit: see help('isSingular')
Warning: Model failed to converge with 1 negative eigenvalue: -1.1e+01