Setup
- Read in and set up data
- Creating some new variables
Fitting some models
Compare Models Fit To Data
- Table
- Likelihood Ratio Tests Between Models
Final Remarks

R code needed to fit HLM using R is discussed and illustrated here. The code is essentially the same as in the txt file “R_hsb_code_in_notes.txt”.

The major packages that we use are lme4 and lmerTest. We will use other later in the semester, but these two are used for almost all models fit this semester.

The data used here is the High School and beyond data.

Setup

First we’ll go ahead and set our working directory (i.e., this is where the data live on your computer)

setwd("D:/Dropbox/edps587/lectures/SAS_MIXED_R")

We can also specify the packages that we’ll use. The lme4 package contains command lmer that we’ll use to fit the models to data. The output does not include tests of the fixed effects, but usng lmerTest will provide these.

library(lme4)
library(lmerTest)
library(texreg)

Read in and set up data

The data are in two files, a level 1 and a level 2. The following command reads in the level one data file.

Read in student level data

hsb1 <- read.table(file="HSB1data.txt", header=TRUE)

# Check file
head(hsb1)

##     id minority female    ses mathach
## 1 1224        0      1 -1.528   5.876
## 2 1224        0      1 -0.588  19.708
## 3 1224        0      0 -0.528  20.349
## 4 1224        0      0 -0.668   8.781
## 5 1224        0      0 -0.158  17.898
## 6 1224        0      0  0.022   4.583

Read in the level 2 data.

#
# Read in school level data
#
hsb2 <- read.table(file="HSB2data.txt", header=TRUE)

# Check file
head(hsb2)

##     id size sector pracad disclim himinty meanses
## 1 1224  842      0   0.35   1.597       0  -0.428
## 2 1288 1855      0   0.27   0.174       0   0.128
## 3 1296 1719      0   0.32  -0.137       1  -0.420
## 4 1308  716      1   0.96  -0.622       0   0.534
## 5 1317  455      1   0.95  -1.694       1   0.351
## 6 1358 1430      0   0.25   1.535       0  -0.014

Merge Level 1 and Level 2

We need to merge the level 1 (student) and level 2 (school) data frames. We need to match schools and students using the school id. The following does this.

hsb <- merge(hsb1,hsb2, by=c('id'))
names(hsb)

##  [1] "id"       "minority" "female"   "ses"      "mathach"  "size"    
##  [7] "sector"   "pracad"   "disclim"  "himinty"  "meanses"

head(hsb)

##     id minority female    ses mathach size sector pracad disclim himinty
## 1 1224        0      1 -1.528   5.876  842      0   0.35   1.597       0
## 2 1224        0      1 -0.588  19.708  842      0   0.35   1.597       0
## 3 1224        0      0 -0.528  20.349  842      0   0.35   1.597       0
## 4 1224        0      0 -0.668   8.781  842      0   0.35   1.597       0
## 5 1224        0      0 -0.158  17.898  842      0   0.35   1.597       0
## 6 1224        0      0  0.022   4.583  842      0   0.35   1.597       0
##   meanses
## 1  -0.428
## 2  -0.428
## 3  -0.428
## 4  -0.428
## 5  -0.428
## 6  -0.428

Creating some new variables

There are some variables that we would like to use, school means for ses and female (mean=proportion), as well as use these to create centered variables

###########################################################   
# Create and add mean meanminority & meanfemale to data   #
##########################################################
meanfemale  <- aggregate(female ~ id , data=hsb,  
                         FUN="mean")

names(meanfemale) <- c('id', 'meanfemale')

hsb <- merge(hsb,meanfemale, by =c('id'))

# Since meanses is already in data file
hsb$ses.centered <- hsb$ses - hsb$meanses

head(hsb)

##     id minority female    ses mathach size sector pracad disclim himinty
## 1 1224        0      1 -1.528   5.876  842      0   0.35   1.597       0
## 2 1224        0      1 -0.588  19.708  842      0   0.35   1.597       0
## 3 1224        0      0 -0.528  20.349  842      0   0.35   1.597       0
## 4 1224        0      0 -0.668   8.781  842      0   0.35   1.597       0
## 5 1224        0      0 -0.158  17.898  842      0   0.35   1.597       0
## 6 1224        0      0  0.022   4.583  842      0   0.35   1.597       0
##   meanses meanfemale ses.centered
## 1  -0.428  0.5957447        -1.10
## 2  -0.428  0.5957447        -0.16
## 3  -0.428  0.5957447        -0.10
## 4  -0.428  0.5957447        -0.24
## 5  -0.428  0.5957447         0.27
## 6  -0.428  0.5957447         0.45

If you will reuse these data and don’t want to go through all the data setup, you can save the data using

write.table(hsb, 'hsb.txt', row.names=F, na='.')

This should show up in the working directory that you set above. There are other formats that you can save data, but I like plain text.

Fitting some models

Empty/Null/random effects ANOVA

We can start with a simple random intercept model; that is our empty HLM, which is the null model which is equivalent to a random effects ANOVA.

model.null <- lmer(mathach ~ 1 + (1 | id), data=hsb, REML=FALSE)

summary(model.null)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: mathach ~ 1 + (1 | id)
##    Data: hsb
## 
##      AIC      BIC   logLik deviance df.resid 
##  47121.8  47142.4 -23557.9  47115.8     7182 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.06262 -0.75365  0.02676  0.76070  2.74184 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)  8.553   2.925   
##  Residual             39.148   6.257   
## Number of obs: 7185, groups:  id, 160
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  12.6371     0.2436 157.6209   51.87   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We will talk though what’s in the out put in lecture.

Computing ICC (and residual ICC)

For models with random intercepts we can compute an interclass correlation (or residual ICC) which interpretable as both (i) the correlation between observations within a cluster or group and (ii) the proportion of variance due to differences between groups.

There are at least 3 ways to do this 1. Brute force

# Cut and paste from model summary
(icc <- 8.553/(8.553+39.148))

## [1] 0.1793044

Extract the information to compute ICC

vars <- as.data.frame(VarCorr(model.null))[4]
total <- sum(vars)
tau00 <- vars[1,1]
(icc <- tau00/total)

## [1] 0.1793109

Use a little function that I wrote

icc.lmer <- function(modl){
  vars <- as.data.frame(VarCorr(modl))[4]
  total <- sum(vars)
  tau00 <- vars[1,1]
  icc <- tau00/total
  return(icc)
  }

# example of how to use function
(icc.lmer(model.null))

## [1] 0.1793109

Note that they all give the same result.

You don’t really need all those decimal places, so let’s go with

round(icc,digits=3)

## [1] 0.179

Simple model with one fixed effect

require(lmerTest)

summary(model.simple <- lmer(mathach ~ 1 + ses + (1 | id), data=hsb, REML=FALSE))

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: mathach ~ 1 + ses + (1 | id)
##    Data: hsb
## 
##      AIC      BIC   logLik deviance df.resid 
##  46649.0  46676.5 -23320.5  46641.0     7181 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1263 -0.7277  0.0220  0.7578  2.9186 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)  4.729   2.175   
##  Residual             37.030   6.085   
## Number of obs: 7185, groups:  id, 160
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   12.6576     0.1873  149.1760   67.57   <2e-16 ***
## ses            2.3915     0.1057 6837.3052   22.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr)
## ses 0.003

round(icc.lmer(model.simple),digits=3)

## [1] 0.113

More complex model

summary(model.complex <- lmer(mathach ~ 1 + ses.centered + minority + female +
    meanses + himinty + pracad + disclim +  sector + size + (1 | id), data=hsb, REML=FALSE))

## Warning: Some predictor variables are on very different scales: consider
## rescaling

## Warning: Some predictor variables are on very different scales: consider
## rescaling

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: mathach ~ 1 + ses.centered + minority + female + meanses + himinty +  
##     pracad + disclim + sector + size + (1 | id)
##    Data: hsb
## 
##      AIC      BIC   logLik deviance df.resid 
##  46289.5  46372.1 -23132.8  46265.5     7173 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2734 -0.7212  0.0386  0.7570  2.9654 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)  1.292   1.137   
##  Residual             35.880   5.990   
## Number of obs: 7185, groups:  id, 160
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   1.131e+01  5.016e-01  1.768e+02  22.547  < 2e-16 ***
## ses.centered  1.904e+00  1.085e-01  7.053e+03  17.540  < 2e-16 ***
## minority     -2.989e+00  2.111e-01  6.363e+03 -14.158  < 2e-16 ***
## female       -1.261e+00  1.580e-01  4.997e+03  -7.979 1.82e-15 ***
## meanses       3.082e+00  4.219e-01  1.527e+02   7.305 1.44e-11 ***
## himinty       2.437e-01  3.182e-01  1.898e+02   0.766 0.444721    
## pracad        3.003e+00  7.992e-01  1.586e+02   3.757 0.000241 ***
## disclim      -3.868e-01  1.795e-01  1.678e+02  -2.155 0.032583 *  
## sector        8.391e-01  3.863e-01  1.516e+02   2.172 0.031391 *  
## size          7.430e-04  2.139e-04  1.604e+02   3.474 0.000659 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ss.cnt minrty female meanss himnty pracad disclm sector
## ses.centerd -0.011                                                        
## minority    -0.009  0.161                                                 
## female      -0.227  0.051  0.019                                          
## meanses      0.472  0.015  0.092  0.005                                   
## himinty      0.139 -0.055 -0.331 -0.033  0.413                            
## pracad      -0.726 -0.002 -0.027  0.056 -0.587 -0.171                     
## disclim     -0.322  0.000 -0.026  0.091 -0.032 -0.052  0.221              
## sector      -0.161 -0.006 -0.046  0.027  0.008 -0.132 -0.342  0.487       
## size        -0.608 -0.004 -0.030  0.024 -0.158 -0.183  0.097 -0.031  0.274
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling

round(icc.lmer(model.complex),digits=3)

## [1] 0.035

Compare Models Fit To Data

Table

screenreg(list(model.null,model.simple, model.complex),
          custom.model.names=c("Null", "One Fixed", "More Complex"),
          single.row=TRUE,
          custom.header=list("Our First HLM models in R"=1:3)
          )

## 
## =====================================================================================
##                                          Our First HLM models in R                   
##                      ----------------------------------------------------------------
##                      Null                  One Fixed             More Complex        
## -------------------------------------------------------------------------------------
## (Intercept)              12.64 (0.24) ***      12.66 (0.19) ***      11.31 (0.50) ***
## ses                                             2.39 (0.11) ***                      
## ses.centered                                                          1.90 (0.11) ***
## minority                                                             -2.99 (0.21) ***
## female                                                               -1.26 (0.16) ***
## meanses                                                               3.08 (0.42) ***
## himinty                                                               0.24 (0.32)    
## pracad                                                                3.00 (0.80) ***
## disclim                                                              -0.39 (0.18) *  
## sector                                                                0.84 (0.39) *  
## size                                                                  0.00 (0.00) ***
## -------------------------------------------------------------------------------------
## AIC                   47121.81              46649.00              46289.51           
## BIC                   47142.45              46676.52              46372.07           
## Log Likelihood       -23557.91             -23320.50             -23132.76           
## Num. obs.              7185                  7185                  7185              
## Num. groups: id         160                   160                   160              
## Var: id (Intercept)       8.55                  4.73                  1.29           
## Var: Residual            39.15                 37.03                 35.88           
## =====================================================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05

Likelihood Ratio Tests Between Models

The models are nested; that is, the null is a special case of the simple model, and the simple model is a special case of the more complex

anova(model.null,model.simple, model.complex)

## Data: hsb
## Models:
## model.null: mathach ~ 1 + (1 | id)
## model.simple: mathach ~ 1 + ses + (1 | id)
## model.complex: mathach ~ 1 + ses.centered + minority + female + meanses + himinty + pracad + disclim + sector + size + (1 | id)
##               npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)    
## model.null       3 47122 47142 -23558    47116                         
## model.simple     4 46649 46677 -23321    46641 474.81  1  < 2.2e-16 ***
## model.complex   12 46290 46372 -23133    46266 375.49  8  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If you only include one model, you will get F tests for each coefficient in the model.

Final Remarks

Warnings from lmer can sometimes be ignored such as “Warning: Some predictor variables are on very different scales: rescaling”. It is generally good to follow this advise.

Things you should not ignore are errors message such as

boundary (singular) fit: see help('isSingular')

Warning: Model failed to converge with 1 negative eigenvalue: -1.1e+01

Edps/Psych/Stat 587: Fitting Basic HLM using R

Carolyn J. Anderson

1/6/2023