The R in this document reproduces the results in the lecture on multilevel logistic regression. The data are from Agresti (2002) who got is from Koch et al. (1977)

Setup

The package that we’ll use

library(lme4)
library(lmerTest)
library(gee)    # for generalized estimating equations
library(MASS)   # For confidence limits for parameters

Read in the data and examine

depress <- read.table(file="D:/Dropbox/edps587/lectures/10 logreg/depression_data_long.txt", header=TRUE)

head(depress)
##   id time severe Rx y
## 1  1    0      0  0 1
## 2  1    1      0  0 1
## 3  1    2      0  0 1
## 4  2    0      0  0 1
## 5  2    1      0  0 1
## 6  2    2      0  0 1

Note that there are 3 observations per subject, one for each time point.

Ignore heirarchical structure

The very nieve approach to logistic regression for clustered data is below

# standard logistic regression

model1a <- glm(y ~ severe +  Rx + time + Rx*time, data=depress, family=binomial)
summary(model1a)
## 
## Call:
## glm(formula = y ~ severe + Rx + time + Rx * time, family = binomial, 
##     data = depress)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4352  -1.0220   0.3254   0.9915   1.8009  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.02799    0.16391  -0.171    0.864    
## severe      -1.31391    0.14641  -8.974  < 2e-16 ***
## Rx          -0.05960    0.22221  -0.268    0.789    
## time         0.48241    0.11476   4.204 2.63e-05 ***
## Rx:time      1.01744    0.18879   5.389 7.08e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1411.9  on 1019  degrees of freedom
## Residual deviance: 1161.9  on 1015  degrees of freedom
## AIC: 1171.9
## 
## Number of Fisher Scoring iterations: 4

The z’s statistics for coefficients are sqrt(Wald).

GEE using exchangable

model.gee <- gee(y ~ severe  + Rx + time + Rx*time, id, data=depress, family=binomial, corstr="exchangeable")
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept)      severe          Rx        time     Rx:time 
## -0.02798843 -1.31391092 -0.05960381  0.48241209  1.01744498
summary(model.gee)
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Logit 
##  Variance to Mean Relation: Binomial 
##  Correlation Structure:     Exchangeable 
## 
## Call:
## gee(formula = y ~ severe + Rx + time + Rx * time, id = id, data = depress, 
##     family = binomial, corstr = "exchangeable")
## 
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -0.94843397 -0.40683122  0.05156603  0.38832332  0.80238627 
## 
## 
## Coefficients:
##                Estimate Naive S.E.    Naive z Robust S.E.   Robust z
## (Intercept) -0.02809866  0.1625499 -0.1728617   0.1741791 -0.1613205
## severe      -1.31391033  0.1448627 -9.0700418   0.1459630 -9.0016667
## Rx          -0.05926689  0.2205340 -0.2687427   0.2285569 -0.2593091
## time         0.48246420  0.1141154  4.2278625   0.1199383  4.0226037
## Rx:time      1.01719312  0.1877051  5.4191018   0.1877014  5.4192084
## 
## Estimated Scale Parameter:  0.985392
## Number of Iterations:  2
## 
## Working Correlation
##              [,1]         [,2]         [,3]
## [1,]  1.000000000 -0.003432732 -0.003432732
## [2,] -0.003432732  1.000000000 -0.003432732
## [3,] -0.003432732 -0.003432732  1.000000000

Random Effects

Laplace (default)

model2 <- glmer(y ~ severe +  Rx + time + Rx*time + (1 |id ), data=depress, family=binomial)
summary(model2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: y ~ severe + Rx + time + Rx * time + (1 | id)
##    Data: depress
## 
##      AIC      BIC   logLik deviance df.resid 
##   1173.9   1203.5   -581.0   1161.9     1014 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.2849 -0.8268  0.2326  0.7964  2.0181 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.003231 0.05684 
## Number of obs: 1020, groups:  id, 340
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.02796    0.16407  -0.170    0.865    
## severe      -1.31488    0.15262  -8.615  < 2e-16 ***
## Rx          -0.05967    0.22240  -0.268    0.788    
## time         0.48274    0.11566   4.174 3.00e-05 ***
## Rx:time      1.01817    0.19150   5.317 1.06e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr) severe Rx     time  
## severe  -0.389                     
## Rx      -0.614 -0.005              
## time    -0.673 -0.123  0.524       
## Rx:time  0.462 -0.121 -0.742 -0.562

Guass Quadrature

model3 <- glmer(y ~ severe +  Rx + time + Rx*time + (1 |id ), data=depress, family=binomial, nAGQ=10)
summary(model3)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 10) [glmerMod]
##  Family: binomial  ( logit )
## Formula: y ~ severe + Rx + time + Rx * time + (1 | id)
##    Data: depress
## 
##      AIC      BIC   logLik deviance df.resid 
##   1173.9   1203.5   -581.0   1161.9     1014 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.2835 -0.8264  0.2324  0.7963  2.0191 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.004331 0.06581 
## Number of obs: 1020, groups:  id, 340
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.02795    0.16412  -0.170    0.865    
## severe      -1.31521    0.15465  -8.504  < 2e-16 ***
## Rx          -0.05970    0.22246  -0.268    0.788    
## time         0.48284    0.11596   4.164 3.13e-05 ***
## Rx:time      1.01842    0.19241   5.293 1.20e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr) severe Rx     time  
## severe  -0.385                     
## Rx      -0.614 -0.004              
## time    -0.671 -0.133  0.522       
## Rx:time  0.461 -0.134 -0.740 -0.552

Extract R script

setwd("D:/Dropbox/edps587/lectures/10 logreg")
knitr::purl("R_markdown_depression_example.Rmd","R_script_depression_example.txt")