The R in this document reproduces the results in the lecture on multilevel logistic regression. The data are from Agresti (2002) who got is from Koch et al. (1977)
The package that we’ll use
library(lme4)
library(lmerTest)
library(gee) # for generalized estimating equations
library(MASS) # For confidence limits for parameters
Read in the data and examine
depress <- read.table(file="D:/Dropbox/edps587/lectures/10 logreg/depression_data_long.txt", header=TRUE)
head(depress)
## id time severe Rx y
## 1 1 0 0 0 1
## 2 1 1 0 0 1
## 3 1 2 0 0 1
## 4 2 0 0 0 1
## 5 2 1 0 0 1
## 6 2 2 0 0 1
Note that there are 3 observations per subject, one for each time point.
The very nieve approach to logistic regression for clustered data is below
# standard logistic regression
model1a <- glm(y ~ severe + Rx + time + Rx*time, data=depress, family=binomial)
summary(model1a)
##
## Call:
## glm(formula = y ~ severe + Rx + time + Rx * time, family = binomial,
## data = depress)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4352 -1.0220 0.3254 0.9915 1.8009
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.02799 0.16391 -0.171 0.864
## severe -1.31391 0.14641 -8.974 < 2e-16 ***
## Rx -0.05960 0.22221 -0.268 0.789
## time 0.48241 0.11476 4.204 2.63e-05 ***
## Rx:time 1.01744 0.18879 5.389 7.08e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1411.9 on 1019 degrees of freedom
## Residual deviance: 1161.9 on 1015 degrees of freedom
## AIC: 1171.9
##
## Number of Fisher Scoring iterations: 4
The z’s statistics for coefficients are sqrt(Wald).
model.gee <- gee(y ~ severe + Rx + time + Rx*time, id, data=depress, family=binomial, corstr="exchangeable")
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) severe Rx time Rx:time
## -0.02798843 -1.31391092 -0.05960381 0.48241209 1.01744498
summary(model.gee)
##
## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
## gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
## Link: Logit
## Variance to Mean Relation: Binomial
## Correlation Structure: Exchangeable
##
## Call:
## gee(formula = y ~ severe + Rx + time + Rx * time, id = id, data = depress,
## family = binomial, corstr = "exchangeable")
##
## Summary of Residuals:
## Min 1Q Median 3Q Max
## -0.94843397 -0.40683122 0.05156603 0.38832332 0.80238627
##
##
## Coefficients:
## Estimate Naive S.E. Naive z Robust S.E. Robust z
## (Intercept) -0.02809866 0.1625499 -0.1728617 0.1741791 -0.1613205
## severe -1.31391033 0.1448627 -9.0700418 0.1459630 -9.0016667
## Rx -0.05926689 0.2205340 -0.2687427 0.2285569 -0.2593091
## time 0.48246420 0.1141154 4.2278625 0.1199383 4.0226037
## Rx:time 1.01719312 0.1877051 5.4191018 0.1877014 5.4192084
##
## Estimated Scale Parameter: 0.985392
## Number of Iterations: 2
##
## Working Correlation
## [,1] [,2] [,3]
## [1,] 1.000000000 -0.003432732 -0.003432732
## [2,] -0.003432732 1.000000000 -0.003432732
## [3,] -0.003432732 -0.003432732 1.000000000
model2 <- glmer(y ~ severe + Rx + time + Rx*time + (1 |id ), data=depress, family=binomial)
summary(model2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: y ~ severe + Rx + time + Rx * time + (1 | id)
## Data: depress
##
## AIC BIC logLik deviance df.resid
## 1173.9 1203.5 -581.0 1161.9 1014
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2849 -0.8268 0.2326 0.7964 2.0181
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.003231 0.05684
## Number of obs: 1020, groups: id, 340
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.02796 0.16407 -0.170 0.865
## severe -1.31488 0.15262 -8.615 < 2e-16 ***
## Rx -0.05967 0.22240 -0.268 0.788
## time 0.48274 0.11566 4.174 3.00e-05 ***
## Rx:time 1.01817 0.19150 5.317 1.06e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) severe Rx time
## severe -0.389
## Rx -0.614 -0.005
## time -0.673 -0.123 0.524
## Rx:time 0.462 -0.121 -0.742 -0.562
model3 <- glmer(y ~ severe + Rx + time + Rx*time + (1 |id ), data=depress, family=binomial, nAGQ=10)
summary(model3)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 10) [glmerMod]
## Family: binomial ( logit )
## Formula: y ~ severe + Rx + time + Rx * time + (1 | id)
## Data: depress
##
## AIC BIC logLik deviance df.resid
## 1173.9 1203.5 -581.0 1161.9 1014
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2835 -0.8264 0.2324 0.7963 2.0191
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.004331 0.06581
## Number of obs: 1020, groups: id, 340
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.02795 0.16412 -0.170 0.865
## severe -1.31521 0.15465 -8.504 < 2e-16 ***
## Rx -0.05970 0.22246 -0.268 0.788
## time 0.48284 0.11596 4.164 3.13e-05 ***
## Rx:time 1.01842 0.19241 5.293 1.20e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) severe Rx time
## severe -0.385
## Rx -0.614 -0.004
## time -0.671 -0.133 0.522
## Rx:time 0.461 -0.134 -0.740 -0.552
setwd("D:/Dropbox/edps587/lectures/10 logreg")
knitr::purl("R_markdown_depression_example.Rmd","R_script_depression_example.txt")