This document is a very brief introduction to using MCMC to fit models to data, in particualar Hamletonian sampling via Stan. Examples are included on the use of the brms package, which is a wrapper for Stan.
We will use the nels data for \(N=23\). We start by running R to get everything needed to set up for model fitting. I have included libraries in case you want to also do maximum likelihood estimation of the HLM models.
\[ f(\mathbf{\gamma},\mathbf{T},\sigma^2 | \mathbf{data}) = \frac{ f(\mathbf{data}\ | \ \mathbf{\gamma},\mathbf{T},\sigma^2)f(\mathbf{\gamma},\mathbf{T},\sigma^2) }{ f(\mathbf{data})}\]
where
We will use MCMC (Markov Chain Monte Carlo to sample from the posterior), but we’ll just work with \[f(\mathbf{parameters}\ | \ \mathbf{data}) \propto f(\mathbf{data}\ | \ \mathbf{parameters})f(\mathbf{\gamma},\mathbf{T},\sigma^2) \] A link to see what we’ll be doing.
https://chi-feng.github.io/mcmc-demo/app.html
Stan (brms) uses Hamliltonian sampling.
The brms package is a wrapper or interface that make fitting mixed (random effects) models, which makes using Stan relatively (too) easy. The hardest part is installing Stan and brms. This document gives you enough so that you can fit models using Bayesian estimation but not enough so that you understand what’s going on nor what is needed for a complete analysis. If you really want to use Bayesian, you should learn more about it.
The brms package will create input that is fed into Stan. This is easier than writing your own input and then using the rstan package in R, which is another way to run Stan. We will should how to get the Stan input from the brms.
“Fit Bayesian generalized (non-)linear multivariate multilevel models using ‘Stan’ for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit – among others – linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with posterior predictive checks and leave-one-out cross-validation. References: B?rkner (2017) doi:10.18637/jss.v080.i01; B?rkner (2018) doi:10.32614/RJ-2018-017; Carpenter et al. (2017) doi:10.18637/jss.v076.i01.”
library(brms)
library(rstan)
library(lme4)
library(lmerTest)
library(coda)
library(mvtnorm)
library(lattice)
library(optimx)
Recommended options for use with stan:
# for stan
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
Read in the data and do a bit of set-up
setwd("D:/Dropbox/edps 590BAY/Lectures/10 Ham_Stan_brms")
nels <- read.table("school23_data.txt",header=TRUE)
head(nels)
## school student sex race homew schtype ses pared math classtr schsize urban
## 1 7472 3 2 4 1 1 -0.13 2 48 2 3 2
## 2 7472 8 1 4 0 1 -0.39 2 48 2 3 2
## 3 7472 13 1 4 0 1 -0.80 2 53 2 3 2
## 4 7472 17 1 4 1 1 -0.72 2 42 2 3 2
## 5 7472 27 2 4 2 1 -0.74 2 43 2 3 2
## 6 7472 28 2 4 1 1 -0.58 2 57 2 3 2
## geo minority ratio
## 1 2 0 19
## 2 2 0 19
## 3 2 0 19
## 4 2 0 19
## 5 2 0 19
## 6 2 0 19
tail(nels)
## school student sex race homew schtype ses pared math classtr schsize
## 514 72991 62 2 4 1 1 -1.01 3 55 4 7
## 515 72991 63 2 1 5 1 0.63 5 63 4 7
## 516 72991 70 2 4 4 1 0.56 4 67 4 7
## 517 72991 77 2 4 1 1 0.46 3 53 4 7
## 518 72991 87 1 4 1 1 0.06 3 56 4 7
## 519 72991 96 2 4 2 1 -0.05 2 53 4 7
## urban geo minority ratio
## 514 2 1 2 13
## 515 2 1 2 13
## 516 2 1 2 13
## 517 2 1 2 13
## 518 2 1 2 13
## 519 2 1 2 13
# sort by schools:
nels <- nels[order(nels$school,nels$student) , ]
# Get information on number schools & create a variable "school.id" which has consecutive integers as Id for the schools.
school <- unique(nels$school) ##
N <- length(school) ## need N and school.id
school.int <- as.data.frame(cbind(school,seq(1:N))) ## for dataList & model
names(school.int) <- c("school", "school.id") ##
nels <- merge(nels,school.int,by="school") ##
# total number of students
n <- length(nels$math) ## need n for dataList & model
# You don't need this but might want it
nj <- matrix(999,nrow=N,ncol=1)
for (j in 1:N) {
nj[j] <- nrow(subset(nels,nels$school.id==j))
}
We’ll start simple (fastest of the bunch) and use default priors,
options(mc.cores = parallel::detectCores())
t0<-Sys.time()
summary(model.simple <- brm(math~ 1 , data=nels))
## Compiling Stan program...
## Start sampling
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: math ~ 1
## Data: nels (Number of observations: 519)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 51.72 0.47 50.78 52.58 1.00 3676 2864
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 10.72 0.33 10.10 11.39 1.00 3656 2253
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
(tfinal<-Sys.time()-t0)
## Time difference of 1.009929 mins
plot(model.simple)
We’ll talk about the output in lecture.
Note that this take a while and a considerable amount of time is spent getting the model into C++ (i.e., “Compiling Stan program”).
More diagnostic plots but using coda
# Change class of model output
simple.mcmc <- as.mcmc(model.simple)
traceplot(simple.mcmc)
densplot(simple.mcmc)
gelman.diag(simple.mcmc)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## b_Intercept 1 1.00
## sigma 1 1.00
## lp__ 1 1.01
##
## Multivariate psrf
##
## 1
geweke.diag(simple.mcmc, frac1=0.1, frac2=0.5)
## [[1]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## b_Intercept sigma lp__
## -1.1413 -0.3989 -0.1267
##
##
## [[2]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## b_Intercept sigma lp__
## -0.1346 -1.0393 0.5007
##
##
## [[3]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## b_Intercept sigma lp__
## -1.0693 0.1975 0.8366
##
##
## [[4]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## b_Intercept sigma lp__
## 0.5257 -2.2405 0.2610
autocorr.plot(simple.mcmc)
We can obtain posterior predictive checks see brms documentation for different types. Here are some examples (they might look better with a more reasonable model)
pp_check(model.simple)
## Using 10 posterior samples for ppc type 'dens_overlay' by default.
pp_check(model.simple, type="error_hist")
## Using 10 posterior samples for ppc type 'error_hist' by default.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
pp_check(model.simple, type="ecdf_overlay")
## Using 10 posterior samples for ppc type 'ecdf_overlay' by default.
Before going on to more interesting models, we’ll take a look at what brms is doing for you. This can be useful for
Simulation studies (faster ones)
Continue to learn the Stan language
To see the Stan code that brms produced
(simple.stan <-stancode(model.simple))
## // generated with brms 2.14.4
## functions {
## }
## data {
## int<lower=1> N; // total number of observations
## vector[N] Y; // response variable
## int prior_only; // should the likelihood be ignored?
## }
## transformed data {
## }
## parameters {
## real Intercept; // temporary intercept for centered predictors
## real<lower=0> sigma; // residual SD
## }
## transformed parameters {
## }
## model {
## // likelihood including all constants
## if (!prior_only) {
## // initialize linear predictor term
## vector[N] mu = Intercept + rep_vector(0.0, N);
## target += normal_lpdf(Y | mu, sigma);
## }
## // priors including all constants
## target += student_t_lpdf(Intercept | 3, 51, 13.3);
## target += student_t_lpdf(sigma | 3, 0, 13.3)
## - 1 * student_t_lccdf(0 | 3, 0, 13.3);
## }
## generated quantities {
## // actual population-level intercept
## real b_Intercept = Intercept;
## }
Take a look at this and note what the default priors are.
We can also get the data list used with this stan code by using
dataList0 <- standata(model.simple)
Every time we want to or need to run this model, we would have to recompile the model; however, we can take the Stan code and create a compiled version of it. If we have such a compile version, we can re-use the model with another data set. This would be a great time saver if you’re running a simulation study or you want to fit the model to different data sets. For, example I could use if for the anorexia data (change data list of course).
I made a few changes that I though might speed things up; in particular, getting ride of the “if” statement. I also got ride of the temporary intercept. You can also changes the priors (if you want or could set them before running brms using the “set_prior” command.
To compile the model without sampling use the command “stan_model”,
m.0 <- stan_model(model_code = "data {
int<lower=1> N; // number of observations
vector[N] Y; // response variable
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
real b_Intercept; // temporary intercept
real<lower=0> sigma; // residual SD
}
transformed parameters {
}
model {
vector[N] mu = b_Intercept + rep_vector(0, N);
// priors including all constants
target += student_t_lpdf(b_Intercept | 3, 51, 13);
target += student_t_lpdf(sigma | 3, 0, 13)
- 1 * student_t_lccdf(0 | 3, 0, 13);
// likelihood including all constants
target += normal_lpdf(Y | mu, sigma);
}
"
)
I can now use and re-use this model, using the command “sampling” and include all the options I would normaly include when running stan. For the nels data,
# Now do sampling using the nels data list
model.0 <- sampling(m.0,
data=dataList0,
chains=4,
core=4,
)
stan_plot(model.0)
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
Notice that sampling is fast compared to creating the C++ version of our model.
If you want more for diagnostic graphics from Stan
stan_trace(model.0) # trace plots
stan_dens(model.0) # density plots for specified parameters
stan_hist(model.0) # histograms
stan_ac(model.0) # auto-correlation
Using MLE, we show the MLE results only because you are familiar with this. If you are using Bayesian, then skip this.
modelRI.lmer <- lmer(math~ 1 + homew + (1 |school.id), data=nels, REML=TRUE,
control = lmerControl(optimizer ="Nelder_Mead"))
summary(modelRI.lmer)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: math ~ 1 + homew + (1 | school.id)
## Data: nels
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## REML criterion at convergence: 3729.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.5940 -0.7066 0.0052 0.6613 3.2075
##
## Random effects:
## Groups Name Variance Std.Dev.
## school.id (Intercept) 21.34 4.620
## Residual 71.28 8.443
## Number of obs: 519, groups: school.id, 23
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 46.3558 1.1628 33.0122 39.866 <2e-16 ***
## homew 2.3999 0.2772 512.8988 8.658 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## homew -0.437
And using brms, the model formula is the same,
modelRI.brm <- brm(math ~ homew + (1 | school.id),
data=nels,
family=brmsfamily("gaussian"),
save_all_pars=TRUE,
cores=4,
silent=FALSE)
## Warning: Argument 'save_all_pars' is deprecated. Please use argument 'all' in
## function 'save_pars()' instead.
## Compiling Stan program...
## Start sampling
Minimal input only includes the model and data.
Explained in class:
plot(modelRI.brm)
modelRI <- as.mcmc(modelRI.brm)
traceplot(modelRI)
densplot(modelRI)
geweke.diag(modelRI, frac1=0.1, frac2=0.5)
## [[1]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## b_Intercept b_homew sd_school.id__Intercept
## 0.97028 0.30464 -0.08452
## sigma Intercept r_school.id[1,Intercept]
## -3.46903 1.12001 -0.89336
## r_school.id[2,Intercept] r_school.id[3,Intercept] r_school.id[4,Intercept]
## -0.32154 -0.29856 -0.84777
## r_school.id[5,Intercept] r_school.id[6,Intercept] r_school.id[7,Intercept]
## -1.28227 -0.97646 -0.89398
## r_school.id[8,Intercept] r_school.id[9,Intercept] r_school.id[10,Intercept]
## -0.76426 -1.51071 -1.83574
## r_school.id[11,Intercept] r_school.id[12,Intercept] r_school.id[13,Intercept]
## -1.56588 -2.12928 -1.13220
## r_school.id[14,Intercept] r_school.id[15,Intercept] r_school.id[16,Intercept]
## -1.15783 -0.45452 -1.60276
## r_school.id[17,Intercept] r_school.id[18,Intercept] r_school.id[19,Intercept]
## -1.29500 -1.43296 -0.91722
## r_school.id[20,Intercept] r_school.id[21,Intercept] r_school.id[22,Intercept]
## -1.86383 -2.11517 -1.26050
## r_school.id[23,Intercept] lp__ z_1[1,1]
## -0.57554 0.26894 -0.79602
## z_1[1,2] z_1[1,3] z_1[1,4]
## -0.21136 -0.07293 -1.09097
## z_1[1,5] z_1[1,6] z_1[1,7]
## -1.55298 -0.82746 -0.86972
## z_1[1,8] z_1[1,9] z_1[1,10]
## -0.68979 -1.77116 -1.47741
## z_1[1,11] z_1[1,12] z_1[1,13]
## -1.23322 -1.85005 -1.12336
## z_1[1,14] z_1[1,15] z_1[1,16]
## -1.18124 -0.68887 -1.63651
## z_1[1,17] z_1[1,18] z_1[1,19]
## -0.99997 -0.81145 -1.12679
## z_1[1,20] z_1[1,21] z_1[1,22]
## -1.89786 -2.14401 -1.22869
## z_1[1,23]
## -0.88554
##
##
## [[2]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## b_Intercept b_homew sd_school.id__Intercept
## -1.05941 0.27995 -0.15081
## sigma Intercept r_school.id[1,Intercept]
## -0.55221 -0.77898 0.78193
## r_school.id[2,Intercept] r_school.id[3,Intercept] r_school.id[4,Intercept]
## 1.29278 -0.11314 0.31526
## r_school.id[5,Intercept] r_school.id[6,Intercept] r_school.id[7,Intercept]
## 0.46940 0.81066 0.79442
## r_school.id[8,Intercept] r_school.id[9,Intercept] r_school.id[10,Intercept]
## 0.74701 1.86655 0.79157
## r_school.id[11,Intercept] r_school.id[12,Intercept] r_school.id[13,Intercept]
## 0.91057 2.38021 0.51232
## r_school.id[14,Intercept] r_school.id[15,Intercept] r_school.id[16,Intercept]
## 0.26152 0.82464 0.44103
## r_school.id[17,Intercept] r_school.id[18,Intercept] r_school.id[19,Intercept]
## -0.12299 1.12793 0.64204
## r_school.id[20,Intercept] r_school.id[21,Intercept] r_school.id[22,Intercept]
## 1.03908 1.16799 0.67659
## r_school.id[23,Intercept] lp__ z_1[1,1]
## 0.23998 0.51240 0.62908
## z_1[1,2] z_1[1,3] z_1[1,4]
## 1.51601 -0.11982 0.43118
## z_1[1,5] z_1[1,6] z_1[1,7]
## 0.36501 0.99540 1.09792
## z_1[1,8] z_1[1,9] z_1[1,10]
## 0.56618 1.95949 0.64570
## z_1[1,11] z_1[1,12] z_1[1,13]
## 0.79653 2.04416 0.35262
## z_1[1,14] z_1[1,15] z_1[1,16]
## 0.41718 0.61698 0.61594
## z_1[1,17] z_1[1,18] z_1[1,19]
## -0.03012 0.78268 0.59286
## z_1[1,20] z_1[1,21] z_1[1,22]
## 0.89527 0.94690 0.66296
## z_1[1,23]
## 0.30602
##
##
## [[3]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## b_Intercept b_homew sd_school.id__Intercept
## -1.259627 0.236218 -1.008012
## sigma Intercept r_school.id[1,Intercept]
## -0.211528 -1.337321 1.445444
## r_school.id[2,Intercept] r_school.id[3,Intercept] r_school.id[4,Intercept]
## 0.766435 2.039606 1.160737
## r_school.id[5,Intercept] r_school.id[6,Intercept] r_school.id[7,Intercept]
## 0.247064 0.512342 0.725663
## r_school.id[8,Intercept] r_school.id[9,Intercept] r_school.id[10,Intercept]
## 1.387954 1.439807 -0.001807
## r_school.id[11,Intercept] r_school.id[12,Intercept] r_school.id[13,Intercept]
## 0.923359 0.905400 1.321911
## r_school.id[14,Intercept] r_school.id[15,Intercept] r_school.id[16,Intercept]
## 0.807240 1.557152 0.778109
## r_school.id[17,Intercept] r_school.id[18,Intercept] r_school.id[19,Intercept]
## 0.121332 1.548884 1.478894
## r_school.id[20,Intercept] r_school.id[21,Intercept] r_school.id[22,Intercept]
## 1.551667 0.411278 0.266533
## r_school.id[23,Intercept] lp__ z_1[1,1]
## 0.378983 -0.600226 1.869893
## z_1[1,2] z_1[1,3] z_1[1,4]
## 1.273884 1.860742 1.156809
## z_1[1,5] z_1[1,6] z_1[1,7]
## 0.029910 0.591427 0.899471
## z_1[1,8] z_1[1,9] z_1[1,10]
## 0.646359 1.893778 -0.252325
## z_1[1,11] z_1[1,12] z_1[1,13]
## 0.246796 0.963242 1.243398
## z_1[1,14] z_1[1,15] z_1[1,16]
## 1.026494 1.636478 0.694000
## z_1[1,17] z_1[1,18] z_1[1,19]
## -0.869947 1.596634 1.240794
## z_1[1,20] z_1[1,21] z_1[1,22]
## 1.277465 0.661598 0.081849
## z_1[1,23]
## 1.259285
##
##
## [[4]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## b_Intercept b_homew sd_school.id__Intercept
## 1.14259 -1.70585 0.79800
## sigma Intercept r_school.id[1,Intercept]
## 1.44473 0.74201 -0.53414
## r_school.id[2,Intercept] r_school.id[3,Intercept] r_school.id[4,Intercept]
## -1.56332 -0.59641 -1.16297
## r_school.id[5,Intercept] r_school.id[6,Intercept] r_school.id[7,Intercept]
## -1.28363 -0.59963 -0.05399
## r_school.id[8,Intercept] r_school.id[9,Intercept] r_school.id[10,Intercept]
## -0.76837 -0.78223 -1.15491
## r_school.id[11,Intercept] r_school.id[12,Intercept] r_school.id[13,Intercept]
## -0.98104 0.28329 -1.11993
## r_school.id[14,Intercept] r_school.id[15,Intercept] r_school.id[16,Intercept]
## -0.58757 -1.42763 0.11268
## r_school.id[17,Intercept] r_school.id[18,Intercept] r_school.id[19,Intercept]
## -0.32541 -0.52723 -1.37239
## r_school.id[20,Intercept] r_school.id[21,Intercept] r_school.id[22,Intercept]
## -0.65414 -0.86849 -0.58847
## r_school.id[23,Intercept] lp__ z_1[1,1]
## -1.45465 0.44143 -0.69508
## z_1[1,2] z_1[1,3] z_1[1,4]
## -1.73276 -0.52637 -1.30755
## z_1[1,5] z_1[1,6] z_1[1,7]
## -0.69244 -0.58976 -0.27989
## z_1[1,8] z_1[1,9] z_1[1,10]
## 0.24834 -1.26829 -0.85903
## z_1[1,11] z_1[1,12] z_1[1,13]
## -0.38037 0.01449 -0.84458
## z_1[1,14] z_1[1,15] z_1[1,16]
## -0.89134 -1.74423 0.28303
## z_1[1,17] z_1[1,18] z_1[1,19]
## 0.86765 -1.19187 -1.27908
## z_1[1,20] z_1[1,21] z_1[1,22]
## -0.19038 -0.99720 -0.17851
## z_1[1,23]
## -1.67768
autocorr.plot(modelRI)
summary(modelRI.brm)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: math ~ homew + (1 | school.id)
## Data: nels (Number of observations: 519)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~school.id (Number of levels: 23)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 4.89 0.94 3.39 7.02 1.00 1208 1608
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 46.35 1.24 43.93 48.79 1.00 1036 1875
## homew 2.40 0.27 1.86 2.93 1.00 5374 2797
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 8.46 0.27 7.96 9.02 1.00 5955 2846
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
And lets make this more complex by adding random slope, and some more fixed effects including a cross level interaction,
# Random intercept & slope with level 1 and level 2 predictors
nels$public <- ifelse(nels$schtype==1,1,0)
nels$public.fac <- as.factor(nels$public)
modelRIS.lmer <- lmer(math~ 1 + homew + ses + public.fac + homew*public.fac + (1 + homew|school.id),
data=nels, REML=TRUE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(modelRIS.lmer)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: math ~ 1 + homew + ses + public.fac + homew * public.fac + (1 +
## homew | school.id)
## Data: nels
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## REML criterion at convergence: 3598.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3958 -0.6789 -0.0247 0.6389 2.9930
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## school.id (Intercept) 53.66 7.325
## homew 15.49 3.936 -0.87
## Residual 51.40 7.169
## Number of obs: 519, groups: school.id, 23
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 47.5428 2.8214 21.5157 16.851 7.07e-14 ***
## homew 2.3253 1.4741 18.6166 1.577 0.132
## ses 2.6621 0.5071 496.4469 5.250 2.26e-07 ***
## public.fac1 -0.8167 3.4978 21.7755 -0.233 0.818
## homew:public.fac1 -0.7524 1.8288 18.8471 -0.411 0.685
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) homew ses pblc.1
## homew -0.856
## ses -0.068 0.000
## public.fac1 -0.812 0.691 0.135
## hmw:pblc.f1 0.692 -0.806 -0.026 -0.856
And now for the Bayesian model,
modelRIS.brm <- brm(math~ 1 + homew + ses + public.fac + homew*public.fac + (1 + homew|school.id),
data=nels, cores=4, save_all_pars=TRUE) # fit the model
## Warning: Argument 'save_all_pars' is deprecated. Please use argument 'all' in
## function 'save_pars()' instead.
## Compiling Stan program...
## Start sampling
summary(modelRIS.brm) # summary of model information
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: math ~ 1 + homew + ses + public.fac + homew * public.fac + (1 + homew | school.id)
## Data: nels (Number of observations: 519)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~school.id (Number of levels: 23)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept) 7.50 1.39 5.31 10.69 1.00 1014
## sd(homew) 4.05 0.77 2.78 5.78 1.01 992
## cor(Intercept,homew) -0.83 0.09 -0.94 -0.62 1.01 1099
## Tail_ESS
## sd(Intercept) 1856
## sd(homew) 1779
## cor(Intercept,homew) 1581
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 47.68 2.88 42.15 53.31 1.00 1123 1798
## homew 2.25 1.50 -0.81 5.17 1.00 1001 1488
## ses 2.64 0.52 1.63 3.67 1.00 3540 3163
## public.fac1 -1.02 3.56 -8.10 5.94 1.00 1001 1587
## homew:public.fac1 -0.65 1.88 -4.32 3.19 1.00 986 1492
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 7.19 0.23 6.75 7.66 1.00 3287 2769
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(modelRIS.brm) # trace and density plots
bayes_R2(modelRIS.brm) # Bayes version of R2
## Estimate Est.Error Q2.5 Q97.5
## R2 0.5539493 0.0214397 0.5098393 0.593078
bayes_factor(modelRIS.brm, modelRI.brm) # Computes bayes factor
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
## Iteration: 7
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
## Estimated Bayes factor in favor of modelRIS.brm over modelRI.brm: 168854354287575016592864822.00000
Possible criteria for comparing models are (in order worst to best) Bayes Factor, DIC, waic, loo, and psis-loo.
The criterion “waic” (Widely accessible information criterion) is asymptotically equivalent to loo (Leave-one-out-validation), but loo is faster and easier to compute. Some work has shown that loo is the preferred for model comparison. For these data and models, R throws warnings when I ask for “waic” and suggests using loo instead..
loo is noisy and a solution suggested for this is to use Pareto
smoothed importance sampling LOO or PSIS-loo (Verhari, Gelman &
Gabry, “Practical Bayesian model evaluation using leave-one-out
cross-validation, http://www.stat.columbia.edu/~gelman/research/unpublished/loo_stan.pdf)
Since I didn’t get is at first, I can add this. The command below
computes PSIS-LOO and compares values for our 3 models.
We can pull these out for each model,
modelRI.brm <- add_criterion(modelRI.brm, criterion="loo")
modelRI.brm$criteria$loo
##
## Computed from 4000 by 519 log-likelihood matrix
##
## Estimate SE
## elpd_loo -1856.3 15.0
## p_loo 22.8 1.6
## looic 3712.6 30.1
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
modelRIS.brm <- add_criterion(modelRIS.brm, criterion="loo")
## Warning: Found 1 observations with a pareto_k > 0.7 in model 'modelRIS.brm'. It
## is recommended to set 'moment_match = TRUE' in order to perform moment matching
## for problematic observations.
modelRIS.brm$criteria$loo
##
## Computed from 4000 by 519 log-likelihood matrix
##
## Estimate SE
## elpd_loo -1780.1 14.6
## p_loo 36.4 2.3
## looic 3560.2 29.2
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 516 99.4% 652
## (0.5, 0.7] (ok) 2 0.4% 766
## (0.7, 1] (bad) 1 0.2% 163
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
But better yet, for a direct comparison use
loo(modelRI.brm, modelRIS.brm, compare=TRUE, moment_match=TRUE, save_psis=TRUE)
## Output of model 'modelRI.brm':
##
## Computed from 4000 by 519 log-likelihood matrix
##
## Estimate SE
## elpd_loo -1856.3 15.0
## p_loo 22.8 1.6
## looic 3712.6 30.1
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
##
## Output of model 'modelRIS.brm':
##
## Computed from 4000 by 519 log-likelihood matrix
##
## Estimate SE
## elpd_loo -1780.1 14.6
## p_loo 36.4 2.3
## looic 3560.2 29.2
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 516 99.4% 652
## (0.5, 0.7] (ok) 2 0.4% 766
## (0.7, 1] (bad) 1 0.2% 163
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## Model comparisons:
## elpd_diff se_diff
## modelRIS.brm 0.0 0.0
## modelRI.brm -76.2 11.4
Note that
elpd_loo is the Expected Log Pointwise Predictive Density, which is the sum of N individual pointwise log predictive densities.
p_loo is the difference between elpd_loo and the non-cross validity log posterior predictive density.
elpd_diff is the difference in elpl_loo for two models. If more than 2 models are compared then it is the difference relative to the model with the highest elpd_loo.
Smaller values of looic are better (just like other information
criteria), and for elpd larger is better. Note that looic equals \(-2*elpd\).For more information a good starting place is https://mc-stan.org/loo/reference/loo-glossary.html
There are lots!! We specify them in brms pretty much how we would do these using glm; however, the range of models is greater with brms than with glm. The model families (from the brms documentation 8/29/2019) include
student(link = "identity", link_sigma = "log", link_nu = "logm1")
bernoulli(link = "logit")
negbinomial(link = "log", link_shape = "log")
geometric(link = "log")
lognormal(link = "identity", link_sigma = "log")
shifted_lognormal(link = "identity", link_sigma = "log",
link_ndt = "log")skew_normal(link = "identity", link_sigma = "log",link_alpha = "identity")
exponential(link = "log")weibull(link = "log", link_shape = "log")
frechet(link = "log", link_nu = "logm1")
gen_extreme_value(link = "identity", link_sigma = "log",link_xi = "log1p")
exgaussian(link = "identity", link_sigma = "log", link_beta = "log")
wiener(link = "identity", link_bs = "log", link_ndt = "log",link_bias = "logit")
Beta(link = "logit", link_phi = "log")
dirichlet(link = "logit", link_phi = "log", refcat = NULL)
von_mises(link = "tan_half", link_kappa = "log")
asym_laplace(link = "identity", link_sigma = "log",link_quantile = "logit")
hurdle_poisson(link = "log")
hurdle_negbinomial(link = "log", link_shape = "log",link_hu = "logit")
hurdle_gamma(link = "log", link_shape = "log", link_hu = "logit")
hurdle_lognormal(link = "identity", link_sigma = "log",link_hu = "logit")
zero_inflated_beta(link = "logit", link_phi = "log",link_zi = "logit")
zero_one_inflated_beta(link = "logit", link_phi = "log",link_zoi = "logit", link_coi = "logit")
zero_inflated_poisson(link = "log", link_zi = "logit")
zero_inflated_negbinomial(link = "log", link_shape = "log",link_zi = "logit")
zero_inflated_binomial(link = "logit", link_zi = "logit")
categorical(link = "logit", refcat = NULL)
multinomial(link = "logit", refcat = NULL)
cumulative(link = "logit", link_disc = "log",threshold = c("flexible", "equidistant"))
sratio(link = "logit", link_disc = "log", threshold = c("flexible","equidistant"))
cratio(link = "logit", link_disc = "log", threshold = c("flexible","equidistant"))
acat(link = "logit", link_disc = "log", threshold = c("flexible","equidistant"))