Bayesian Estimation
The brms package
From brms documentation:
Set up
Simple brms for mean and variance
- Stan input produced by brms
Random intercept model
- Random intercept via lmer
- Random intercept via brms
  - Some diagnostics for Bayesian
  - Summary table for Random Intercept
More complex model
- via lmer
- via brms
Comparing Models
Other model families

This document is a very brief introduction to using MCMC to fit models to data, in particualar Hamletonian sampling via Stan. Examples are included on the use of the brms package, which is a wrapper for Stan.

We will use the nels data for \(N=23\). We start by running R to get everything needed to set up for model fitting. I have included libraries in case you want to also do maximum likelihood estimation of the HLM models.

Bayesian Estimation

\[ f(\mathbf{\gamma},\mathbf{T},\sigma^2 | \mathbf{data}) = \frac{ f(\mathbf{data}\ | \ \mathbf{\gamma},\mathbf{T},\sigma^2)f(\mathbf{\gamma},\mathbf{T},\sigma^2) }{ f(\mathbf{data})}\]

where

Posterior (target) distribution is \(f(\mathbf{prameters}\ | \ \mathbf{data})\). We want this!
Likelihood (or data model) is \(f(\mathbf{data}\ | \ \mathbf{parameters})\). This is distribution for data given parameters.
The prior distribution is \(f(\mathbf{parameters})\). This reflects prior knowledge.
The data distribution \(f(\mathbf{data}\)

We will use MCMC (Markov Chain Monte Carlo to sample from the posterior), but we’ll just work with \[f(\mathbf{parameters}\ | \ \mathbf{data}) \propto f(\mathbf{data}\ | \ \mathbf{parameters})f(\mathbf{\gamma},\mathbf{T},\sigma^2) \] A link to see what we’ll be doing.

https://chi-feng.github.io/mcmc-demo/app.html

Stan (brms) uses Hamliltonian sampling.

The brms package

The brms package is a wrapper or interface that make fitting mixed (random effects) models, which makes using Stan relatively (too) easy. The hardest part is installing Stan and brms. This document gives you enough so that you can fit models using Bayesian estimation but not enough so that you understand what’s going on nor what is needed for a complete analysis. If you really want to use Bayesian, you should learn more about it.

The brms package will create input that is fed into Stan. This is easier than writing your own input and then using the rstan package in R, which is another way to run Stan. We will should how to get the Stan input from the brms.

From brms documentation:

“Fit Bayesian generalized (non-)linear multivariate multilevel models using ‘Stan’ for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit – among others – linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with posterior predictive checks and leave-one-out cross-validation. References: B?rkner (2017) doi:10.18637/jss.v080.i01; B?rkner (2018) doi:10.32614/RJ-2018-017; Carpenter et al. (2017) doi:10.18637/jss.v076.i01.”

Set up

library(brms)
library(rstan)
library(lme4)
library(lmerTest)
library(coda)
library(mvtnorm)
library(lattice)
library(optimx)

Recommended options for use with stan:

# for stan
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

Read in the data and do a bit of set-up

setwd("D:/Dropbox/edps 590BAY/Lectures/10 Ham_Stan_brms")
nels <- read.table("school23_data.txt",header=TRUE)
head(nels)

##   school student sex race homew schtype   ses pared math classtr schsize urban
## 1   7472       3   2    4     1       1 -0.13     2   48       2       3     2
## 2   7472       8   1    4     0       1 -0.39     2   48       2       3     2
## 3   7472      13   1    4     0       1 -0.80     2   53       2       3     2
## 4   7472      17   1    4     1       1 -0.72     2   42       2       3     2
## 5   7472      27   2    4     2       1 -0.74     2   43       2       3     2
## 6   7472      28   2    4     1       1 -0.58     2   57       2       3     2
##   geo minority ratio
## 1   2        0    19
## 2   2        0    19
## 3   2        0    19
## 4   2        0    19
## 5   2        0    19
## 6   2        0    19

tail(nels)

##     school student sex race homew schtype   ses pared math classtr schsize
## 514  72991      62   2    4     1       1 -1.01     3   55       4       7
## 515  72991      63   2    1     5       1  0.63     5   63       4       7
## 516  72991      70   2    4     4       1  0.56     4   67       4       7
## 517  72991      77   2    4     1       1  0.46     3   53       4       7
## 518  72991      87   1    4     1       1  0.06     3   56       4       7
## 519  72991      96   2    4     2       1 -0.05     2   53       4       7
##     urban geo minority ratio
## 514     2   1        2    13
## 515     2   1        2    13
## 516     2   1        2    13
## 517     2   1        2    13
## 518     2   1        2    13
## 519     2   1        2    13

# sort by schools:
nels <- nels[order(nels$school,nels$student) , ]

# Get information on number schools & create a variable "school.id" which has consecutive integers as Id for the schools.  
school <- unique(nels$school)                        ##  
N <- length(school)                                  ## need N and school.id
school.int <- as.data.frame(cbind(school,seq(1:N)))  ##  for dataList & model  
names(school.int) <- c("school", "school.id")        ##
nels <- merge(nels,school.int,by="school")           ##

# total number of students
n <- length(nels$math)                               ## need n for dataList & model

# You don't need this but might want it
nj <- matrix(999,nrow=N,ncol=1)
for (j in 1:N) {
  nj[j] <- nrow(subset(nels,nels$school.id==j))
}

Simple brms for mean and variance

We’ll start simple (fastest of the bunch) and use default priors,

options(mc.cores = parallel::detectCores())
t0<-Sys.time()

summary(model.simple <- brm(math~ 1  , data=nels))

## Compiling Stan program...

## Start sampling

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: math ~ 1 
##    Data: nels (Number of observations: 519) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    51.72      0.47    50.78    52.58 1.00     3676     2864
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    10.72      0.33    10.10    11.39 1.00     3656     2253
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

(tfinal<-Sys.time()-t0)

## Time difference of 1.009929 mins

plot(model.simple)

We’ll talk about the output in lecture.

Note that this take a while and a considerable amount of time is spent getting the model into C++ (i.e., “Compiling Stan program”).

More diagnostic plots but using coda

# Change class of model output
simple.mcmc <- as.mcmc(model.simple)

traceplot(simple.mcmc)

densplot(simple.mcmc)

gelman.diag(simple.mcmc)

## Potential scale reduction factors:
## 
##             Point est. Upper C.I.
## b_Intercept          1       1.00
## sigma                1       1.00
## lp__                 1       1.01
## 
## Multivariate psrf
## 
## 1

geweke.diag(simple.mcmc, frac1=0.1, frac2=0.5)

## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## b_Intercept       sigma        lp__ 
##     -1.1413     -0.3989     -0.1267 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## b_Intercept       sigma        lp__ 
##     -0.1346     -1.0393      0.5007 
## 
## 
## [[3]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## b_Intercept       sigma        lp__ 
##     -1.0693      0.1975      0.8366 
## 
## 
## [[4]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## b_Intercept       sigma        lp__ 
##      0.5257     -2.2405      0.2610

autocorr.plot(simple.mcmc)

We can obtain posterior predictive checks see brms documentation for different types. Here are some examples (they might look better with a more reasonable model)

pp_check(model.simple)

## Using 10 posterior samples for ppc type 'dens_overlay' by default.

pp_check(model.simple, type="error_hist")

## Using 10 posterior samples for ppc type 'error_hist' by default.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

pp_check(model.simple, type="ecdf_overlay")

## Using 10 posterior samples for ppc type 'ecdf_overlay' by default.

Stan input produced by brms

Before going on to more interesting models, we’ll take a look at what brms is doing for you. This can be useful for

```
Simulation studies (faster ones)
```
```
Continue to learn the Stan language
```

To see the Stan code that brms produced

(simple.stan <-stancode(model.simple))

## // generated with brms 2.14.4
## functions {
## }
## data {
##   int<lower=1> N;  // total number of observations
##   vector[N] Y;  // response variable
##   int prior_only;  // should the likelihood be ignored?
## }
## transformed data {
## }
## parameters {
##   real Intercept;  // temporary intercept for centered predictors
##   real<lower=0> sigma;  // residual SD
## }
## transformed parameters {
## }
## model {
##   // likelihood including all constants
##   if (!prior_only) {
##     // initialize linear predictor term
##     vector[N] mu = Intercept + rep_vector(0.0, N);
##     target += normal_lpdf(Y | mu, sigma);
##   }
##   // priors including all constants
##   target += student_t_lpdf(Intercept | 3, 51, 13.3);
##   target += student_t_lpdf(sigma | 3, 0, 13.3)
##     - 1 * student_t_lccdf(0 | 3, 0, 13.3);
## }
## generated quantities {
##   // actual population-level intercept
##   real b_Intercept = Intercept;
## }

Take a look at this and note what the default priors are.

We can also get the data list used with this stan code by using

dataList0 <- standata(model.simple)

Every time we want to or need to run this model, we would have to recompile the model; however, we can take the Stan code and create a compiled version of it. If we have such a compile version, we can re-use the model with another data set. This would be a great time saver if you’re running a simulation study or you want to fit the model to different data sets. For, example I could use if for the anorexia data (change data list of course).

I made a few changes that I though might speed things up; in particular, getting ride of the “if” statement. I also got ride of the temporary intercept. You can also changes the priors (if you want or could set them before running brms using the “set_prior” command.

To compile the model without sampling use the command “stan_model”,

m.0 <- stan_model(model_code = "data {
    int<lower=1> N;  // number of observations
    vector[N] Y;  // response variable
    int prior_only;  // should the likelihood be ignored?
    }
    transformed data {
    }
    parameters {
    real b_Intercept;  // temporary intercept
    real<lower=0> sigma;  // residual SD
    }
    transformed parameters {
    }
    model {
    vector[N] mu = b_Intercept + rep_vector(0, N);
    // priors including all constants
    target += student_t_lpdf(b_Intercept | 3, 51, 13);
    target += student_t_lpdf(sigma | 3, 0, 13)
        - 1 * student_t_lccdf(0 | 3, 0, 13);
    // likelihood including all constants
        target += normal_lpdf(Y | mu, sigma);
    }
    "
)

I can now use and re-use this model, using the command “sampling” and include all the options I would normaly include when running stan. For the nels data,

# Now do sampling using the nels data list
model.0 <- sampling(m.0, 
           data=dataList0,
                     chains=4,
                core=4,
                    )
stan_plot(model.0)

## ci_level: 0.8 (80% intervals)

## outer_level: 0.95 (95% intervals)

Notice that sampling is fast compared to creating the C++ version of our model.

If you want more for diagnostic graphics from Stan

stan_trace(model.0)   # trace plots
stan_dens(model.0)    # density plots for specified parameters
stan_hist(model.0)    # histograms
stan_ac(model.0)      # auto-correlation

Random intercept model

Random intercept via lmer

Using MLE, we show the MLE results only because you are familiar with this. If you are using Bayesian, then skip this.

modelRI.lmer <- lmer(math~ 1 + homew + (1 |school.id), data=nels, REML=TRUE, 
                    control = lmerControl(optimizer ="Nelder_Mead"))
summary(modelRI.lmer)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: math ~ 1 + homew + (1 | school.id)
##    Data: nels
## Control: lmerControl(optimizer = "Nelder_Mead")
## 
## REML criterion at convergence: 3729.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5940 -0.7066  0.0052  0.6613  3.2075 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  school.id (Intercept) 21.34    4.620   
##  Residual              71.28    8.443   
## Number of obs: 519, groups:  school.id, 23
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  46.3558     1.1628  33.0122  39.866   <2e-16 ***
## homew         2.3999     0.2772 512.8988   8.658   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## homew -0.437

Random intercept via brms

And using brms, the model formula is the same,

modelRI.brm <- brm(math ~ homew + (1 | school.id),
         data=nels,
         family=brmsfamily("gaussian"),
         save_all_pars=TRUE,
         cores=4, 
         silent=FALSE)

## Warning: Argument 'save_all_pars' is deprecated. Please use argument 'all' in
## function 'save_pars()' instead.

## Compiling Stan program...

## Start sampling

Minimal input only includes the model and data.

Some diagnostics for Bayesian

Explained in class:

plot(modelRI.brm)

modelRI <- as.mcmc(modelRI.brm)

traceplot(modelRI)

densplot(modelRI)

geweke.diag(modelRI, frac1=0.1, frac2=0.5)

## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##               b_Intercept                   b_homew   sd_school.id__Intercept 
##                   0.97028                   0.30464                  -0.08452 
##                     sigma                 Intercept  r_school.id[1,Intercept] 
##                  -3.46903                   1.12001                  -0.89336 
##  r_school.id[2,Intercept]  r_school.id[3,Intercept]  r_school.id[4,Intercept] 
##                  -0.32154                  -0.29856                  -0.84777 
##  r_school.id[5,Intercept]  r_school.id[6,Intercept]  r_school.id[7,Intercept] 
##                  -1.28227                  -0.97646                  -0.89398 
##  r_school.id[8,Intercept]  r_school.id[9,Intercept] r_school.id[10,Intercept] 
##                  -0.76426                  -1.51071                  -1.83574 
## r_school.id[11,Intercept] r_school.id[12,Intercept] r_school.id[13,Intercept] 
##                  -1.56588                  -2.12928                  -1.13220 
## r_school.id[14,Intercept] r_school.id[15,Intercept] r_school.id[16,Intercept] 
##                  -1.15783                  -0.45452                  -1.60276 
## r_school.id[17,Intercept] r_school.id[18,Intercept] r_school.id[19,Intercept] 
##                  -1.29500                  -1.43296                  -0.91722 
## r_school.id[20,Intercept] r_school.id[21,Intercept] r_school.id[22,Intercept] 
##                  -1.86383                  -2.11517                  -1.26050 
## r_school.id[23,Intercept]                      lp__                  z_1[1,1] 
##                  -0.57554                   0.26894                  -0.79602 
##                  z_1[1,2]                  z_1[1,3]                  z_1[1,4] 
##                  -0.21136                  -0.07293                  -1.09097 
##                  z_1[1,5]                  z_1[1,6]                  z_1[1,7] 
##                  -1.55298                  -0.82746                  -0.86972 
##                  z_1[1,8]                  z_1[1,9]                 z_1[1,10] 
##                  -0.68979                  -1.77116                  -1.47741 
##                 z_1[1,11]                 z_1[1,12]                 z_1[1,13] 
##                  -1.23322                  -1.85005                  -1.12336 
##                 z_1[1,14]                 z_1[1,15]                 z_1[1,16] 
##                  -1.18124                  -0.68887                  -1.63651 
##                 z_1[1,17]                 z_1[1,18]                 z_1[1,19] 
##                  -0.99997                  -0.81145                  -1.12679 
##                 z_1[1,20]                 z_1[1,21]                 z_1[1,22] 
##                  -1.89786                  -2.14401                  -1.22869 
##                 z_1[1,23] 
##                  -0.88554 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##               b_Intercept                   b_homew   sd_school.id__Intercept 
##                  -1.05941                   0.27995                  -0.15081 
##                     sigma                 Intercept  r_school.id[1,Intercept] 
##                  -0.55221                  -0.77898                   0.78193 
##  r_school.id[2,Intercept]  r_school.id[3,Intercept]  r_school.id[4,Intercept] 
##                   1.29278                  -0.11314                   0.31526 
##  r_school.id[5,Intercept]  r_school.id[6,Intercept]  r_school.id[7,Intercept] 
##                   0.46940                   0.81066                   0.79442 
##  r_school.id[8,Intercept]  r_school.id[9,Intercept] r_school.id[10,Intercept] 
##                   0.74701                   1.86655                   0.79157 
## r_school.id[11,Intercept] r_school.id[12,Intercept] r_school.id[13,Intercept] 
##                   0.91057                   2.38021                   0.51232 
## r_school.id[14,Intercept] r_school.id[15,Intercept] r_school.id[16,Intercept] 
##                   0.26152                   0.82464                   0.44103 
## r_school.id[17,Intercept] r_school.id[18,Intercept] r_school.id[19,Intercept] 
##                  -0.12299                   1.12793                   0.64204 
## r_school.id[20,Intercept] r_school.id[21,Intercept] r_school.id[22,Intercept] 
##                   1.03908                   1.16799                   0.67659 
## r_school.id[23,Intercept]                      lp__                  z_1[1,1] 
##                   0.23998                   0.51240                   0.62908 
##                  z_1[1,2]                  z_1[1,3]                  z_1[1,4] 
##                   1.51601                  -0.11982                   0.43118 
##                  z_1[1,5]                  z_1[1,6]                  z_1[1,7] 
##                   0.36501                   0.99540                   1.09792 
##                  z_1[1,8]                  z_1[1,9]                 z_1[1,10] 
##                   0.56618                   1.95949                   0.64570 
##                 z_1[1,11]                 z_1[1,12]                 z_1[1,13] 
##                   0.79653                   2.04416                   0.35262 
##                 z_1[1,14]                 z_1[1,15]                 z_1[1,16] 
##                   0.41718                   0.61698                   0.61594 
##                 z_1[1,17]                 z_1[1,18]                 z_1[1,19] 
##                  -0.03012                   0.78268                   0.59286 
##                 z_1[1,20]                 z_1[1,21]                 z_1[1,22] 
##                   0.89527                   0.94690                   0.66296 
##                 z_1[1,23] 
##                   0.30602 
## 
## 
## [[3]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##               b_Intercept                   b_homew   sd_school.id__Intercept 
##                 -1.259627                  0.236218                 -1.008012 
##                     sigma                 Intercept  r_school.id[1,Intercept] 
##                 -0.211528                 -1.337321                  1.445444 
##  r_school.id[2,Intercept]  r_school.id[3,Intercept]  r_school.id[4,Intercept] 
##                  0.766435                  2.039606                  1.160737 
##  r_school.id[5,Intercept]  r_school.id[6,Intercept]  r_school.id[7,Intercept] 
##                  0.247064                  0.512342                  0.725663 
##  r_school.id[8,Intercept]  r_school.id[9,Intercept] r_school.id[10,Intercept] 
##                  1.387954                  1.439807                 -0.001807 
## r_school.id[11,Intercept] r_school.id[12,Intercept] r_school.id[13,Intercept] 
##                  0.923359                  0.905400                  1.321911 
## r_school.id[14,Intercept] r_school.id[15,Intercept] r_school.id[16,Intercept] 
##                  0.807240                  1.557152                  0.778109 
## r_school.id[17,Intercept] r_school.id[18,Intercept] r_school.id[19,Intercept] 
##                  0.121332                  1.548884                  1.478894 
## r_school.id[20,Intercept] r_school.id[21,Intercept] r_school.id[22,Intercept] 
##                  1.551667                  0.411278                  0.266533 
## r_school.id[23,Intercept]                      lp__                  z_1[1,1] 
##                  0.378983                 -0.600226                  1.869893 
##                  z_1[1,2]                  z_1[1,3]                  z_1[1,4] 
##                  1.273884                  1.860742                  1.156809 
##                  z_1[1,5]                  z_1[1,6]                  z_1[1,7] 
##                  0.029910                  0.591427                  0.899471 
##                  z_1[1,8]                  z_1[1,9]                 z_1[1,10] 
##                  0.646359                  1.893778                 -0.252325 
##                 z_1[1,11]                 z_1[1,12]                 z_1[1,13] 
##                  0.246796                  0.963242                  1.243398 
##                 z_1[1,14]                 z_1[1,15]                 z_1[1,16] 
##                  1.026494                  1.636478                  0.694000 
##                 z_1[1,17]                 z_1[1,18]                 z_1[1,19] 
##                 -0.869947                  1.596634                  1.240794 
##                 z_1[1,20]                 z_1[1,21]                 z_1[1,22] 
##                  1.277465                  0.661598                  0.081849 
##                 z_1[1,23] 
##                  1.259285 
## 
## 
## [[4]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##               b_Intercept                   b_homew   sd_school.id__Intercept 
##                   1.14259                  -1.70585                   0.79800 
##                     sigma                 Intercept  r_school.id[1,Intercept] 
##                   1.44473                   0.74201                  -0.53414 
##  r_school.id[2,Intercept]  r_school.id[3,Intercept]  r_school.id[4,Intercept] 
##                  -1.56332                  -0.59641                  -1.16297 
##  r_school.id[5,Intercept]  r_school.id[6,Intercept]  r_school.id[7,Intercept] 
##                  -1.28363                  -0.59963                  -0.05399 
##  r_school.id[8,Intercept]  r_school.id[9,Intercept] r_school.id[10,Intercept] 
##                  -0.76837                  -0.78223                  -1.15491 
## r_school.id[11,Intercept] r_school.id[12,Intercept] r_school.id[13,Intercept] 
##                  -0.98104                   0.28329                  -1.11993 
## r_school.id[14,Intercept] r_school.id[15,Intercept] r_school.id[16,Intercept] 
##                  -0.58757                  -1.42763                   0.11268 
## r_school.id[17,Intercept] r_school.id[18,Intercept] r_school.id[19,Intercept] 
##                  -0.32541                  -0.52723                  -1.37239 
## r_school.id[20,Intercept] r_school.id[21,Intercept] r_school.id[22,Intercept] 
##                  -0.65414                  -0.86849                  -0.58847 
## r_school.id[23,Intercept]                      lp__                  z_1[1,1] 
##                  -1.45465                   0.44143                  -0.69508 
##                  z_1[1,2]                  z_1[1,3]                  z_1[1,4] 
##                  -1.73276                  -0.52637                  -1.30755 
##                  z_1[1,5]                  z_1[1,6]                  z_1[1,7] 
##                  -0.69244                  -0.58976                  -0.27989 
##                  z_1[1,8]                  z_1[1,9]                 z_1[1,10] 
##                   0.24834                  -1.26829                  -0.85903 
##                 z_1[1,11]                 z_1[1,12]                 z_1[1,13] 
##                  -0.38037                   0.01449                  -0.84458 
##                 z_1[1,14]                 z_1[1,15]                 z_1[1,16] 
##                  -0.89134                  -1.74423                   0.28303 
##                 z_1[1,17]                 z_1[1,18]                 z_1[1,19] 
##                   0.86765                  -1.19187                  -1.27908 
##                 z_1[1,20]                 z_1[1,21]                 z_1[1,22] 
##                  -0.19038                  -0.99720                  -0.17851 
##                 z_1[1,23] 
##                  -1.67768

autocorr.plot(modelRI)

Summary table for Random Intercept

summary(modelRI.brm)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: math ~ homew + (1 | school.id) 
##    Data: nels (Number of observations: 519) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~school.id (Number of levels: 23) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     4.89      0.94     3.39     7.02 1.00     1208     1608
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    46.35      1.24    43.93    48.79 1.00     1036     1875
## homew         2.40      0.27     1.86     2.93 1.00     5374     2797
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     8.46      0.27     7.96     9.02 1.00     5955     2846
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

More complex model

And lets make this more complex by adding random slope, and some more fixed effects including a cross level interaction,

# Random intercept & slope with level 1 and level 2 predictors    

nels$public <- ifelse(nels$schtype==1,1,0)
nels$public.fac <- as.factor(nels$public)

via lmer

modelRIS.lmer <- lmer(math~ 1 + homew + ses + public.fac  + homew*public.fac + (1 + homew|school.id),
                    data=nels, REML=TRUE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(modelRIS.lmer)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: math ~ 1 + homew + ses + public.fac + homew * public.fac + (1 +  
##     homew | school.id)
##    Data: nels
## Control: lmerControl(optimizer = "Nelder_Mead")
## 
## REML criterion at convergence: 3598.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3958 -0.6789 -0.0247  0.6389  2.9930 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev. Corr 
##  school.id (Intercept) 53.66    7.325         
##            homew       15.49    3.936    -0.87
##  Residual              51.40    7.169         
## Number of obs: 519, groups:  school.id, 23
## 
## Fixed effects:
##                   Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)        47.5428     2.8214  21.5157  16.851 7.07e-14 ***
## homew               2.3253     1.4741  18.6166   1.577    0.132    
## ses                 2.6621     0.5071 496.4469   5.250 2.26e-07 ***
## public.fac1        -0.8167     3.4978  21.7755  -0.233    0.818    
## homew:public.fac1  -0.7524     1.8288  18.8471  -0.411    0.685    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) homew  ses    pblc.1
## homew       -0.856                     
## ses         -0.068  0.000              
## public.fac1 -0.812  0.691  0.135       
## hmw:pblc.f1  0.692 -0.806 -0.026 -0.856

via brms

And now for the Bayesian model,

modelRIS.brm <- brm(math~ 1 + homew + ses + public.fac  + homew*public.fac + (1 + homew|school.id), 
                data=nels, cores=4,  save_all_pars=TRUE)  # fit the model

## Warning: Argument 'save_all_pars' is deprecated. Please use argument 'all' in
## function 'save_pars()' instead.

## Compiling Stan program...

## Start sampling

summary(modelRIS.brm)                  # summary of model information

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: math ~ 1 + homew + ses + public.fac + homew * public.fac + (1 + homew | school.id) 
##    Data: nels (Number of observations: 519) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~school.id (Number of levels: 23) 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)            7.50      1.39     5.31    10.69 1.00     1014
## sd(homew)                4.05      0.77     2.78     5.78 1.01      992
## cor(Intercept,homew)    -0.83      0.09    -0.94    -0.62 1.01     1099
##                      Tail_ESS
## sd(Intercept)            1856
## sd(homew)                1779
## cor(Intercept,homew)     1581
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            47.68      2.88    42.15    53.31 1.00     1123     1798
## homew                 2.25      1.50    -0.81     5.17 1.00     1001     1488
## ses                   2.64      0.52     1.63     3.67 1.00     3540     3163
## public.fac1          -1.02      3.56    -8.10     5.94 1.00     1001     1587
## homew:public.fac1    -0.65      1.88    -4.32     3.19 1.00      986     1492
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     7.19      0.23     6.75     7.66 1.00     3287     2769
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

plot(modelRIS.brm)                     # trace and density plots

bayes_R2(modelRIS.brm)                 # Bayes version of R2

##     Estimate Est.Error      Q2.5    Q97.5
## R2 0.5539493 0.0214397 0.5098393 0.593078

bayes_factor(modelRIS.brm, modelRI.brm)  # Computes bayes factor

## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
## Iteration: 7
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6

## Estimated Bayes factor in favor of modelRIS.brm over modelRI.brm: 168854354287575016592864822.00000

Comparing Models

Possible criteria for comparing models are (in order worst to best) Bayes Factor, DIC, waic, loo, and psis-loo.

The criterion “waic” (Widely accessible information criterion) is asymptotically equivalent to loo (Leave-one-out-validation), but loo is faster and easier to compute. Some work has shown that loo is the preferred for model comparison. For these data and models, R throws warnings when I ask for “waic” and suggests using loo instead..

loo is noisy and a solution suggested for this is to use Pareto smoothed importance sampling LOO or PSIS-loo (Verhari, Gelman & Gabry, “Practical Bayesian model evaluation using leave-one-out cross-validation, http://www.stat.columbia.edu/~gelman/research/unpublished/loo_stan.pdf)
Since I didn’t get is at first, I can add this. The command below computes PSIS-LOO and compares values for our 3 models.

We can pull these out for each model,

modelRI.brm <- add_criterion(modelRI.brm, criterion="loo")
modelRI.brm$criteria$loo

## 
## Computed from 4000 by 519 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo  -1856.3 15.0
## p_loo        22.8  1.6
## looic      3712.6 30.1
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.

modelRIS.brm <- add_criterion(modelRIS.brm, criterion="loo")

## Warning: Found 1 observations with a pareto_k > 0.7 in model 'modelRIS.brm'. It
## is recommended to set 'moment_match = TRUE' in order to perform moment matching
## for problematic observations.

modelRIS.brm$criteria$loo

## 
## Computed from 4000 by 519 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo  -1780.1 14.6
## p_loo        36.4  2.3
## looic      3560.2 29.2
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     516   99.4%   652       
##  (0.5, 0.7]   (ok)         2    0.4%   766       
##    (0.7, 1]   (bad)        1    0.2%   163       
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.

But better yet, for a direct comparison use

loo(modelRI.brm, modelRIS.brm, compare=TRUE, moment_match=TRUE, save_psis=TRUE)

## Output of model 'modelRI.brm':
## 
## Computed from 4000 by 519 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo  -1856.3 15.0
## p_loo        22.8  1.6
## looic      3712.6 30.1
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
## 
## Output of model 'modelRIS.brm':
## 
## Computed from 4000 by 519 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo  -1780.1 14.6
## p_loo        36.4  2.3
## looic      3560.2 29.2
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     516   99.4%   652       
##  (0.5, 0.7]   (ok)         2    0.4%   766       
##    (0.7, 1]   (bad)        1    0.2%   163       
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## Model comparisons:
##              elpd_diff se_diff
## modelRIS.brm   0.0       0.0  
## modelRI.brm  -76.2      11.4

Note that

 elpd_loo is the Expected Log Pointwise Predictive Density, which is the sum of N individual pointwise log predictive densities.

 p_loo is the difference between elpd_loo and the non-cross validity log posterior predictive density.

 elpd_diff is the difference in elpl_loo for two models.  If more than 2 models are compared then it is the difference relative to the model with the highest elpd_loo.

Smaller values of looic are better (just like other information criteria), and for elpd larger is better. Note that looic equals \(-2*elpd\).

For more information a good starting place is https://mc-stan.org/loo/reference/loo-glossary.html

Other model families

There are lots!! We specify them in brms pretty much how we would do these using glm; however, the range of models is greater with brms than with glm. The model families (from the brms documentation 8/29/2019) include

student(link = "identity", link_sigma = "log", link_nu = "logm1")

```
bernoulli(link = "logit")
```

negbinomial(link = "log", link_shape = "log")

```
geometric(link = "log")
```

lognormal(link = "identity", link_sigma = "log")

shifted_lognormal(link = "identity", link_sigma = "log",

link_ndt = "log")skew_normal(link = "identity", link_sigma = "log",link_alpha = "identity")

exponential(link = "log")weibull(link = "log", link_shape = "log")

frechet(link = "log", link_nu = "logm1")

gen_extreme_value(link = "identity", link_sigma = "log",link_xi = "log1p")

exgaussian(link = "identity", link_sigma = "log", link_beta = "log")

wiener(link = "identity", link_bs = "log", link_ndt = "log",link_bias = "logit")

```
Beta(link = "logit", link_phi = "log")
```

dirichlet(link = "logit", link_phi = "log", refcat = NULL)

von_mises(link = "tan_half", link_kappa = "log")

asym_laplace(link = "identity", link_sigma = "log",link_quantile = "logit")

```
hurdle_poisson(link = "log")
```

hurdle_negbinomial(link = "log", link_shape = "log",link_hu = "logit")

hurdle_gamma(link = "log", link_shape = "log", link_hu = "logit")

hurdle_lognormal(link = "identity", link_sigma = "log",link_hu = "logit")

zero_inflated_beta(link = "logit", link_phi = "log",link_zi = "logit")

zero_one_inflated_beta(link = "logit", link_phi = "log",link_zoi = "logit", link_coi = "logit")

zero_inflated_poisson(link = "log", link_zi = "logit")

zero_inflated_negbinomial(link = "log", link_shape = "log",link_zi = "logit")

zero_inflated_binomial(link = "logit", link_zi = "logit")

categorical(link = "logit", refcat = NULL)

multinomial(link = "logit", refcat = NULL)

cumulative(link = "logit", link_disc = "log",threshold = c("flexible", "equidistant"))

sratio(link = "logit", link_disc = "log", threshold = c("flexible","equidistant"))

cratio(link = "logit", link_disc = "log", threshold = c("flexible","equidistant"))

acat(link = "logit", link_disc = "log", threshold = c("flexible","equidistant"))

587 Workup of Bayesian Example

Carolyn J. Anderson

1/4/2023