In this session, we’ll use rjags, runjags, and jagsUI to estimate the mean and variance of the weight change of girls being treated for anorexia. Before you can run this you need to

  1. Install the program JAGS
  2. Install the R packages: coda, rjags, runjags, and jagsUI.

There is at least one other R package that is a wrapper for JAGS, but these are more than enough (and they are the ones illustrated below).

Set-up

Below are the four packages that we’ll be using. The coda package has functions that compute various diagnostics. The multiple JAGS packages have slightly different input and output, but all are wrappers for the program JAGS.

library(coda)
library(rjags)
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
library(runjags)
library(jagsUI)
## 
## Attaching package: 'jagsUI'
## The following object is masked from 'package:coda':
## 
##     traceplot

As always, our first step is to set the working directory, read in the data, create a change-in-weight score, and take a look at the data:

setwd("C:/Users/cja/Dropbox/edps 590BAY/Lectures/5 Gibbs Sampling")
ano <- read.table("anorexia_data.txt",header=T)
ano$change <- ano$weight2 - ano$weight1
head(ano)
##   rx girl weight1 weight2 change
## 1  1    1    80.5    82.2    1.7
## 2  1    2    84.9    85.6    0.7
## 3  1    3    81.5    81.4   -0.1
## 4  1    4    82.6    81.9   -0.7
## 5  1    5    79.9    76.4   -3.5
## 6  1    6    88.7   103.6   14.9

I would like to get the same results each time I run this code, so I’ll set a seed for the random number generator.

set.seed(234590)

Create data list

The data list will be the same for rjags, runjags and jagsUI. We list and define variables that will be used to create our data model (i.e., the likelihood). For our problem this is

dataList <- list(y=ano$change, 
                 Ntotal=length(ano$change),
                 meanY = mean(ano$change),
                 sdY = sd(ano$change)
                 )

The dataList object now contains the following

dataList
## $y
##  [1]   1.7   0.7  -0.1  -0.7  -3.5  14.9   3.5  17.1  -7.6   1.6  11.7   6.1
## [13]   1.1  -4.0  20.9  -9.1   2.1  -1.4   1.4  -0.3  -3.7  -0.8   2.4  12.6
## [25]   1.9   3.9   0.1  15.4  -0.7  -0.5  -9.3  -5.4  12.3  -2.0 -10.2 -12.2
## [37]  11.6  -7.1   6.2  -0.2  -9.2   8.3   3.3  11.3   0.0  -1.0 -10.6  -4.6
## [49]  -6.7   2.8   0.3   1.8   3.7  15.9 -10.2  11.4  11.0   5.5   9.4  13.6
## [61]  -2.9  -0.1   7.4  21.5  -5.3  -3.8  13.4  13.1   9.0   3.9   5.7  10.7
## 
## $Ntotal
## [1] 72
## 
## $meanY
## [1] 2.763889
## 
## $sdY
## [1] 7.983598

Create the data model

The model defines the likelihood and also gives the priors for the parameters being estimated. Note that the JAGS “dnorm” is different from base R’s “dnorm”: in JAGS, the parameters of “dnorm” are the mean and the precision, where the precision is 1/variance.
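
To see the conversion in plain R (a quick illustration only; the object names below are just for this example):

sd.y  <- sd(ano$change)   # standard deviation of the change scores
tau.y <- 1/sd.y^2         # the precision that the JAGS dnorm() expects
1/sqrt(tau.y)             # converting a precision back to a standard deviation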

Model1 = "model {                                    
    for (i in 1:Ntotal){
       y[i] ~ dnorm( mu, precision )                
        }
    mu ~ dnorm( meanY , 1/(100*sdY^2) )  # prior for mu: mean = mean(y), precision = 1/(100*var(y))
    precision <- pow(sigma, -2)          # precision = 1/sigma^2
    sigma ~ dunif( sdY/1000, sdY*1000 )  # prior for sigma (the SD): uniform(0.00798, 7983.598)
   }
   "

Note the quotes around this code. We will save the model to a text file that will be read by JAGS. The following command writes the model to a file that I’m calling “Model1.txt”. You can find it on your hard drive in the working directory that you set.

writeLines(Model1, con="Model1.txt")

Go to your working directory, look for “Model1.txt”, and take a look at it.
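
If you would rather inspect it from within R (a small convenience using base R’s readLines()):

cat(readLines("Model1.txt"), sep = "\n")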

Set Initial values (rjags)

Regardless of which implementation of JAGS you use, they all require that the initial values be supplied as a list object. In this case, I simply used the sample mean and standard deviation.

thetaInit = mean(ano$change)
sigmaInit = sd(ano$change)
initsList = list(mu=thetaInit,sigma=sigmaInit)

Run rjags

jagsModel1 <- jags.model(file="Model1.txt",     # compiles and initializes the model
                        data=dataList,
                        inits=initsList,
                        n.chains=4,
                        n.adapt=500)            # adaptive phase for 500 iterations
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 72
##    Unobserved stochastic nodes: 2
##    Total graph size: 88
## 
## Initializing model

The model has compiled successfully; that is, it has been initialized and we can do some sampling.

update (jagsModel1, n.iter=500)                 # gets samples from posterior with a 
                                                #  burn in of 500 iterations

And now we draw samples from the posterior for many iterations.

# contains samples from all chains, 4000 iterations each
Samples <- coda.samples(jagsModel1, variable.names=c("mu","precision","sigma"), n.iter=4000)

Did our model converge?

There are many diagnostics that we can look at, including densities, trace plots, Geweke statistics, Rhat/psrf/Gelman statistics, auto-correlations, effective sample sizes, and more.

We will start with trace plots and densities for each of our parameters that we’ve estimated. Note that these look better (nicer formatting) when I run them in R.

plot(Samples)

Although we can look at the trace plots for stationarity and good mixing of the chains, to back up what we think we see we can compute the Gelman statistics (potential scale reduction factors, or Rhats). These should be close to 1; common rules of thumb are less than 1.1, or less than 1.01 if you want to be strict.

gelman.diag(Samples)
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## mu                 1          1
## precision          1          1
## sigma              1          1
## 
## Multivariate psrf
## 
## 1

Can’t get much better than these numbers. We can also get a plot of them

gelman.plot(Samples)

Below are some other diagnostics we can look at; these are done for each chain and each parameter.

autocorr.plot(Samples,auto.layout=TRUE)

geweke.diag(Samples,frac1=0.1,frac2=0.5)
## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##        mu precision     sigma 
##   -1.5628   -0.4085    0.2223 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##        mu precision     sigma 
##    1.3552    0.2853   -0.3187 
## 
## 
## [[3]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##        mu precision     sigma 
##    1.1674   -0.9107    0.8325 
## 
## 
## [[4]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##        mu precision     sigma 
##   -1.1700    0.5618   -0.6945
cumuplot(Samples,probs=c(.25,.50,.75),lwd=c(1,2),lty=c(2,1),col=c("blue","red"))

Lastly, we might want to know what the effective sample size is for each of our parameters in the posterior distribution.

effectiveSize(Samples)
##        mu precision     sigma 
## 16000.000  9737.871  9294.804

Recall that the effective sample size is the number of independent Monte Carlo samples needed to give the same precision as the MCMC samples.
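
To see where this number comes from, here is a rough sketch of the usual approximation \(ESS \approx N/(1 + 2\sum_k \rho_k)\) for a single chain. Note that coda’s effectiveSize() uses a spectral-density estimate and combines the chains, so its numbers will differ somewhat.

ess.approx <- function(x, max.lag = 50) {
  rho <- acf(x, lag.max = max.lag, plot = FALSE)$acf[-1]  # autocorrelations at lags 1..max.lag
  length(x) / (1 + 2*sum(rho))
}
ess.approx(as.numeric(Samples[[1]][, "sigma"]))  # one chain; compare with effectiveSize(Samples)["sigma"]/4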

Tables of Summary Statistics from our Posterior

Dropping the burn-ins or warm-ups, we can compute various sample statistics that describe our posterior distributions. Note that these are computed over all chains.

summary(Samples)
## 
## Iterations = 1001:5000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 4000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##              Mean       SD  Naive SE Time-series SE
## mu        2.76463 0.967268 7.647e-03      7.647e-03
## precision 0.01551 0.002615 2.067e-05      2.661e-05
## sigma     8.11561 0.695074 5.495e-03      7.244e-03
## 
## 2. Quantiles for each variable:
## 
##              2.5%     25%     50%     75%   97.5%
## mu        0.90242 2.11192 2.76211 3.42252 4.66962
## precision 0.01079 0.01369 0.01538 0.01717 0.02103
## sigma     6.89594 7.63087 8.06284 8.54797 9.62696

Working With Samples:

If we put our posterior samples into a more usable format, we can compute any statistic that we want or need. This includes what is in the summary table as well as anything else you might want to know.

The main reason for doing this is to show you how the three measures of variability are computed. For these I will give formulas and also show how they are obtained from “Samples”.

Converting “Samples” into Data Frame

Let’s first look at the object Samples

class(Samples)
## [1] "mcmc.list"
#-- What's in First chain
dim(Samples[[1]])
## [1] 4000    3
class(Samples[[1]])
## [1] "mcmc"
head(Samples[[1]])
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1001 
## End = 1007 
## Thinning interval = 1 
##            mu  precision    sigma
## [1,] 2.278041 0.01690015 7.692273
## [2,] 2.096900 0.01664735 7.750460
## [3,] 1.383834 0.01499367 8.166688
## [4,] 2.707556 0.01324180 8.690134
## [5,] 1.191909 0.01913108 7.229867
## [6,] 4.327807 0.01507461 8.144734
## [7,] 1.761290 0.01622409 7.850907

We need to vertically concatenate the samples from each chain into a single matrix, which we then convert to a data frame. The following will do this

post_samples <- as.data.frame(rbind(Samples[[1]], Samples[[2]],
                      Samples[[3]], Samples[[4]]))

class(post_samples)
## [1] "data.frame"
dim(post_samples)
## [1] 16000     3
head(post_samples)
##         mu  precision    sigma
## 1 2.278041 0.01690015 7.692273
## 2 2.096900 0.01664735 7.750460
## 3 1.383834 0.01499367 8.166688
## 4 2.707556 0.01324180 8.690134
## 5 1.191909 0.01913108 7.229867
## 6 4.327807 0.01507461 8.144734
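
As an aside, coda has an as.matrix() method for mcmc.list objects that stacks the chains for you, so the manual rbind() above can be replaced by the following (it should give the same 16000 x 3 result):

post_samples2 <- as.data.frame(as.matrix(Samples))
dim(post_samples2)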

Sample statistics of posterior

Mean

Compare the following with what is in summary table

(Ntotal <- dim(post_samples)[1])   #--- number iterations X number chains
## [1] 16000
(Nchains <- 4)
## [1] 4
(Niter.per.chain <- 4000)          
## [1] 4000
#--- These means are the same as those above
(mu.mean <- colMeans(post_samples)) 
##         mu  precision      sigma 
## 2.76462974 0.01551301 8.11560952

Variability Measures

SD

The column labeled “SD” in the summary table for “Samples” is the usual standard deviation of a sample of data: \[ SD = \sqrt{\frac{\sum_{i=1}^{N} (\hat{\theta}_i - \bar{\hat{\theta}})^2}{N-1}}, \] where the summation is over all iterations and all chains.

The R code to get the SDs:

(sd.samples <- sapply(post_samples, sd))
##         mu  precision      sigma 
## 0.96726827 0.00261462 0.69507380
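
As a quick check that sd() matches the formula above (R’s sd() divides by \(N-1\); with \(N = 16000\) the difference from dividing by \(N\) is negligible):

sqrt( sum( (post_samples$mu - mean(post_samples$mu))^2 ) / (Ntotal - 1) )  # should reproduce the SD for mu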

We can also compute the variance, which we’ll be using in the next measures

# --- for below 
(var.samples <- sapply(post_samples, var))
##           mu    precision        sigma 
## 9.356079e-01 6.836235e-06 4.831276e-01

SE of MCMC Estimates (i.e. MCerr)

This is given in the run.jags output. We might want a measure of the precision (or variance) of the MCMC estimates of the posterior expected values. This is the SE of the MCMC estimates, but it uses a different sample size; namely, \[ \mbox{var}_{mcmc}(\theta) = \frac{\mbox{var}(\theta)}{\mbox{effective sample size}}, \] and the MCerr is its square root.

and in R

(eff.size <- effectiveSize(Samples))
##        mu precision     sigma 
## 16000.000  9737.871  9294.804
(var.mcmc <- var.samples/eff.size)
##           mu    precision        sigma 
## 5.847549e-05 7.020257e-10 5.197825e-05
(MCerr <- sqrt(var.mcmc))
##           mu    precision        sigma 
## 7.646927e-03 2.649577e-05 7.209594e-03

Naive SE

The next two sample statistics are based on information in https://stats.stackexchange.com/questions/74450/naive-se-vs-time-series-se-which-statistics-should-i-report-after-bayesian-esti

The MCerr (above), the naive SE, and the time-series SE are all measures of the possible computational (Monte Carlo) error in the MCMC estimates of the posterior expected values.

The naive SEs are standard errors of the expected values, similar to the standard deviation of a (hypothetical) sampling distribution in the frequentist framework (but ours are based on the sampled values).

The naive standard error (se.naive) ignores the possible auto-correlation (dependency) of the sampled values; therefore, it is not a good estimator of the variability of the expected values (i.e., the “Means”). The naive SE is even worse than the MCerr, because the MCerr takes into account the effective sample size, which in turn reflects the auto-correlation. If the auto-correlation is large, then the effective sample size is small and the naive SE is too small.

The naive values are computed as \[ \mbox{se.naive} = \sqrt{\frac{\mbox{var}(\theta)}{N}},\] where \(N\) equals the number of chains times the number of iterations per chain.

(se.naive <- sqrt(var.samples/Ntotal))
##           mu    precision        sigma 
## 7.646927e-03 2.067038e-05 5.495041e-03

Note that the naive SEs are smaller than the MCerr for precision and sigma, but the same for mu (the effective sample size for mu is 16,000, the same as \(N\)).
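
The two are related through the effective sample size, since \(\mbox{MCerr} = \mbox{se.naive}\sqrt{N/\mbox{eff.size}}\). A quick check:

se.naive * sqrt(Ntotal/eff.size)  # reproduces the MCerr from the previous section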

Time-series SE

The time-series estimate corrects for possible auto-correlation. An auto-correlation at a specific “lag” is the correlation between values that are that many iterations apart.

An auto-correlation is the correlation between a set of numbers (ordered over time or space) and the same set shifted. For example, the correlation between the first two columns below is a lag 1 correlation, the correlation between the first and third columns is a lag 2 correlation, and so on.

\[ \begin{array}{rrr} \theta_1 & -- & -- \\ \theta_2 & \theta_1 & -- \\ \theta_3 & \theta_2 & \theta_1 \\ \theta_4 & \theta_3 & \theta_2 \\ \vdots & & \\ \theta_s & \theta_{s-1} & \theta_{s-2} \\ \end{array} \]
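
To make the shifted-column picture concrete, here are the lag 1 and lag 2 correlations computed by hand for the mu samples of chain 1 (acf() uses a slightly different normalization, so its values will differ a little from cor()):

x <- as.numeric(Samples[[1]][, "mu"])
cor(x[-length(x)], x[-1])                  # lag 1: series vs. itself shifted by one
cor(x[1:(length(x)-2)], x[3:length(x)])    # lag 2: shifted by two
acf(x, lag.max = 2, plot = FALSE)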

The time-series SE involves fitting auto-regressive models. The definition of the time-series SE is \[ \mbox{SE}_{ts} = \sqrt{\frac{\mbox{var}_{ts}(\theta)}{\mbox{Ntotal}}}, \] where \(\mbox{var}_{ts}(\theta)\) is the mean over chains of \(\mbox{var}_{ts}(\theta^{(c)})\), which is obtained by fitting an auto-regressive model to each chain \(c\). The auto-regressive model that is fit is \[ \theta_t^{(c)} - \mu^{(c)} = \sum_{k=1}^K \rho_k\left(\theta_{t-k}^{(c)} - \mu^{(c)}\right) + \epsilon,\] where \(\epsilon \sim N(0,\sigma^2_{\epsilon})\), \(K\) is the order (the largest lag needed), and \(\theta_t^{(c)}\) is the \(t\)th sampled value from the posterior for chain \(c\). Using this we can compute \[ \mbox{var}_{ts}(\theta^{(c)}) = \frac{\sigma^2_{\epsilon}}{\left(1- \sum_{k=1}^{K} \rho_k\right)^2 }\]

Since this has to be done repeatedly (i.e., for each chain and each parameter), I wrote a function. This was my way to make sure that it gives the same results as in the summary table.

#########################################
# function that computes time-series SE #
#########################################
# parameter = 1 first parameter
#             2 for 2nd parameter
#                   ...
#                   P for Pth parameter 
#
# Samples is an mcmc.list 
#

se.ts <- function(Samples, parameter) {
    
    Nchains <- length(Samples)
    Ntotal <- nrow(Samples[[1]])*Nchains
    
  var.tsx <- matrix(NA, nrow=Nchains, ncol=1)

  for (c in 1:Nchains) {
      chain.c <- Samples[[c]]
      fit.ar <- ar(chain.c[ ,parameter], order.max=10, aic=TRUE)
      var.tsx[c,1] <- fit.ar$var.pred/((1-sum(fit.ar$ar))**2)
  }
    var.tsc<- colMeans(var.tsx)
    var.ts <- var.tsc/Ntotal
    sd.ts <- sqrt(var.ts)
    return(sd.ts)
}

For mu, precision and sigma

se.ts(Samples, 1)
## [1] 0.007647348
se.ts(Samples, 2)
## [1] 2.737447e-05
se.ts(Samples, 3)
## [1] 0.00745088

runjags run!

The different front ends to JAGS differ a bit in their default input and output. All of them use the same data list and model, but they differ in how initial values are specified and how the samples are obtained. First we’ll look at runjags and then jagsUI. The latter is quite verbose.

Now we set up our starting values. For one chain I’ll use the sample mean and standard deviation of the change scores; for the other three chains, I draw the starting means from normal distributions with different means and standard deviations and use different fixed values for sigma.

(meanY = mean(ano$change))
## [1] 2.763889
(sdY = sd(ano$change))
## [1] 7.983598
initsList = list(list("mu"=meanY, "sigma"=sdY),
                 list("mu"=rnorm(1,2,4), "sigma"=1 ),
                 list("mu"=rnorm(1,4,1), "sigma"=8 ),
                 list("mu"=rnorm(1,-4,2),"sigma"=.5 )
                 )

Go ahead and take a look at this.

The next command gets the samples (i.e., runs JAGS)

out.runjags <- run.jags(model=Model1, 
                        monitor=c("mu","sigma","precision","dic"),
                        data=dataList, 
                        n.chains=4, 
                        inits=initsList)
## module dic loaded
## Compiling rjags model...
## Calling the simulation using the rjags method...
## Adapting the model for 1000 iterations...
## Burning in the model for 4000 iterations...
## Running the model for 10000 iterations...
## Simulation complete
## Calculating summary statistics...
## Calculating the Gelman-Rubin statistic for 3 variables....
## Finished running the simulation

runjags will easily produce diagnostic plots, for example

plot(out.runjags)
## Generating plots...

You can also use coda to produce the same graphics as I did with rjags.
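
For example, the chains are stored in the runjags output object, so something like the following should work (I believe the $mcmc component is an mcmc.list, and as.mcmc.list() on the runjags object does the same conversion; check your version’s documentation):

runjags.mcmc <- as.mcmc.list(out.runjags)  # or out.runjags$mcmc
gelman.diag(runjags.mcmc)
autocorr.plot(runjags.mcmc, auto.layout=TRUE)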

The table of summary statistics from runjags includes more than rjags gives.

print(out.runjags)
## 
## JAGS model summary statistics from 40000 samples (chains = 4; adapt+burnin = 5000):
##                                                                         
##            Lower95   Median  Upper95     Mean        SD Mode       MCerr
## mu          0.9158   2.7661   4.6798   2.7616   0.95767   --   0.0047884
## sigma       6.7853   8.0784   9.5035   8.1244   0.70085   --   0.0046003
## precision 0.010405 0.015323 0.020622 0.015484 0.0026251   -- 0.000016632
##                                           
##           MC%ofSD SSeff      AC.10    psrf
## mu            0.5 40000 -0.0061651 0.99997
## sigma         0.7 23210  0.0023248       1
## precision     0.6 24910  0.0022227  1.0001
## 
## Model fit assessment:
## DIC = 506.5989
## [PED not available from the stored object]
## Estimated effective number of parameters:  pD = 2.07577
## 
## Total time taken: 3.8 seconds

jagsUI

If you only use JAGS once in a while and don’t remember what all these various statistics are, then jagsUI is for you. We use the same data list, model, and initial values list as we did for runjags, but a slightly different command to get the samples.

out.jagsUI <- jags(model.file="Model1.txt",
              data=dataList,inits=initsList, 
              parameters.to.save=c("mu","sigma","precision"), 
              n.iter=2000,
              n.burnin=500,
              n.chains=4)
## 
## Processing function input....... 
## 
## Done. 
##  
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 72
##    Unobserved stochastic nodes: 2
##    Total graph size: 88
## 
## Initializing model
## 
## Adaptive phase..... 
## Adaptive phase complete 
##  
## 
##  Burn-in phase, 500 iterations x 4 chains 
##  
## 
## Sampling from joint posterior, 1500 iterations x 4 chains 
##  
## 
## Calculating statistics....... 
## 
## Done.

and the verbose table of summary statistics

print(out.jagsUI)
## JAGS output for model 'Model1.txt', generated by jagsUI.
## Estimates based on 4 chains of 2000 iterations,
## adaptation = 100 iterations (sufficient),
## burn-in = 500 iterations and thin rate = 1,
## yielding 6000 total samples from the joint posterior. 
## MCMC ran for 0.006 minutes at time 2022-09-26 16:40:54.
## 
##              mean    sd    2.5%     50%   97.5% overlap0     f  Rhat n.eff
## mu          2.748 0.957   0.881   2.744   4.592    FALSE 0.997 1.000  5003
## sigma       8.134 0.688   6.920   8.087   9.579    FALSE 1.000 1.000  6000
## precision   0.015 0.003   0.011   0.015   0.021    FALSE 1.000 1.000  6000
## deviance  504.499 1.995 502.517 503.895 509.878    FALSE 1.000 1.002  4299
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 2 and DIC = 506.489 
## DIC is an estimate of expected predictive error (lower is better).
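
If you want to work with the draws themselves, jagsUI keeps them in the returned object (in the $sims.list component, if I have the slot name right), so, for example:

str(out.jagsUI$sims.list)       # posterior draws for mu, sigma, precision, deviance
hist(out.jagsUI$sims.list$mu)   # a quick look at the posterior of mu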