This document contains R code for Bayesian simple linear regression using JAGS. We will look at just one school from the NELS data; later we extend this to more predictors and many schools and do mixed effects regression (i.e., multilevel regression or HLM).

Setup

The libraries that I used are

library(coda)
library(rjags)
library(runjags)

set.seed(7115)

I set my working directory to where the data live, where the model file will be written, and where any graphs that I save will go.

setwd("D:/Dropbox/edps 590BAY/Lectures/6 Linear Regression")

Now read the data and check that it is what I’m expecting.

nels <- read.table("nels_school_62821.txt",header=TRUE)
head(nels)
##   schid studid sex race homework schtyp  ses paredu math clastruc size urban
## 1 62821      0   1    4        4      4 1.63      6   62        3    3     1
## 2 62821      1   2    4        5      4 0.48      4   68        3    3     1
## 3 62821      3   1    4        5      4 0.69      5   56        3    3     1
## 4 62821      4   2    4        5      4 0.60      4   68        3    3     1
## 5 62821      6   2    4        2      4 0.77      4   61        3    3     1
## 6 62821      7   2    4        2      4 1.19      5   61        3    3     1
##   geograph perminor rati0
## 1        2        3    10
## 2        2        3    10
## 3        2        3    10
## 4        2        3    10
## 5        2        3    10
## 6        2        3    10

And a graph of the data that we’ll model

# Plot of data
plot(nels$homework,nels$math,type='p',
     main="One School from NELS",
     xlab="Time Spent Doing Homework",
     ylab="Math Scores",
     cex=1.5,
     pch=1)

Simple Linear Regression via lm

First compute the mean math score for each value of homework (used below to add points to the figure):

schMathHmwk <- aggregate(nels$math, list(nels$homework), mean)

I like to add things to my figure. Below is code to add means for each value of the predictor (time spent doing homework) and a regression line.

# Plot of data
plot(nels$homework,nels$math,type='p',
     main="One School from NELS",
     xlab="Time Spent Doing Homework",
     ylab="Math Scores",
     cex=1.5,
     pch=1)
# Add means for each homework value
lines(schMathHmwk$Group.1,schMathHmwk$x,type='p',pch=19,col="blue")
# Add linear regression line
ols.lm <- lm(math~homework,data=nels)
summary(ols.lm)
coef(ols.lm)
abline(ols.lm,col="blue",lwd=2)
# Adds the fitted equation and makes it big enough to read
text(3,45,"y=59.21 + 1.09(Homework)",col="blue",cex=1.25)

rjags For Linear Regression

Step 1:

Create the data list. Note that you should use “=” and not “<-”. Once this is set we do not need to change it (unless we change what we include as the outcome or predictor/explanatory variables).

dataList <- list(y=nels$math,
                 x=nels$homework,
                 N=length(nels$math),
                 sdY = sd(nels$math)
                 )

Take a quick look at this.
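
For example, str() shows what is in the list (a quick sanity check):

str(dataList)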

Step 2:

We will start with a simple linear regression model where the data are normally distributed and the priors are rather uninformative.

nelsLR1 = "model {                                    
                 for (i in 1:N){
                  y[i] ~ dnorm(mu[i] , precision)
                  mu[i] <- b0 + b1*x[i]         
                 }
                 b0 ~ dnorm(0 , 1/(100*sdY^2) )   
                 b1 ~ dnorm(0 , 1/(100*sdY^2) )   
                 precision ~ dgamma(.01, .01)
               } 
"
writeLines(nelsLR1, con="nelsLR1.txt")

The order of things here does not matter; that is, I started with defining the likelihood/data model and then specified priors. We could have also started with the priors and then defined the likelihood.

One thing different from base R is that in JAGS dnorm wants a precision, not a standard deviation. So 1/(100*sdY^2) is a very small number (very little prior precision about b0 and b1 to start with), which means a very dispersed prior.
Find the file nelsLR1.txt in your working directory and take a look at it.
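
To see the scale involved, here is the prior precision used for b0 and b1 and the prior standard deviation it implies (just arithmetic on quantities already defined):

prior.prec <- 1/(100*sd(nels$math)^2)
prior.prec           # a very small precision
sqrt(1/prior.prec)   # the implied prior SD, i.e., 10 times sd(y)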

Step 3:

Create a list object containing initial or starting values for the sampler.

b0Init = mean(nels$math)
b1Init = 0
precInit = 1/sd(nels$math)
initsList = list(b0=b0Init, b1=b1Init, precision=precInit )

Step 4:

Compile the model and get started. I sometimes make the number of iterations very small here just to check my code from Step 2 (it saves a little time).

jagsNelsLR1 <- jags.model(file="nelsLR1.txt", # compiles and initializes model
                        data=dataList,
                        inits=initsList,
                        n.chains=4,
                        n.adapt=500)    # adaptation phase iterations
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 67
##    Unobserved stochastic nodes: 3
##    Total graph size: 163
## 
## Initializing model

And now start sampling

# gets samples from all chains for 15000 iterations
Samples1 <- coda.samples(jagsNelsLR1, variable.names=c("b0","b1","precision"), n.iter=15000)

Step 5

Check whether the algorithm converged. Below are several diagnostics to help with this.

# To see trace and densities for each parameter
plot(Samples1)

# Gelman statistics (Rhat or psrf)
gelman.diag(Samples1)
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## b0                 1          1
## b1                 1          1
## precision          1          1
## 
## Multivariate psrf
## 
## 1
# Plot of Gelman statistics (Rhat or psrf)
gelman.plot(Samples1)

# Plot of auto-correlations (each parameter for each chain)
autocorr.plot(Samples1,auto.layout=TRUE)

# Geweke diagnostics (each parameter for each chain)
geweke.diag(Samples1,frac1=0.1,frac2=0.5)
## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##        b0        b1 precision 
##   -0.2962    0.1774   -3.3535 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##        b0        b1 precision 
##   -0.5155    0.4974   -0.8912 
## 
## 
## [[3]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##        b0        b1 precision 
##    0.6053   -0.5543    1.8327 
## 
## 
## [[4]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##        b0        b1 precision 
##    1.1496   -1.2161    0.6661
# Effective sample sizes
effectiveSize(Samples1)
##        b0        b1 precision 
##   7344.60   7456.17  56202.10
# cumuplot (one graph per parameter per chain)
cumuplot(Samples1,probs=c(.25,.50,.75),lwd=c(1,2),lty=c(2,1),col=c("blue","red"))

# High density intervals for each chain
HPDinterval(Samples1)
## [[1]]
##                 lower      upper
## b0        56.23702554 61.8715849
## b1         0.32496312  1.8425087
## precision  0.02341253  0.0467832
## attr(,"Probability")
## [1] 0.95
## 
## [[2]]
##                 lower       upper
## b0        56.40478631 61.98872628
## b1         0.34998211  1.85793494
## precision  0.02322259  0.04631128
## attr(,"Probability")
## [1] 0.95
## 
## [[3]]
##                 lower       upper
## b0        56.34747789 61.96668456
## b1         0.35411500  1.87187650
## precision  0.02338294  0.04654152
## attr(,"Probability")
## [1] 0.95
## 
## [[4]]
##                 lower       upper
## b0        56.29345294 62.06497919
## b1         0.33249166  1.90299525
## precision  0.02321657  0.04650239
## attr(,"Probability")
## [1] 0.95

Step 6

If it appears the model has converged, then look at summaries of the posterior distribution.

# output summary information
summary(Samples1)
## 
## Iterations = 1:15000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 15000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##               Mean       SD  Naive SE Time-series SE
## b0        59.18476 1.451285 0.0059248      1.699e-02
## b1         1.10005 0.391160 0.0015969      4.554e-03
## precision  0.03437 0.006002 0.0000245      2.532e-05
## 
## 2. Quantiles for each variable:
## 
##               2.5%      25%      50%      75%    97.5%
## b0        56.35435 58.20711 59.18826 60.15366 62.02040
## b1         0.33265  0.83815  1.10027  1.36088  1.86341
## precision  0.02362  0.03014  0.03403  0.03822  0.04695
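
Since JAGS parameterizes the normal with a precision, it can be handy to convert the posterior mean precision back to a residual standard deviation (simple arithmetic on the value above):

1/sqrt(0.03437)   # residual SD of about 5.4 math-score points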

If you prefer runjags

You need to reset your initial values and change the code that gets the samples.

Starting values are needed for each chain. For the first chain I put in smart starting values, but for the others I drew values for b0 and b1 from normal distributions with different means and standard deviations (and fixed the precisions at different values).

initsList = list(list("b0"=mean(nels$math), "b1"=sd(nels$math), "precision"=1/sd(nels$math)**2), 
                 list("b0"=rnorm(1,-2,4), "b1"=rnorm(1,0,4)  ,"precision"=1/2),
                 list("b0"=rnorm(1,0,1),  "b1"=rnorm(1,6,4)  ,"precision"=1/5),
                 list("b0"=rnorm(1,2,2),  "b1"=rnorm(1,-2,4) ,"precision"=1/1515 )   
                 )

To run runjags

out.runjags <- run.jags(model=nelsLR1, monitor=c("b0","b1","precision","dic"),
                    data=dataList, n.chains=4, inits=initsList)
## module dic loaded
## Compiling rjags model...
## Calling the simulation using the rjags method...
## Note: the model did not require adaptation
## Burning in the model for 4000 iterations...
## Running the model for 10000 iterations...
## Simulation complete
## Calculating summary statistics...
## Calculating the Gelman-Rubin statistic for 3 variables....
## Finished running the simulation

and some quick figures for diagnostic purposes

plot(out.runjags)
## Generating plots...

If OK, then examine posterior statistics along with more model convergence checks.

print(out.runjags)
## 
## JAGS model summary statistics from 40000 samples (chains = 4; adapt+burnin = 5000):
##                                                                                
##           Lower95   Median  Upper95     Mean        SD Mode       MCerr MC%ofSD
## b0         56.373    59.17   62.082   59.171    1.4609   --    0.021228     1.5
## b1        0.33403   1.1052   1.8755    1.105   0.39305   --   0.0057107     1.5
## precision  0.0232 0.034024 0.046619 0.034407 0.0060463   -- 0.000031221     0.5
##                                 
##           SSeff     AC.10   psrf
## b0         4736  0.098116 1.0005
## b1         4737  0.099765 1.0005
## precision 37504 -0.001269      1
## 
## Model fit assessment:
## DIC = 420.1005
## [PED not available from the stored object]
## Estimated effective number of parameters:  pD = 3.10062
## 
## Total time taken: 1.2 seconds
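
If you want to run the coda diagnostics from Step 5 on this output, runjags provides an as.mcmc.list method, so the samples can be converted to a coda object (a quick sketch, not part of the original workflow):

runjags.mcmc <- as.mcmc.list(out.runjags)
gelman.diag(runjags.mcmc)
effectiveSize(runjags.mcmc)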

Dealing With Missing Values

There may be missing data on the outcome/response variable and/or the covariates (each case has to have at least some data). Since \(y_i\) already has a distribution in the model, missing values of y do not present a problem. For the covariates/predictors we need to make them stochastic as well. Before doing this, we will delete some values from our data set.

# Delete two from each
N <- nrow(nels)

# I renamed the variables because I didn't want to delete them from 
# the nels data set.
y <-nels$math
y.miss = sample(N)[1:2]
y[y.miss] <- NA
y
##  [1] 62 68 56 68 61 61 62 60 68 63 71 61 60 69 70 66 63 63 67 62 51 64 57 67 57
## [26] 69 54 69 63 43 60 71 NA 56 65 69 61 70 56 61 60 65 64 65 70 67 63 63 64 56
## [51] 63 67 54 55 67 NA 57 63 67 70 69 68 52 63 55 69 68
x <-nels$homework
x.miss = sample(N)[1:2]
x[x.miss] <- NA
x
##  [1]  4  5  5  5  2  2  3  1  2  0  5  1  2  5  6  1  2  1  2  5  4  4  1  4  1
## [26] NA  1  5  5  1  5  4  0  2  5  4  2  5  1  3  4  3  4  7  5  1  1  3  4  3
## [51]  3  1  4  5  3  1  5  4  6  3  5  5  5  5 NA  5  4

I will use rjags, but could use others.

Step 1:

I am now using the y and x that have some missing values.

dataList <- list(y=y,
                 x=x,
                 N=length(nels$math),
                 sdY = sd(nels$math)
                 )

Step 2:

This is where most of the changes are. I added a (very simple) model for x and set priors for its mean and precision. We already have a model and priors for y.

nelsMiss = "model {                                    
                 for (i in 1:N){
                  y[i] ~ dnorm(mu[i] , precision)
                  mu[i] <- b0 + b1*x[i]   
                  
                # --- Add model for x  ----  
                  x[i] ~ dnorm(mu.x, prec.x)                
                 }
                 b0 ~ dnorm(0 , 1/(100*sdY^2) )   
                 b1 ~ dnorm(0 , 1/(100*sdY^2) )   
                 precision ~ dgamma(.01, .01)
                 sigma2 <- 1/precision
                 
                 # --- Need priors for x ---
                 mu.x ~ dnorm(0, 1/(100*sdY^2) ) 
                 prec.x ~ dgamma(.01, .01)               
               } 
"
writeLines(nelsMiss, con="nelsMiss.txt")

Step 3:

We need to add initial values for mu.x and prec.x.

initsList = list("b0"=mean(nels$math), "b1"=sd(nels$math), "precision"=.1,
                 "mu.x"=mean(nels$math), "prec.x"=.2)

Step 4

Run rjags:

NelsMiss <- jags.model(file="nelsMiss.txt",
           data=dataList,
           inits=initsList,
           n.chains=4,
           n.adapt=500)    # adaptation phase iterations
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 130
##    Unobserved stochastic nodes: 9
##    Total graph size: 170
## 
## Initializing model

Sample from the posterior

Note that in addition to getting samples for the means and precisions, we also ask for x and y.

SamplesMiss <- coda.samples(NelsMiss, variable.names=c("b0","b1","precision","mu.x","prec.x","y","x"), n.iter=4000)

Some quick figures for diagnostics

The columns of the output are in alphabetical order of the monitored nodes: the first 5 are the model parameters and after that come the x’s and then the y’s. I plot a few of them to illustrate.

plot(SamplesMiss[[1]][,1:7])

# output summary information
summary(SamplesMiss)$statistics[1:7,]
##                  Mean          SD     Naive SE Time-series SE
## b0        59.15941135 1.606297450 1.269890e-02   3.849345e-02
## b1         1.09691617 0.420342134 3.323096e-03   1.016903e-02
## mu.x       3.30503624 0.217211048 1.717204e-03   1.699540e-03
## prec.x     0.33025249 0.058375586 4.614995e-04   4.835959e-04
## precision  0.03337832 0.005942718 4.698131e-05   4.994937e-05
## x[1]       4.00000000 0.000000000 0.000000e+00   0.000000e+00
## x[2]       5.00000000 0.000000000 0.000000e+00   0.000000e+00
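
To confirm which column corresponds to which monitored node, coda’s varnames() lists them in order (a quick check, not part of the original workflow):

varnames(SamplesMiss)[1:8]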

See how JAGS fills in missing data

first.miss.x = paste0("x[",x.miss[1],"]")
first.miss.y = paste0("y[",y.miss[1],"]")
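# A quick look (not in the original code) at the posterior samples for the
# first missing x and the first missing y, using the labels defined above
summary(SamplesMiss[[1]][, c(first.miss.x, first.miss.y)])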

# --- Scatter plot showing the "imputation"
#     where the Error bars represent posterior standard deviations
par(mfrow=c(1,1))
plot(x, y, 
     ylim = range(y, na.rm = TRUE), 
     xlim = range(x, na.rm = TRUE), 
     pch = 19,
     main="scatterplot showing imputation \n +/- one posterior sd"
     )
     
for(i in x.miss){
  lab = paste0("x[", i, "]")
  s = SamplesMiss[[1]][,lab]
  mn = mean(s)
  sdv = sd(s)
  points(mn, y[i], pch = 19, col = "blue")
  segments(mn - sdv, y[i], mn + sdv, y[i], lwd=2, col="blue")
}

for(i in y.miss){
  lab = paste0("y[",i,"]")
  s = SamplesMiss[[1]][,lab]
  mn = mean(s)
  sdv = sd(s)
  points(x[i], mn,  pch = 19, col = "red")
  segments(x[i], mn - sdv, x[i], mn + sdv, lwd=2, col="red")
}

The actual values of the deleted observations were

nels$homework[x.miss]
## [1] 2 4
nels$math[y.miss]
## [1] 58 63
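
For comparison, the posterior means of the imputed values (from chain 1) can be pulled out directly, since x and y were monitored (a quick check, not part of the original workflow):

colMeans(SamplesMiss[[1]][, paste0("x[", x.miss, "]")])
colMeans(SamplesMiss[[1]][, paste0("y[", y.miss, "]")])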

Robust Regression

Rather than using the normal distribution for our outcome variable, we can use a t-distribution which has heavier tails. Before starting, I want to make sure I have the right data.

dataList <- list(y=nels$math,
                 x=nels$homework,
                 N=length(nels$math),
                 sdY = sd(nels$math)
                 )

To use Student’s t-distribution, we swap out the normal likelihood and specify the t in the model definition. The t-distribution has a degrees-of-freedom parameter (“nu”) that we will estimate, so it needs a prior distribution.

tMod1 = "model {for (i in 1:N){
                  y[i] ~ dt(mu[i] , precision,nu) 
                  mu[i] <- b0 + b1*x[i]
                 }
                 b0 ~ dnorm(0 , 1/(100*sdY^2) )   
                 b1 ~ dnorm(0 , 1/(100*sdY^2) )   
                 precision ~ dgamma(.01, .01)
                 sigma2 <- 1/precision
                 nuMinusOne  ~ dexp(1/29)   
                 nu <- nuMinusOne+1  
              } "

writeLines(tMod1, con="tMod1.txt")

We actually put the prior on the degrees of freedom minus one, which is given an exponential distribution, and then add one back in so that nu is always greater than 1.
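
As a quick sketch of what this prior implies (an illustration, not part of the original code): nuMinusOne has an exponential prior with mean 29, so nu has a prior mean of 30 and is bounded below by 1.

# implied prior density for nu = nuMinusOne + 1, with nuMinusOne ~ Exp(rate = 1/29)
curve(dexp(x - 1, rate = 1/29), from = 1, to = 150,
      xlab = "nu", ylab = "prior density", main = "Prior on degrees of freedom")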

Our starting values,

initsList = list("b0"=mean(nels$math), "b1"=sd(nels$math), "precision"=1/sd(nels$math)**2)

Initialize the model

jagsModelLm2 <- jags.model(file="tMod1.txt", 
                        data=dataList,
                        inits=initsList,
                        n.chains=4,
                        n.adapt=1000)     
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 67
##    Unobserved stochastic nodes: 4
##    Total graph size: 168
## 
## Initializing model

Now pull the samples,

# contains samples from all chains with 20000 iterations
tSamples2 <- coda.samples(jagsModelLm2, variable.names=c("b0","b1","precision","sigma2", "nu"), n.iter=20000)

Does everything look good? Do thorough checking that the algorithm converged satisfactorily.

plot(tSamples2)
gelman.diag(tSamples2)
gelman.plot(tSamples2)
autocorr.plot(tSamples2,auto.layout=TRUE)
geweke.diag(tSamples2,frac1=0.1,frac2=0.5)
effectiveSize(tSamples2)
cumuplot(tSamples2,probs=c(.25,.50,.75),lwd=c(1,2),lty=c(2,1),col=c("blue","red"))
HPDinterval(tSamples2)

It’s OK.

# output summary information
summary(tSamples2)
## 
## Iterations = 1001:21000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 20000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##               Mean       SD  Naive SE Time-series SE
## b0        59.46510  1.40936 4.983e-03      1.825e-02
## b1         1.09938  0.37918 1.341e-03      4.883e-03
## nu        27.41015 25.31913 8.952e-02      2.325e-01
## precision  0.04124  0.01026 3.629e-05      7.685e-05
## sigma2    25.61604  5.87981 2.079e-02      3.953e-02
## 
## 2. Quantiles for each variable:
## 
##               2.5%      25%      50%      75%    97.5%
## b0        56.68250 58.52434 59.46088 60.40753 62.23321
## b1         0.35077  0.84812  1.10072  1.35367  1.84384
## nu         3.67200  9.86818 19.07002 36.37973 97.91749
## precision  0.02604  0.03425  0.03965  0.04631  0.06608
## sigma2    15.13207 21.59539 25.21933 29.19632 38.39859
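
A quick side-by-side comparison of the slope under the normal and the t likelihoods, using the summaries already computed (a small check, not part of the original code):

c(normal = summary(Samples1)$statistics["b1", "Mean"],
  robust = summary(tSamples2)$statistics["b1", "Mean"])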

Removing the Outlier

Note that there is an outlier: a student with a math score below 45 who does only 1 hour of homework. We will remove the outlier and re-run the model to see if there is an impact on the results.
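
A quick way to confirm which observation this is (an illustrative check, not in the original code):

nels[which.min(nels$math), c("math", "homework")]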

Remove the student with the smallest math score:

min(nels$math)
## [1] 43
nels2 <- subset(nels, math>min(nels$math))
min(nels2$math)
## [1] 51

We need to re-do the data list

dataList <- list(y=nels2$math,
                 x=nels2$homework,
                 N=length(nels2$math),
                 sdY = sd(nels2$math)
                 )

We can use our initial model again, “nelsLR1.txt”, but our starting values depend on the data set, so they need to be re-done.

b0Init = mean(nels2$math)
b1Init = 0
precInit = 1/sd(nels2$math)
initsList = list(b0=b0Init, b1=b1Init, precision=precInit )
jagsModelLm1x <- jags.model(file="nelsLR1.txt", 
                        data=dataList,
                        inits=initsList,
                        n.chains=4,
                        n.adapt=500)   # adaptive phase for 500 iterations
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 66
##    Unobserved stochastic nodes: 3
##    Total graph size: 161
## 
## Initializing model

And re-run and get samples

update (jagsModelLm1x, n.iter=10000)              

# gets samples from all chains 
Samples1x <- coda.samples(jagsModelLm1x, variable.names=c("b0","b1","precision"), n.iter=20000)

Check convergence (not done here; see the lecture notes or re-do Step 5 above) and, if OK, take a look at the results. The code for this case is

# Serious and Thorough Model Checking
plot(Samples1x)
gelman.diag(Samples1x)
gelman.plot(Samples1x)
autocorr.plot(Samples1x,auto.layout=TRUE)
geweke.diag(Samples1x,frac1=0.1,frac2=0.5)
effectiveSize(Samples1x)
cumuplot(Samples1x,probs=c(.25,.50,.75),lwd=c(1,2),lty=c(2,1),col=c("blue","red"))
HPDinterval(Samples1x)

If the model seems to have converged, look at summary statistics of the posterior for the parameters.

# output summary information
summary(Samples1x)
## 
## Iterations = 10001:30000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 20000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##               Mean       SD  Naive SE Time-series SE
## b0        60.12614 1.359032 4.805e-03      0.0141066
## b1         0.89560 0.362562 1.282e-03      0.0037782
## precision  0.04056 0.007166 2.534e-05      0.0000266
## 
## 2. Quantiles for each variable:
## 
##               2.5%      25%      50%      75%    97.5%
## b0        57.47254 59.21609 60.11988 61.02996 62.81269
## b1         0.17728  0.65670  0.89729  1.13788  1.60790
## precision  0.02765  0.03555  0.04014  0.04515  0.05573

Should we keep the outlier in the data or not? For the rest of this document, I left it in.

Posterior Checks

We now have a posterior distribution from which we can draw values. I will draw from this posterior, compute Bayesian p-values for summary statistics of the data, and examine these relative to the corresponding posterior predictive distributions.

There are two ways to proceed: Monte Carlo sampling from the posterior using the estimated parameters, or building the predictions into the model definition. The first is easier and more straightforward; the second uses the actual posterior samples. They should give basically the same results.

Method I: Monte Carlo Simulation

I need to extract the appropriate values for my parameters. The statistics matrix from summary() is indexed column by column, so elements 1-3 are the posterior means and elements 4-6 are the posterior SDs. This does the trick:

t <- summary(Samples1)
mu.b0 <- t[[1]][1]     # posterior mean of b0
sd.b0 <- t[[1]][4]     # posterior SD of b0
mu.b1 <- t[[1]][2]     # posterior mean of b1
sd.b1 <- t[[1]][5]     # posterior SD of b1
prec  <- t[[1]][3]     # posterior mean of the precision
sigma <- sqrt(1/prec)  # convert precision to a standard deviation

I will take 200 replications to get predicted values of y

n <- length(nels$math)
replications <- 200
yrep <- matrix(99,nrow=n,ncol=replications)
for (s in 1:replications){
  b0 <- rnorm(1, mu.b0, sd.b0)
  b1 <- rnorm(1, mu.b1, sd.b1)
 for (i in 1:n){
     yrep[i,s] = b0 + b1*nels$homework[i] + rnorm(1, 0, sigma)
  }
}

Using my simulated data sets, I will first compute descriptive statistics (to check whether the results are reasonable). These are computed for each simulated data set (i.e., there will be 200 of each).

# descriptive statistics
yhats <- apply(yrep,2,"mean")
ymin  <- apply(yrep,2,"min")
ymax  <- apply(yrep,2,"max")
ysd   <- apply(yrep,2,"sd")

Look OK?
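
One way to check is to line the observed statistics up against the averages of the replicated ones (an illustrative comparison, not part of the original code):

round(rbind(observed   = c(mean = mean(nels$math), min = min(nels$math),
                           max = max(nels$math), sd = sd(nels$math)),
            replicated = c(mean(yhats), mean(ymin), mean(ymax), mean(ysd))), 2)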

To compute Bayesian p-values,

# Bayesian p-values
(pmean <- mean(mean(nels2$math) < yhats))
## [1] 0.445
(pmin  <- mean(min(nels2$math) < ymin))
## [1] 0.27
(pmax  <- mean(max(nels2$math) < ymax))
## [1] 0.965
(psd   <- mean(sd(nels2$math) < ysd))
## [1] 0.88

Let’s take a look at these p-values relative to the distribution of each statistic. The blue line is the statistic computed from the data and the histogram is its posterior predictive distribution.

par(mfrow=c(2,2))
hist(ymin,breaks=10,main=paste("Simulated N=200 Minimums", "\nBayesian P-value =", round(pmin, 2)))
lines(c(min(nels$math),min(nels$math)),c(0,40),col="blue",lwd=2)

hist(ymax,breaks=10,main=paste("Simulated  N=200 Maximums", "\nBayesian P-value =", round(pmax, 2)))
lines(c(max(nels$math),max(nels$math)),c(0,40),col="blue",lwd=2)

hist(yhats,breaks=10,main=paste("Simulated  N=200 Means", "\nBayesian P-value =", round(pmean, 2)))
lines(c(mean(nels$math),mean(nels$math)),c(0,40),col="blue",lwd=2)

hist(ysd,breaks=10,main=paste("Simulated  N=200 SDs", "\nBayesian P-value =", round(psd, 2)))
lines(c(sd(nels$math),sd(nels$math)),c(0,40),col="blue",lwd=2)

And one more set of figures

# Graphs of data and posterior predictive distribution
ypred <- apply(yrep,1,"mean")
par(mfrow=c(2,2))
hist(nels$math,main="Data Distribution",breaks=10)
hist(ypred,main="Predicted Posterior Distribution",breaks=10)
plot(nels$homework,nels$math,main="Data: math x homework",ylim=c(45,80))
plot(nels$homework, ypred,type='p',main='Predictions: math x homework',ylim=c(45,80))
schMathHmwk <- aggregate(nels$math, list(nels$homework), "mean")
lines(schMathHmwk$Group.1,schMathHmwk$x,type='p',pch=19,col="blue")

Method II: Within the Model

Not only will I compute y’s from the posterior, I will also get residuals.

##
dataList <- list(y=nels$math,
                 x=nels$homework,
                 N=length(nels$math),
                 sdY = sd(nels$math)
                 )

ModelLm1x = "model {   
## Likelihood                                 
              for (i in 1:N){
                y[i] ~ dnorm(mu[i] , precision)
                mu[i] <- b0 + b1*x[i] 

                res[i] <- y[i] - mu[i]               # Data residuals
                emp.new[i] ~ dnorm(mu[i],precision)  # New Predicted y 
                res.new[i] <- emp.new[i] - mu[i]     # New Predicted residuals
        
                  }
## Prior
                b0 ~ dnorm(0 , 1/(100*sdY^2) )   
                b1 ~ dnorm(0 , 1/(100*sdY^2) )   
                sigma2 <- 1/precision               # variance = 1/precision
                precision ~ dgamma(.01, .01)
## Derived parameters     
                fit <- sum(res[])                    # sum of data residuals this iteration
                fit.new <- sum(res.new[])            # sum predicted residuals 
                fit.mean <- mean(emp.new[])          # mean of posterior predictions
              } "

writeLines(ModelLm1x, con="ModelLm1x.txt")

b0Init = mean(nels2$math)
b1Init = 0
precInit = 1/sd(nels2$math)
initsList = list(b0=b0Init, b1=b1Init, precision=precInit )

jagsModelLm1x <- jags.model(file="ModelLm1x.txt", 
                        data=dataList,
                        inits=initsList,
                        n.chains=4,
                        n.adapt=1000)           
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 67
##    Unobserved stochastic nodes: 70
##    Total graph size: 354
## 
## Initializing model
update (jagsModelLm1x, n.iter=1000)             

# gets samples from all chains for 100000 iterations
Samples1xx <- coda.samples(jagsModelLm1x, variable.names=c("b0","b1","precision","fit","fit.new","fit.mean","emp.new"), n.iter=100000)

Output summary information and save to object

txx <- summary(Samples1xx)

Extract what I want, the fitted values

fitted <- txx[[1]][3:69]     # These are means of fitted values yhats
r <- cor(nels$math, fitted)
R2 <- round(r**2, 4)
plot(nels$math, fitted, main=paste("Fitted by Observed:  R^2 =",R2))

Note that \(R^2\) here is close in value to the \(R^2\) from the OLS fit near the beginning of this document.
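
For reference, the OLS value can be pulled from the lm fit created in the Simple Linear Regression section (a quick comparison):

summary(ols.lm)$r.squared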

We can also save the actual posterior sample values into separate objects so that we can do whatever we want with them.

# change from mcmc to array object which has dim = (niter, 73, 4) 
ffit <- as.array(Samples1xx) 

Stack the chains so dim(stacked) = (niter*4, 73)

stacked <- rbind(ffit[,,1],ffit[,,2],ffit[, , 3], ffit[, , 4])

Extract samples of specific parameters into separate objects

# put sampled values into vectors or arrays; the columns are in alphabetical
# order of the monitored nodes: b0, b1, emp.new[1]-emp.new[67], fit,
# fit.mean, fit.new, precision
b0 <- stacked[,1]
b1 <- stacked[,2]
pred <- stacked[,3:69]        # emp.new, the posterior predictions
fit <- stacked[,70]
fit.mean <- stacked[,71]
fit.new <- stacked[,72]
precision <- stacked[,73]

Note that these two values are essentially the same

mean(nels$math)
## [1] 62.8209
mean(pred)
## [1] 62.81082

Bayesian p-value for data vs. predictions:

bpval <- mean(nels$math < pred)
bpval
## [1] 0.4818512
par(mfrow=c(2,2))
hist(nels$math,breaks=10,main="Distribution of Data")
hist(fitted,breaks=10,main=paste("Posterior Predictions", "\nBayesian p-value=", round(bpval,2)))
lines(c(mean(nels$math),mean(nels$math)),c(0,15),col="blue",lwd=2)

Now for Bayesian p-values for other summary statistics of the posterior predictions

# For each posterior draw (row), compute the statistic across students (columns)
ymeans <- apply(pred,1,"mean")
ymin  <- apply(pred,1,"min")
ymax  <- apply(pred,1,"max")
ysd   <- apply(pred,1,"sd")

# Bayesian p-values
pmean <- mean(mean(nels$math) < ymeans)
pmin  <- mean(min(nels$math) < ymin)
pmax  <- mean(max(nels$math) < ymax)
psd   <- mean(sd(nels$math) < ysd)


# Graphs of descriptive statistics from posterior predictive distribution 
par(mfrow=c(2,2))
 
hist(ymin,breaks=10,main=paste("Simulated within Jags: Minimums", "\nBayesian P-value =", round(pmin, 2)))
lines(c(min(nels$math),min(nels$math)),c(0,26000),col="blue",lwd=2)

hist(ymax,breaks=10,main=paste("Simulated within Jags: Maximums", "\nBayesian P-value =", round(pmax, 2)))
lines(c(max(nels$math),max(nels$math)),c(0,26000),col="blue",lwd=2)

hist(ymeans,breaks=10,main=paste("Simulated within Jags:  Means", "\nBayesian P-value =", round(pmean, 2)))
lines(c(mean(nels$math),mean(nels$math)),c(0,26000),col="blue",lwd=2)

hist(ysd,breaks=10,main=paste("Simulated within Jags:  SDs", "\nBayesian P-value =", round(psd, 2)))
lines(c(sd(nels$math),sd(nels$math)),c(0,26000),col="blue",lwd=2)

Drop outlier?