My ANSWERS are in blue. I also bumped up the font size.

Library that I will be using:

library(LearnBayes)
library(HDInterval)

Problem 1

No computations

Problem 2

This is mostly just graphing. Note that density of beta for p=0.0 and p=1.00 equal infinity. So when setting limits for p, I used the following:

p <- seq(.0000001, .9999999, length.out=1E+5)

(a) Beta(0.5, 0.5)

fp <- dbeta(p, 0.5, 0.5)
plot(p, fp, type="l",
     main="2(a) Density of Beta(0.5,.05)")

Bi-modal

(b) Beta(10.2, 1.5)

fp <- dbeta(p, 10.2, 1.5)
plot(p, fp, type="l",
     main="2(b) Density of Beta(10.2, 1.5)")

Big left skew

(c) Beta(1.5, 10.2)

fp <- dbeta(p, 1.5, 10.2)
plot(p, fp, type="l",
     main="2(c) Density of Beta(1.5, 10.2)")

Mirror image of Beta(1.5, 10.2) and this has a big right skew.

(d) Beta(100, 62)

fp <- dbeta(p, 100, 62)
plot(p, fp, type="l",
     main="2(d) Density of Beta(100, 62)")

Is uni-modal and highly concentrated round the mean/median/mode.

Problem 3

(a) prior \(\pi \sim \mbox{Beta}(1, 1)\), likelihood \(y \sim \mbox{Binomial}(14, 17)\), posterior \(\sim \pi | y\).

i Plot prior, likelihood and posterior. Comment.

triplot(c(1,1), c(14, 3),where="topleft")

* Flat priors

  • posterior \(\approx\) likelihood

  • these are Uni-modal with strong left skew

ii. What is the mean of the posterior?

a.post <- 14+1
b.post <- 17-14 + 1

(mean <- a.post/(a.post + b.post))
## [1] 0.7894737

The mean of the posterior equals 0.7895.

iii. What is the mode of the posterior?

(mode <- (a.post-1)/(a.post + b.post - 2))
## [1] 0.8235294

The mode of the posterior equals 0.8236.

iv. What are the 95% credible intervals (quantiles)?

pi.95 <- c(.025, .975)
(ci <- qbeta(pi.95, a.post, b.post))
## [1] 0.5858225 0.9359080

The 95% quantile (credible) intervals are (0.5858, 0.9359)

v. What are the 95% high density intervals?

rp <- rbeta(1E5, a.post, b.post)
(hci <- hdi(rp, credMass= .95))
##     lower     upper 
## 0.6091927 0.9510663 
## attr(,"credMass")
## [1] 0.95

The $95% high density intervals are (0.6081, 0.9490)

(b) prior \(\pi \sim\) Beta(5, 10), likelihood \(y \sim \mbox{Binomial}(14, 17)\), posterior \(\sim \pi | y\).

This is just a change in priors…

i. Plot prior, likelihood and posterior. Comment and compare with those from part (a)i.

triplot(c(5,10), c(14,3), where="topleft")

The data are the same, but the prior is now informative. The posterior (red) is in between the likelihood and the prior. The posterior is a weighted average of the prior and data.

The posterior has shifted to the left (toward the prior) relative to the situation in part (a); therefore, we expect the mean, mode and interval to also shift.

ii. What is the posterior distribution?

(a.post <- 14+5)
## [1] 19
(b.post <- 3+10)
## [1] 13

The posterior distribution is Beta( 19, 13)

iii. What is the mean of the posterior?

(mean <- a.post/(a.post + b.post))
## [1] 0.59375

The mean is 0.5939

iv. What is the mode of the posterior?

(mode <- (a.post-1)/(a.post + b.post - 2))
## [1] 0.6

The model equals 0.6, which looks correct when compared to figure.

v. What are the 95% credible intervals (quantiles)?

qbeta(c(.025, .975), a.post, b.post)
## [1] 0.4218696 0.7545240

The 95% credible interval is (0.4219, 0.7545)

vi. What are the 95% high density intervals?

(use HDIinterval package)

rpi <- rbeta(1E5, a.post, b.post)
hdi( rpi, credMass=.95)
##     lower     upper 
## 0.4219784 0.7548312 
## attr(,"credMass")
## [1] 0.95

The 95% high density interval is ( 0.4278, 0.7609)

Problem 4

Now to use this with data, data from the GSS for different years responses to favor or oppose a law which would require a person to obtain a permit before he or she could buy a gun.

(a) For the Year 2006

i. What is the sample proportion who favor a law?

(fav.2006 <- 1568)
## [1] 1568
(opo.2006 <-  395)
## [1] 395
(n.2006 <- fav.2006 + opo.2006)
## [1] 1963
(p.2006 <- fav.2006/n.2006)
## [1] 0.7987774

The sample proportion of those who favor a law is 0.7988

ii. What is the posterior distribution for the proportion of people who favor guncontrol (use Uniform prior)? Also, plot the prior, likelihood and posterior and comment on the plot.

a2006 <- fav.2006 + 1
b2006 <- opo.2006 + 1
triplot(c(1,1), c(fav.2006, opo.2006), where="topleft")

The posterior distribution is Beta(1569, 396) |

iii. What is the mean of this distribution? Comment.

Comment: The likelihood and posterior are on top of each other because the prior has no impact (it’s flat, no informative).

a2006/(a2006+b2006)
## [1] 0.7984733

The mean of the distribution is .7985, which is very close to the sample proportion (i.e., data are dominating)

What is the 95% credible interval? Interpret.

qbeta(pi.95, 1569, 396)
## [1] 0.7804574 0.8159137

Given the data, the probability that the true mean is within the interval from .78 to .82 is .95. Also note that the sample proportion p = .7985 is within this interval. Sample and Bayesian are so close because n (number of trials) is large.

What is the 95% high density interval? Interpret.

x <- rbeta(1E8, a2006, b2006)
hdi(x, credMass=.95)
##     lower     upper 
## 0.7806623 0.8161056 
## attr(,"credMass")
## [1] 0.95

Given the data, the probability that the true mean is within the interval from .78 to .82 is .95.

vi. If you were reporting this result in a paper, which interval would you use?

It doesn’t really matter, I could report either one. They are almost the same because posterior ends up being fairly symmetric (due to large sample size).

For the year 2016

i. What is the sample proportion who favor a law?

p = 1330/(1330 + 528) = .7158

Using data from 2006 as the prior, what is the posterior distribution for the proportion of people who favor gun control? Also plot the prior, likelihood and posterior and comment on the plot.

I will use data from 2006 as my prior:

triplot(c(a2006, b2006), c(1330, 528), where="topleft")

(a.post16<- 1330 + 1569)
## [1] 2899
(b.post16<- 528 + 396)
## [1] 924

Posterior distribution is Beta(2899, 924).

Comment: We have a lot of information from the prior and the posterior is tight and in between the prior and posterior. If we had not use an informative prior (i.e., used the uniform), the posterior would be essentially the likelihood). The posterior looks symmetric.

iii. What is the mean of this distribution?

a.post16/(a.post16+b.post16)
## [1] 0.758305

The mean of the posterior distribution equals .7583

iv. What is the 95% credible interval? Interpret.

qbeta(c(.025, .975), a.post16, b.post16)
## [1] 0.7446093 0.7717447

The \(95%\) credible interval in 2016 is (.7446, .7717). The probability that the true probability is between .7446 and .7717 is .95.

v. What is the 95% high density interval? Interpret

ran.pi <- rbeta(1E6, a.post16, b.post16)
hdi(ran.pi, credMass=.95)
##     lower     upper 
## 0.7446974 0.7718510 
## attr(,"credMass")
## [1] 0.95

The 95% high density interval is (.7447, .7719). The probability is .95 that the true value of \(\pi\) is within the interval .7447 to .7719. This interval focuses more on where most of the mass is in the posterior density, which is good when the posterior is highly skewed. This is a shorter interval than the quantile interval. With the large sample sizes, the two types of intervals are nearly the same.

vi.If you were reporting this result in a paper, which interval would you use?

This HDI interval focuses more on where most of the mass is in the posterior density, which is good when the posterior is highly skewed. This is a shorter interval than the quantile interval. With the large sample sizes, the two types of intervals are nearly the same.

Either one would be fine{ the are the same (i.e., posterior is pretty symmetric).

Problem 5 Two proportions (optional)

If you want to give Monte Carlo a try. We’ll use non-informative priors for 2016 (don’t want posterior to depend on it)

(a.post16 <- 1330 + 1)
## [1] 1331
(b.post16 <- 528 + 1)
## [1] 529

The Monte Carlo draws from the posteriors for 2006 and 2016.

S <- 1E5
pi2006 <- rbeta(S, a2006, b2006)
pi2016 <- rbeta(S, a.post16, b.post16)

(mean(pi2016>pi2006))
## [1] 0
(mean(pi2016<pi2006))
## [1] 1
 plot(pi2006, pi2016, type="p",
      main="Monte Carlo Results")
 abline(a=c(0,0), b=1, col="blue", lwd=2)

It seems that the probability of the favor for the law is higher in 2006 (i.e., \(100\%\) of draws from 2006 and larger than those from 2016). This is even more noticeable in the figure when I put in an identity line (for some reason is does not show up when I render R markdown). Note that the means of posteriors for each year are below:

mean(pi2006)
## [1] 0.7984474
mean(pi2016)
## [1] 0.7155936

and credible intervals

qbeta(c(.025,.075), a2006, b2006)
## [1] 0.7804574 0.7853417
qbeta(c(.025,.075), a.post16, b.post16)
## [1] 0.6948786 0.7004536