My ANSWERS are in blue. I also bumped up the font size.
Library that I will be using:
library(LearnBayes)
library(HDInterval)
No computations
This is mostly just graphing. Note that density of beta for p=0.0 and p=1.00 equal infinity. So when setting limits for p, I used the following:
p <- seq(.0000001, .9999999, length.out=1E+5)
fp <- dbeta(p, 0.5, 0.5)
plot(p, fp, type="l",
main="2(a) Density of Beta(0.5,.05)")
Bi-modal
fp <- dbeta(p, 10.2, 1.5)
plot(p, fp, type="l",
main="2(b) Density of Beta(10.2, 1.5)")
Big left skew
fp <- dbeta(p, 1.5, 10.2)
plot(p, fp, type="l",
main="2(c) Density of Beta(1.5, 10.2)")
Mirror image of Beta(1.5, 10.2) and this has a big right skew.
fp <- dbeta(p, 100, 62)
plot(p, fp, type="l",
main="2(d) Density of Beta(100, 62)")
Is uni-modal and highly concentrated round the mean/median/mode.
triplot(c(1,1), c(14, 3),where="topleft")
* Flat priors
posterior \(\approx\) likelihood
these are Uni-modal with strong left skew
a.post <- 14+1
b.post <- 17-14 + 1
(mean <- a.post/(a.post + b.post))
## [1] 0.7894737
The mean of the posterior equals 0.7895.
(mode <- (a.post-1)/(a.post + b.post - 2))
## [1] 0.8235294
The mode of the posterior equals 0.8236.
pi.95 <- c(.025, .975)
(ci <- qbeta(pi.95, a.post, b.post))
## [1] 0.5858225 0.9359080
The 95% quantile (credible) intervals are (0.5858, 0.9359)
rp <- rbeta(1E5, a.post, b.post)
(hci <- hdi(rp, credMass= .95))
## lower upper
## 0.6091927 0.9510663
## attr(,"credMass")
## [1] 0.95
The $95% high density intervals are (0.6081, 0.9490)
This is just a change in priors…
triplot(c(5,10), c(14,3), where="topleft")
The data are the same, but the prior is now informative. The posterior (red) is in between the likelihood and the prior. The posterior is a weighted average of the prior and data.
The posterior has shifted to the left (toward the prior) relative to the situation in part (a); therefore, we expect the mean, mode and interval to also shift.
(a.post <- 14+5)
## [1] 19
(b.post <- 3+10)
## [1] 13
The posterior distribution is Beta( 19, 13)
(mean <- a.post/(a.post + b.post))
## [1] 0.59375
The mean is 0.5939
(mode <- (a.post-1)/(a.post + b.post - 2))
## [1] 0.6
The model equals 0.6, which looks correct when compared to figure.
qbeta(c(.025, .975), a.post, b.post)
## [1] 0.4218696 0.7545240
The 95% credible interval is (0.4219, 0.7545)
(use HDIinterval package)
rpi <- rbeta(1E5, a.post, b.post)
hdi( rpi, credMass=.95)
## lower upper
## 0.4219784 0.7548312
## attr(,"credMass")
## [1] 0.95
The 95% high density interval is ( 0.4278, 0.7609)
Now to use this with data, data from the GSS for different years responses to favor or oppose a law which would require a person to obtain a permit before he or she could buy a gun.
(fav.2006 <- 1568)
## [1] 1568
(opo.2006 <- 395)
## [1] 395
(n.2006 <- fav.2006 + opo.2006)
## [1] 1963
(p.2006 <- fav.2006/n.2006)
## [1] 0.7987774
The sample proportion of those who favor a law is 0.7988
a2006 <- fav.2006 + 1
b2006 <- opo.2006 + 1
triplot(c(1,1), c(fav.2006, opo.2006), where="topleft")
The posterior distribution is Beta(1569, 396) |
Comment: The likelihood and posterior are on top of each other because the prior has no impact (it’s flat, no informative).
a2006/(a2006+b2006)
## [1] 0.7984733
The mean of the distribution is .7985, which is very close to the sample proportion (i.e., data are dominating)
qbeta(pi.95, 1569, 396)
## [1] 0.7804574 0.8159137
Given the data, the probability that the true mean is within the interval from .78 to .82 is .95. Also note that the sample proportion p = .7985 is within this interval. Sample and Bayesian are so close because n (number of trials) is large.
x <- rbeta(1E8, a2006, b2006)
hdi(x, credMass=.95)
## lower upper
## 0.7806623 0.8161056
## attr(,"credMass")
## [1] 0.95
Given the data, the probability that the true mean is within the interval from .78 to .82 is .95.
It doesn’t really matter, I could report either one. They are almost the same because posterior ends up being fairly symmetric (due to large sample size).
p = 1330/(1330 + 528) = .7158
I will use data from 2006 as my prior:
triplot(c(a2006, b2006), c(1330, 528), where="topleft")
(a.post16<- 1330 + 1569)
## [1] 2899
(b.post16<- 528 + 396)
## [1] 924
Posterior distribution is Beta(2899, 924).
Comment: We have a lot of information from the prior and the posterior is tight and in between the prior and posterior. If we had not use an informative prior (i.e., used the uniform), the posterior would be essentially the likelihood). The posterior looks symmetric.
a.post16/(a.post16+b.post16)
## [1] 0.758305
The mean of the posterior distribution equals .7583
qbeta(c(.025, .975), a.post16, b.post16)
## [1] 0.7446093 0.7717447
The \(95%\) credible interval in 2016 is (.7446, .7717). The probability that the true probability is between .7446 and .7717 is .95.
ran.pi <- rbeta(1E6, a.post16, b.post16)
hdi(ran.pi, credMass=.95)
## lower upper
## 0.7446974 0.7718510
## attr(,"credMass")
## [1] 0.95
The 95% high density interval is (.7447, .7719). The probability is .95 that the true value of \(\pi\) is within the interval .7447 to .7719. This interval focuses more on where most of the mass is in the posterior density, which is good when the posterior is highly skewed. This is a shorter interval than the quantile interval. With the large sample sizes, the two types of intervals are nearly the same.
This HDI interval focuses more on where most of the mass is in the posterior density, which is good when the posterior is highly skewed. This is a shorter interval than the quantile interval. With the large sample sizes, the two types of intervals are nearly the same.
Either one would be fine{ the are the same (i.e., posterior is pretty symmetric).
If you want to give Monte Carlo a try. We’ll use non-informative priors for 2016 (don’t want posterior to depend on it)
(a.post16 <- 1330 + 1)
## [1] 1331
(b.post16 <- 528 + 1)
## [1] 529
The Monte Carlo draws from the posteriors for 2006 and 2016.
S <- 1E5
pi2006 <- rbeta(S, a2006, b2006)
pi2016 <- rbeta(S, a.post16, b.post16)
(mean(pi2016>pi2006))
## [1] 0
(mean(pi2016<pi2006))
## [1] 1
plot(pi2006, pi2016, type="p",
main="Monte Carlo Results")
abline(a=c(0,0), b=1, col="blue", lwd=2)
It seems that the probability of the favor for the law is higher in 2006 (i.e., \(100\%\) of draws from 2006 and larger than those from 2016). This is even more noticeable in the figure when I put in an identity line (for some reason is does not show up when I render R markdown). Note that the means of posteriors for each year are below:
mean(pi2006)
## [1] 0.7984474
mean(pi2016)
## [1] 0.7155936
and credible intervals
qbeta(c(.025,.075), a2006, b2006)
## [1] 0.7804574 0.7853417
qbeta(c(.025,.075), a.post16, b.post16)
## [1] 0.6948786 0.7004536