These data were pulled from the CDC web-site Cases and inculdes Data Table for Daily Case Trends - United States. I retrived the data Mon Jul 05 2021 11:18:17 GMT-0500 (Central Daylight Time).
At the beginning of the pandemic, little was known about COVID19. One question was how lethal the disease was; that is, what is the probability of death given that a person has contracted COVID19. This example looks at this questions but starts with data from Feb 2020, and then adds data for the next month, and so on. In other words, we simulate what happens as more data became available.
The data include date, new cases, and deaths. This is a very macro view, because don’t know if cases during month correspond to deaths for that month.
I will use Beta for priors and binomial distribution for the likelihood. The posterior mean and modes using analytic results, and for plotting prior, likellihood & posterior I use the function from the LearnBayes package "tri.plot.
Librarys that I used
library(tidyr) # do deal with date
library(HDInterval) # for high density interval
library(LearnBayes) # for plotting prior, likelihood & posterior
For plotting, I created a little wrapper function about “tri.plot” that puts the prior, likelihood and posterior all in one figure
BetaBinomial.tri <- function(y, n , shape.1, shape.2){
triplot(c(shape.1,shape.2),c(y,(n-y)))
}
If you want to have separate figures for the prior, likellihood and posterior you can use
beta.binomial <- function(y,n,shape.1,shape.2){
N<- 100000 # how many p's to check
p <- seq(from=0,to=1,length.out=N) # possible values for probability of sucess
likelihood <- dbinom(y, n, prob=p) # binomial likelihood
layout(matrix(c(1,1,0,0,1,1,3,3,2,2,3,3,2,2,0,0),4,4,byrow=FALSE))
# Beta prior
prior.beta <- dbeta(p,shape.1,shape.2) # beta(shape.1,shape.2) prior
posterior.beta <- prior.beta*likelihood # posterior w/ beta(shape.1,shape.2) prior
posterior.beta <- posterior.beta/sum(posterior.beta)
plot(p,prior.beta, type="l", main=paste("Beta(",shape.1,",",shape.2,") Prior") )
plot(p,likelihood, type="l", main=paste("Binomial Likelihood, y=", y, "& n=", n) )
plot(p,posterior.beta, type="l", main=paste("Posterior with Beta(",shape.1,",",shape.2,") Prior") )
}
There are two data sets that are on the web-site. One contains information on number of cases and the other number of deaths: “cases.csv” and “death.csv”. We will read in the data and change the way date is recorded and merge the two data sets.
Number of Cases of COVID:
cases.us <- read.csv("D:/Dropbox/edps 590BAY/Lectures/2 Proportion/covid_data_charts/cases.csv", header=TRUE)
cases <- separate(data=cases.us, col="Date", into=c("month", "day","year"), sep = " ")
n1 <- nrow(cases)
Number of Deathes due to COVID:
deaths.us <- read.csv("D:/Dropbox/edps 590BAY/Lectures/2 Proportion/covid_data_charts/deaths.csv", header=TRUE)
deaths <- separate(data=deaths.us, col="Date", into=c("month", "day","year"), sep = " ")
n2 <- nrow(deaths)
Note that both data sets contain data from 528 days. Merge them matching up the dates:
covid <- merge(cases, deaths, by=c("month","day","year"))
Pull the data from February, set y, n and compute proportion of deaths
feb <- covid[which(covid$month=="Feb" & covid$year==2020),]
(nrow(feb))
## [1] 29
(n.feb <- sum(feb$New.Cases))
## [1] 1041
(y.feb <- sum(feb$New.Deaths))
## [1] 4
(p.feb <- y.feb/n.feb)
## [1] 0.003842459
The likelihood will be the binomial; however, what Beta distribution should we use for the prior. This is open to debate, but since this is the very beginning of the pandemic, I will use a uniform, Beta(1,1)
# Uniform Prior theta ~ beta(1,1)
# Binomial likelihood y ~ binomial(y,n)
# Posterior beta beta(a+1,b+1)
a1 <- y.feb
b1 <- n.feb-y.feb
# posterior beta(a+1,b+1)
(a.post1 <- a1+1)
## [1] 5
(b.post1 <- b1+1)
## [1] 1038
(mu.feb <- a.post1/(a.post1 + b.post1) ) # mean posterior
## [1] 0.004793864
(mode.feb <- (a.post1-1)/(a.post1 + b.post1-2)) # mode posterior
## [1] 0.003842459
To get an idea of the range of possible values, we compute 95% credible intervals.
# 95% credible interval using quantiles from posterior
(ci.q <- qbeta(c(.025, .975),a.post1, b.post1))
## [1] 0.001559829 0.009799376
However the postierior or probably pretty skewed so maybe it would be better to use high density intervals.
# 95% high density interval
intheta <- rbeta(1E5, a.post1, b.post1)
(ci.hd <- hdi(intheta,credMass= .95))
## lower upper
## 0.001186378 0.009051350
## attr(,"credMass")
## [1] 0.95
# or 2nd way to use hdi
intheta <- seq(.0001, 1, length.out=10000)
(ci.hd <- hdi(qbeta(intheta,a.post1 ,b.post1), credMass= .95))
## lower upper
## 0.001162459 0.009029134
## attr(,"credMass")
## [1] 0.95
To look a the posterior, likelihood and prior all in one plot
BetaBinomial.tri(y.feb, n.feb, 1, 1)
The Likelihood is the same as the posterior.
Let’s look at these in 3 separate plots
beta.binomial(y.feb,n.feb, 1, 1)
As more data becomes available, we can up-date our estimate of the probability of death.
march <- covid[which(covid$month=="Mar" & covid$year==2020),]
(nrow(march))
## [1] 31
(n.mar <- sum(march$New.Cases))
## [1] 226920
(y.mar <- sum(march$New.Deaths))
## [1] 4982
(p.mar <- y.mar/n.mar)
## [1] 0.02195487
We know have some information, and we can use February’s posterior as our prior.
# Make Feb post march's priors
a <- a.post1
b <- b.post1
# Prior: beta(a,b)
# Binomial: binomial(y.mar, n.mar)
# Posterior: beta(a + y.mar), (b + n.mar-y.mar))
(a.post2 <- a + y.mar)
## [1] 4987
(b.post2 <- b + n.mar - y.mar )
## [1] 222976
(mu.march <- a.post2/(a.post2 + b.post2))
## [1] 0.02187636
(mode.march <- (a.post2-1)/(a.post2 + b.post2 - 2) )
## [1] 0.02187216
# 95% high density interval
intheta <- rbeta(1E5, a.post2, b.post2)
(ci.hd <- hdi(intheta,credMass= .95))
## lower upper
## 0.02128359 0.02247945
## attr(,"credMass")
## [1] 0.95
BetaBinomial.tri(y.mar, n.mar, a.post1, b.post1)
Follow the steps above and find out what the up-dated probability of death is.
More data becomes available, what is now the posterior?
We may wonder whether the probability of death changed between February and March. We will use Monte Carlo to investigate this.
We will use uniform priors and we already have the posterior for Febrary with these. So we’ll find the posterior for March using uniform prior
march <- covid[which(covid$month=="Mar" & covid$year==2020),]
(nrow(march))
## [1] 31
(n.mar <- sum(march$New.Cases))
## [1] 226920
(y.mar <- sum(march$New.Deaths))
## [1] 4982
(p.mar <- y.mar/n.mar)
## [1] 0.02195487
We will need the posterior for each month. We will February’s posterior as the prior.
March:
# Uniform priors
a <- 1
b <- 1
# Prior: beta(a,b)
# Binomial: binomial(y.mar, n.mar)
# Posterior: beta(a + y.mar), (b + n.mar-y.mar))
(a.post2 <- a + y.mar)
## [1] 4983
(b.post2 <- b + n.mar - y.mar )
## [1] 221939
(mu.march <- a.post2/(a.post2 + b.post2))
## [1] 0.02195909
(mode.march <- (a.post2-1)/(a.post2 + b.post2 - 2) )
## [1] 0.02195487
S <- 1e+5
theta.feb <- rbeta(S,a.post1, b.post1)
theta.mar <- rbeta(S, a.post2, b.post2)
summary(theta.feb)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000237 0.003236 0.004475 0.004799 0.006019 0.020585
hist(theta.feb)
summary(theta.mar)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02061 0.02175 0.02196 0.02196 0.02217 0.02339
hist(theta.mar)
plot(theta.feb, theta.mar, xlim=c(0,0.025), ylim=c(0,0.025))
lines(c(0, 0.025), c(0, 0.025), col="blue")
# Find proportion/probability of times that theta1 > theta2
mean(theta.mar>theta.feb)
## [1] 1
You can play with changing priors for March, but the results will likely stay the same.
Check to see if probability changed.