Introduction

These data were pulled from the CDC web-site Cases and inculdes Data Table for Daily Case Trends - United States. I retrived the data Mon Jul 05 2021 11:18:17 GMT-0500 (Central Daylight Time).

At the beginning of the pandemic, little was known about COVID19. One question was how lethal the disease was; that is, what is the probability of death given that a person has contracted COVID19. This example looks at this questions but starts with data from Feb 2020, and then adds data for the next month, and so on. In other words, we simulate what happens as more data became available.

The data include date, new cases, and deaths. This is a very macro view, because don’t know if cases during month correspond to deaths for that month.

I will use Beta for priors and binomial distribution for the likelihood. The posterior mean and modes using analytic results, and for plotting prior, likellihood & posterior I use the function from the LearnBayes package "tri.plot.

Set-up

Tools

Librarys that I used

library(tidyr)       # do deal with date
library(HDInterval)  # for high density interval
library(LearnBayes)  # for plotting prior, likelihood & posterior

For plotting, I created a little wrapper function about “tri.plot” that puts the prior, likelihood and posterior all in one figure

BetaBinomial.tri <- function(y, n , shape.1, shape.2){
    triplot(c(shape.1,shape.2),c(y,(n-y)))
}

If you want to have separate figures for the prior, likellihood and posterior you can use

beta.binomial <- function(y,n,shape.1,shape.2){
    N<- 100000                                        # how many p's to check
    p <-  seq(from=0,to=1,length.out=N)    # possible values for probability of sucess
    likelihood <- dbinom(y, n, prob=p)           # binomial likelihood

    layout(matrix(c(1,1,0,0,1,1,3,3,2,2,3,3,2,2,0,0),4,4,byrow=FALSE))

    # Beta prior
    prior.beta <- dbeta(p,shape.1,shape.2)              # beta(shape.1,shape.2) prior
    posterior.beta <- prior.beta*likelihood             # posterior w/ beta(shape.1,shape.2) prior
    posterior.beta <- posterior.beta/sum(posterior.beta)

    plot(p,prior.beta, type="l", main=paste("Beta(",shape.1,",",shape.2,") Prior")     )
    plot(p,likelihood, type="l", main=paste("Binomial Likelihood, y=", y, "& n=", n)   )
    plot(p,posterior.beta, type="l", main=paste("Posterior with Beta(",shape.1,",",shape.2,") Prior") )
}

Data

There are two data sets that are on the web-site. One contains information on number of cases and the other number of deaths: “cases.csv” and “death.csv”. We will read in the data and change the way date is recorded and merge the two data sets.

Number of Cases of COVID:

cases.us <- read.csv("D:/Dropbox/edps 590BAY/Lectures/2 Proportion/covid_data_charts/cases.csv", header=TRUE)
cases <- separate(data=cases.us, col="Date", into=c("month", "day","year"), sep = " ")
n1 <- nrow(cases)

Number of Deathes due to COVID:

deaths.us <- read.csv("D:/Dropbox/edps 590BAY/Lectures/2 Proportion/covid_data_charts/deaths.csv", header=TRUE)
deaths <- separate(data=deaths.us, col="Date", into=c("month", "day","year"), sep = " ")
n2 <- nrow(deaths)

Note that both data sets contain data from 528 days. Merge them matching up the dates:

covid <- merge(cases, deaths, by=c("month","day","year"))

One Proportion

February

Pull the data from February, set y, n and compute proportion of deaths

feb <- covid[which(covid$month=="Feb" & covid$year==2020),]
(nrow(feb))
## [1] 29
(n.feb <- sum(feb$New.Cases))
## [1] 1041
(y.feb <- sum(feb$New.Deaths))
## [1] 4
(p.feb <- y.feb/n.feb)
## [1] 0.003842459

The likelihood will be the binomial; however, what Beta distribution should we use for the prior. This is open to debate, but since this is the very beginning of the pandemic, I will use a uniform, Beta(1,1)

# Uniform Prior         theta ~ beta(1,1)
# Binomial likelihood   y ~ binomial(y,n)
# Posterior beta        beta(a+1,b+1)
a1 <- y.feb
b1 <- n.feb-y.feb

# posterior beta(a+1,b+1)
(a.post1 <- a1+1)
## [1] 5
(b.post1 <- b1+1)
## [1] 1038
(mu.feb <- a.post1/(a.post1 + b.post1) )          # mean posterior
## [1] 0.004793864
(mode.feb <-  (a.post1-1)/(a.post1 + b.post1-2))  # mode posterior
## [1] 0.003842459

To get an idea of the range of possible values, we compute 95% credible intervals.

# 95% credible interval using quantiles from posterior
(ci.q <- qbeta(c(.025, .975),a.post1, b.post1))
## [1] 0.001559829 0.009799376

However the postierior or probably pretty skewed so maybe it would be better to use high density intervals.

# 95% high density interval
intheta <- rbeta(1E5, a.post1, b.post1)
(ci.hd <- hdi(intheta,credMass= .95))
##       lower       upper 
## 0.001186378 0.009051350 
## attr(,"credMass")
## [1] 0.95
#  or 2nd way to use hdi
intheta <- seq(.0001, 1, length.out=10000)
(ci.hd <- hdi(qbeta(intheta,a.post1 ,b.post1), credMass= .95))
##       lower       upper 
## 0.001162459 0.009029134 
## attr(,"credMass")
## [1] 0.95

To look a the posterior, likelihood and prior all in one plot

BetaBinomial.tri(y.feb, n.feb, 1, 1)

The Likelihood is the same as the posterior.

Let’s look at these in 3 separate plots

beta.binomial(y.feb,n.feb, 1, 1)

March

As more data becomes available, we can up-date our estimate of the probability of death.

march <- covid[which(covid$month=="Mar" & covid$year==2020),]
(nrow(march))
## [1] 31
(n.mar <- sum(march$New.Cases))
## [1] 226920
(y.mar <- sum(march$New.Deaths))
## [1] 4982
(p.mar <- y.mar/n.mar)
## [1] 0.02195487

We know have some information, and we can use February’s posterior as our prior.

# Make Feb post march's priors
a <- a.post1
b <- b.post1

# Prior:      beta(a,b)
# Binomial:   binomial(y.mar, n.mar)
# Posterior:  beta(a + y.mar), (b + n.mar-y.mar))

(a.post2 <- a + y.mar)
## [1] 4987
(b.post2 <- b + n.mar - y.mar )
## [1] 222976
(mu.march <- a.post2/(a.post2 + b.post2))
## [1] 0.02187636
(mode.march <- (a.post2-1)/(a.post2 + b.post2 - 2) )
## [1] 0.02187216
# 95% high density interval
intheta <- rbeta(1E5, a.post2, b.post2)
(ci.hd <- hdi(intheta,credMass= .95))
##      lower      upper 
## 0.02128359 0.02247945 
## attr(,"credMass")
## [1] 0.95
BetaBinomial.tri(y.mar, n.mar, a.post1, b.post1)

April

Follow the steps above and find out what the up-dated probability of death is.

May

More data becomes available, what is now the posterior?

Two Proportions

February & March

We may wonder whether the probability of death changed between February and March. We will use Monte Carlo to investigate this.

We will use uniform priors and we already have the posterior for Febrary with these. So we’ll find the posterior for March using uniform prior

March

march <- covid[which(covid$month=="Mar" & covid$year==2020),]
(nrow(march))
## [1] 31
(n.mar <- sum(march$New.Cases))
## [1] 226920
(y.mar <- sum(march$New.Deaths))
## [1] 4982
(p.mar <- y.mar/n.mar)
## [1] 0.02195487

We will need the posterior for each month. We will February’s posterior as the prior.

March:

# Uniform priors
a <- 1
b <- 1

# Prior:      beta(a,b)
# Binomial:   binomial(y.mar, n.mar)
# Posterior:  beta(a + y.mar), (b + n.mar-y.mar))

(a.post2 <- a + y.mar)
## [1] 4983
(b.post2 <- b + n.mar - y.mar )
## [1] 221939
(mu.march <- a.post2/(a.post2 + b.post2))
## [1] 0.02195909
(mode.march <- (a.post2-1)/(a.post2 + b.post2 - 2) )
## [1] 0.02195487

The Monte-Carlo

S <- 1e+5
theta.feb <- rbeta(S,a.post1, b.post1)
theta.mar <- rbeta(S, a.post2, b.post2)

summary(theta.feb)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000237 0.003236 0.004475 0.004799 0.006019 0.020585
hist(theta.feb)

summary(theta.mar)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.02061 0.02175 0.02196 0.02196 0.02217 0.02339
hist(theta.mar)

plot(theta.feb, theta.mar, xlim=c(0,0.025), ylim=c(0,0.025))
lines(c(0, 0.025), c(0, 0.025), col="blue")

The Results

# Find proportion/probability of times that theta1 > theta2
mean(theta.mar>theta.feb)
## [1] 1

You can play with changing priors for March, but the results will likely stay the same.

March versus Apri

Check to see if probability changed.