What do you do when there is uncertainity about the model?
A short explanation starts on page 2 of https://cran.r-project.org/doc/Rnews/Rnews_2005-2.pdf and it talks specifically abou the bma package.
A very accessible and entertaining paper Hinne M, Gronau QF, van den Bergh D, Wagenmakers E-J. A Conceptual Introduction to Bayesian Model Averaging. Advances in Methods and Practices in Psychological Science. 2020;3(2):200-215. doi:10.1177/2515245919898657. I saw this as a presentation a couple of year ago…“no demon every completely vanishes”.
The comments below are limited and documentation should be consulted!
library(BMA) # To do the model averaging
## Loading required package: survival
## Loading required package: leaps
## Loading required package: robustbase
##
## Attaching package: 'robustbase'
## The following object is masked from 'package:survival':
##
## heart
## Loading required package: inline
## Loading required package: rrcov
## Scalable Robust Estimators with High Breakdown Point (version 1.6-0)
setwd("C:/Users/cja/Dropbox/edps 590BAY/Lectures/7 Multiple Linear Regression")
df <- read.table("nels_school_62821.txt", header=TRUE)
names(df)
## [1] "schid" "studid" "sex" "race" "homework" "schtyp"
## [7] "ses" "paredu" "math" "clastruc" "size" "urban"
## [13] "geograph" "perminor" "rati0"
head(df)
## schid studid sex race homework schtyp ses paredu math clastruc size urban
## 1 62821 0 1 4 4 4 1.63 6 62 3 3 1
## 2 62821 1 2 4 5 4 0.48 4 68 3 3 1
## 3 62821 3 1 4 5 4 0.69 5 56 3 3 1
## 4 62821 4 2 4 5 4 0.60 4 68 3 3 1
## 5 62821 6 2 4 2 4 0.77 4 61 3 3 1
## 6 62821 7 2 4 2 4 1.19 5 61 3 3 1
## geograph perminor rati0
## 1 2 3 10
## 2 2 3 10
## 3 2 3 10
## 4 2 3 10
## 5 2 3 10
## 6 2 3 10
y <- df$math
x <- subset(df, select=c(sex, homework, ses, paredu))
Example using linear regression (glms and other models are possible.
nels.bma <- bicreg(x,y)
summary(nels.bma)
##
## Call:
## bicreg(x = x, y = y)
##
##
## 9 models were selected
## Best 5 models (cumulative posterior probability = 0.8482 ):
##
## p!=0 EV SD model 1 model 2 model 3 model 4 model 5
## Intercept 100.0 64.6433 5.1476 66.8935 59.2102 69.4771 65.3122 60.6311
## sex 14.0 0.1688 0.6443 . . . 1.1872 .
## homework 91.5 0.9984 0.4722 1.0930 1.0946 1.0642 1.1005 1.1079
## ses 18.6 0.3509 1.4280 . . 2.9107 . -1.4021
## paredu 64.0 -1.1668 1.1521 -1.5636 . -2.6896 -1.6002 .
##
## nVar 2 1 3 3 2
## r2 0.177 0.110 0.198 0.188 0.123
## BIC -4.6072 -3.6400 -2.1317 -1.3035 -0.4079
## post prob 0.382 0.235 0.111 0.073 0.047
Model plots: 1st color=positive coefficient, 2nd color=negative, 3rd color=not included in model
imageplot.bma(nels.bma, color=c("blue","red","grey"))
Density plots: Displays plots of the posterior distributions of the coefficients generated by Bayesian model averaging over linear regressions.
The posterior distribution of coefficients (model averaged) where spike at 0 shows posterior probability that the variable is NOT in the model.
plot(nels.bma)
Another example….the data set we’ve been looking at all semester.
get <- read.table("getting_what_paid_for_data.txt",header=TRUE)
names(get)
## [1] "state" "exp_pp" "ave_pt" "salary" "taking" "ave_v"
## [7] "ave_m" "ave_tot" "region" "state_abv"
y <- get$ave_tot
x <- subset(get, select=c("exp_pp","ave_pt", "salary","taking"))
get.bma <- bicreg(x,y)
summary(get.bma)
##
## Call:
## bicreg(x = x, y = y)
##
##
## 7 models were selected
## Best 5 models (cumulative posterior probability = 0.9073 ):
##
## p!=0 EV SD model 1 model 2 model 3
## Intercept 100.0 1014.7237 45.4368 993.8317 1057.8982 1035.4739
## exp_pp 69.5 8.4709 6.8343 12.2865 . 11.0140
## ave_pt 28.2 -0.9785 2.0457 . -4.6394 -2.0282
## salary 28.6 0.4900 1.1956 . 2.5525 .
## taking 100.0 -2.8205 0.2407 -2.8509 -2.9134 -2.8491
##
## nVar 2 3 3
## r2 0.819 0.824 0.823
## BIC -77.7689 -75.0877 -74.7680
## post prob 0.508 0.133 0.113
## model 4 model 5
## Intercept 987.9005 998.0292
## exp_pp . 13.3326
## ave_pt . .
## salary 2.1804 -0.3087
## taking -2.7787 -2.8402
##
## nVar 2 3
## r2 0.806 0.820
## BIC -74.0550 -73.8956
## post prob 0.079 0.073
# Model plots
imageplot.bma(get.bma, color=c("blue","red","grey"))
# Density plots
plot(get.bma)