What do you do when there is uncertainity about the model?

A short explanation starts on page 2 of https://cran.r-project.org/doc/Rnews/Rnews_2005-2.pdf and it talks specifically abou the bma package.

A very accessible and entertaining paper Hinne M, Gronau QF, van den Bergh D, Wagenmakers E-J. A Conceptual Introduction to Bayesian Model Averaging. Advances in Methods and Practices in Psychological Science. 2020;3(2):200-215. doi:10.1177/2515245919898657. I saw this as a presentation a couple of year ago…“no demon every completely vanishes”.

The comments below are limited and documentation should be consulted!

NELs

library(BMA)    # To do the model averaging

## Loading required package: survival

## Loading required package: leaps

## Loading required package: robustbase

## 
## Attaching package: 'robustbase'

## The following object is masked from 'package:survival':
## 
##     heart

## Loading required package: inline

## Loading required package: rrcov

## Scalable Robust Estimators with High Breakdown Point (version 1.6-0)

setwd("C:/Users/cja/Dropbox/edps 590BAY/Lectures/7 Multiple Linear Regression")

df <- read.table("nels_school_62821.txt", header=TRUE)
names(df)

##  [1] "schid"    "studid"   "sex"      "race"     "homework" "schtyp"  
##  [7] "ses"      "paredu"   "math"     "clastruc" "size"     "urban"   
## [13] "geograph" "perminor" "rati0"

head(df)

##   schid studid sex race homework schtyp  ses paredu math clastruc size urban
## 1 62821      0   1    4        4      4 1.63      6   62        3    3     1
## 2 62821      1   2    4        5      4 0.48      4   68        3    3     1
## 3 62821      3   1    4        5      4 0.69      5   56        3    3     1
## 4 62821      4   2    4        5      4 0.60      4   68        3    3     1
## 5 62821      6   2    4        2      4 0.77      4   61        3    3     1
## 6 62821      7   2    4        2      4 1.19      5   61        3    3     1
##   geograph perminor rati0
## 1        2        3    10
## 2        2        3    10
## 3        2        3    10
## 4        2        3    10
## 5        2        3    10
## 6        2        3    10

y <- df$math
x <- subset(df, select=c(sex, homework, ses, paredu))

Example using linear regression (glms and other models are possible.

nels.bma <- bicreg(x,y)
summary(nels.bma)

## 
## Call:
## bicreg(x = x, y = y)
## 
## 
##   9  models were selected
##  Best  5  models (cumulative posterior probability =  0.8482 ): 
## 
##            p!=0    EV      SD      model 1  model 2  model 3  model 4  model 5
## Intercept  100.0  64.6433  5.1476  66.8935  59.2102  69.4771  65.3122  60.6311
## sex         14.0   0.1688  0.6443      .        .        .     1.1872      .  
## homework    91.5   0.9984  0.4722   1.0930   1.0946   1.0642   1.1005   1.1079
## ses         18.6   0.3509  1.4280      .        .     2.9107      .    -1.4021
## paredu      64.0  -1.1668  1.1521  -1.5636      .    -2.6896  -1.6002      .  
##                                                                               
## nVar                                  2        1        3        3        2   
## r2                                  0.177    0.110    0.198    0.188    0.123 
## BIC                                -4.6072  -3.6400  -2.1317  -1.3035  -0.4079
## post prob                           0.382    0.235    0.111    0.073    0.047

Model plots: 1st color=positive coefficient, 2nd color=negative, 3rd color=not included in model

imageplot.bma(nels.bma, color=c("blue","red","grey"))

Density plots: Displays plots of the posterior distributions of the coefficients generated by Bayesian model averaging over linear regressions.

The posterior distribution of coefficients (model averaged) where spike at 0 shows posterior probability that the variable is NOT in the model.

plot(nels.bma)

Getting what you paid for

Another example….the data set we’ve been looking at all semester.

get <- read.table("getting_what_paid_for_data.txt",header=TRUE)

names(get)

##  [1] "state"     "exp_pp"    "ave_pt"    "salary"    "taking"    "ave_v"    
##  [7] "ave_m"     "ave_tot"   "region"    "state_abv"

y <- get$ave_tot
x <- subset(get, select=c("exp_pp","ave_pt", "salary","taking"))

get.bma <- bicreg(x,y)
summary(get.bma)

## 
## Call:
## bicreg(x = x, y = y)
## 
## 
##   7  models were selected
##  Best  5  models (cumulative posterior probability =  0.9073 ): 
## 
##            p!=0    EV        SD       model 1    model 2    model 3  
## Intercept  100.0  1014.7237  45.4368   993.8317  1057.8982  1035.4739
## exp_pp      69.5     8.4709   6.8343    12.2865      .        11.0140
## ave_pt      28.2    -0.9785   2.0457      .        -4.6394    -2.0282
## salary      28.6     0.4900   1.1956      .         2.5525      .    
## taking     100.0    -2.8205   0.2407    -2.8509    -2.9134    -2.8491
##                                                                      
## nVar                                       2          3          3   
## r2                                       0.819      0.824      0.823 
## BIC                                    -77.7689   -75.0877   -74.7680
## post prob                                0.508      0.133      0.113 
##            model 4    model 5  
## Intercept   987.9005   998.0292
## exp_pp         .        13.3326
## ave_pt         .          .    
## salary        2.1804    -0.3087
## taking       -2.7787    -2.8402
##                                
## nVar            2          3   
## r2            0.806      0.820 
## BIC         -74.0550   -73.8956
## post prob     0.079      0.073

# Model plots
imageplot.bma(get.bma, color=c("blue","red","grey"))

# Density plots
plot(get.bma)

Bayesian Model Averaging: NELS and SAT

Carolyn J. Anderson

2022-09-29

NELs

Getting what you paid for