ONS nimble analysis 2
File: `solution_R/exercise_2/bayes_ICC.Rmd` (solution to Part 2)
#### Solution
We first compute the two variances needed for the Bayesian ICC. Recall that the model has two parameters representing standard deviations, so we square them to obtain the variances.
```{r}
## Bayesian ICC
sigma2.eps = samples[,'sigma.eps']^2
sigma2.alpha = samples[,'sigma.alpha']^2
```
Remember that `sigma2.eps` and `sigma2.alpha` are both vectors of values; they are MCMC chains.
We can use the formula for the ICC to compute a new vector (MCMC chain) of values that represent posterior samples of the ICC.
```{r}
## Create a chain for the Bayesian estimate of the ICC
bayes.icc = sigma2.alpha / (sigma2.alpha + sigma2.eps)
## Convert the vector to an mcmc object for convenience
bayes.icc = mcmc(bayes.icc)
```
Now that we have a vector of values, it is easy to compute posterior quantities of interest.
```{r}
summary(bayes.icc)
```
We can also plot the posterior of the Bayesian ICC and add the classical ICC for comparison.
```{r tidy = FALSE}
## Plot the posterior
hist(bayes.icc, col="lightgray",
     border = "darkgray", freq = FALSE,
     xlab = "Bayesian ICC")
## Add the classical value to the plot, in green
abline(v = multilevel::ICC1(classical), col = "darkgreen", lwd=2)
text(multilevel::ICC1(classical), par()$usr[3],
     "Classical ICC", adj = c(-.2,1.2), srt=90,
     col = "darkgreen", cex = .8)
```
Notice how easy it was to obtain a credible interval for the Bayesian ICC. For many quantities, it will be quite difficult to obtain a classical confidence interval, yet a Bayesian credible interval will be easily obtainable from the MCMC chain.
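For example, the central 95% credible interval can be read straight off the chain (a sketch; `HPDinterval()` from the coda package gives a highest-density alternative):
```{r eval = FALSE}
## central 95% credible interval for the ICC
quantile(bayes.icc, probs = c(.025, .975))
## highest posterior density interval (coda)
coda::HPDinterval(bayes.icc, prob = 0.95)
```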
#### Another perspective
Another (complementary) way to view the problem is to look at the estimates of the $\alpha$ parameters.
Recall that `alphas` contains our chains for the alpha parameters and `alpha.est` contains the posterior means. We now compute the 95% credible intervals for each $\alpha$.
```{r}
## Use apply() to compute central 95% credible intervals
CIs = apply(alphas, 2, function(v) quantile(v, p = c(.025, .975)))
```
We want to plot these estimates and CIs in a way that lets us visualize the variability across schools. A cumulative distribution plot, with error bars, is one good way of doing this.
```{r tidy = FALSE}
## We need to reorder the estimates and CIs together, so we define a
## variable to keep track of the ordering
ord = order(alpha.est)
## Plot the school means, in order from worst to best (using ord)
plot(alpha.est[ord], 1:length(alpha.est),
     xlim = range(CIs),
     pch = 19, ylab = "School (ordered)",
     xlab = "Hierarchical school mean")
## Add the corresponding credible intervals
segments(CIs[1,ord], 1:length(alpha.est),
         CIs[2,ord], 1:length(alpha.est),
         col = rgb(0,0,0,.2))
```
As can be seen, the variability of the schools is substantial compared to the credible intervals.
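To put a rough number on that impression, an optional sketch (using only the objects defined above) compares the typical interval width with the spread of the school means:
```{r eval = FALSE}
## typical width of the 95% credible intervals for the school means
mean(CIs[2, ] - CIs[1, ])
## spread (SD) of the estimated school means themselves
sd(alpha.est)
```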
File: `solution_R/exercise_1/compare_hier.Rmd` (solution to Part 1)
#### Solution
We first gather the observed means and the point estimates of the $\alpha$ parameters, the hierarchical means of the schools.
```{r}
## Use aggregate to compute the observed mean math score per school
observed.means = aggregate(math ~ school, data = data1, mean)
## Compute the posterior means of the alpha parameters
## to use as point estimates
alphas = samples[,1:160]   ## the first 160 columns are alpha[1], ..., alpha[160]
alpha.est = colMeans(alphas)
```
We can now plot the two estimates against one another.
```{r tidy = FALSE}
plot(observed.means$math, alpha.est,
     ylab = "Hierarchical estimate",
     xlab = "Observed school mean",
     col = rgb(0,0,0,.3), pch = 19)
abline(0,1, lty = 2, col="red")
```
Notice the moderate hierarchical shrinkage: the hierarchical estimates are pulled slightly toward the grand mean, so the points fall along a line slightly shallower than the dashed identity line.
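If we want to quantify the shrinkage rather than just eyeball it, a quick optional check (a sketch using the objects above) is to regress the hierarchical estimates on the observed means; a slope noticeably below 1 indicates shrinkage toward the grand mean.
```{r eval = FALSE}
## slope < 1 means the hierarchical estimates are compressed
## relative to the observed school means (i.e., shrinkage)
coef(lm(alpha.est ~ observed.means$math))
```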
Main lab document (R Markdown source):
---
title: "Bayesian Statistics: A practical introduction"
subtitle: Lab 2
author: "Richard D. Morey, Cardiff University (<a href='mailto:[email protected]'>[email protected]</a>)"
framework: bootstrap
mode: selfcontained
highlighter: prettify
hitheme: twitter-bootstrap
output:
  html_document:
    toc: true
    toc_float: false
assets:
  css:
    - "http://fonts.googleapis.com/css?family=Raleway:300"
    - "http://fonts.googleapis.com/css?family=Oxygen"
always_allow_html: true
editor_options:
  chunk_output_type: console
---
<style>
body{
  font-family: 'Oxygen', sans-serif;
  font-size: 16px;
  line-height: 24px;
}
h1,h2,h3,h4 {
  font-family: 'Raleway', sans-serif;
}
.container { width: 1000px; }
h3 {
  background-color: #D4DAEC;
  text-indent: 50px;
}
h4 {
  text-indent: 100px;
}
g-table-intro h4 {
  text-indent: 0px;
}
* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}
table {
  text-align: left;
  line-height: 40px;
  border-collapse: separate;
  border-spacing: 0;
  border: 2px solid #ed1c40;
  width: 500px;
  margin: 50px auto;
  border-radius: .25rem;
}
thead tr:first-child {
  background: #ed1c40;
  color: #fff;
  border: none;
}
th:first-child,
td:first-child {
  padding: 0 15px 0 20px;
}
thead tr:last-child th {
  border-bottom: 3px solid #ddd;
}
tbody tr:hover {
  background-color: rgba(237, 28, 64, .1);
  cursor: default;
}
tbody tr:last-child td {
  border: none;
}
tbody td {
  border-bottom: 1px solid #ddd;
}
td:last-child {
  text-align: left;
  padding-right: 10px;
}
</style>
Each example contains R code for you to run by cutting and pasting the code from the document into R. The output should look similar to the output shown in the document below.
Before you start, change your R working directory to the directory in which you have saved this lab file. This can be done through the `File...` menu in Windows, or on all platforms using the `setwd()` function.
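For example (the path below is only a placeholder; substitute wherever you saved the lab files):
```{r eval = FALSE}
## set the working directory to the folder containing the lab files
setwd("path/to/your/lab2/folder")   ## placeholder path
```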
Hierarchical modeling
-----------------
### Setup
Jackman (2009) offers the following example (Example 7.6, page 323):
> ANOVA. The 1982 High School and Beyond Survey is a nationally representative
sample of US public and Catholic schools, covering 7185 students in 160 schools. The
chief outcome of interest is a standardized measure of math ability, with a mean of
12.75 and IQR of [7.28, 18.32]. These data figure prominently in the standard reference
on hierarchical linear models (Raudenbush and Bryk 2002), and so are well suited for
our purposes here.

> We fit a one-way ANOVA model (Equation 7.1 in Example 7.1), recognizing that
students are grouped by school. We momentarily ignore school type (public vs Catholic)
and other school-level characteristics...
The code to run the analysis is in `oneWay.R`. Remember to change R's working directory and to load the `nimble` library. See the BUGS code in `oneway2.bug`, and make sure you understand the model.
### Data preparation
First we prepare packages and read in the data.
```{r}
library(multilevel)
library(nimble)
library(coda)   ## for mcmc objects and diagnostics (mcmc, geweke.diag, etc.)
# Read in the data using the code in dataPrep.R
source('dataPrep.R')
```
We can now examine the structure of the data. Each row is a student.
```{r echo=TRUE}
DT::datatable(data1, options = list(pageLength = 5))
```
We also have access to school-level covariates in the data frame `data2`.
Math scores go from 0 to 25, as we can see by looking at a summary of the scores in the `math` column.
```{r}
## Summarise the column of interest
summary(data1$math)
```
Now we can look at the counts of the students and schools in the data set.
```{r}
## How many students in each school are there?
table(data1$school)
## How many schools?
length(unique(data1$school))
```
To decide whether it is important to include school as a covariate, we compute the intraclass correlation coefficient (ICC), which tells us how much of the variance `school` accounts for. In this case, the ICC is an estimate of
\[
\frac{\sigma^2_{\mbox{school}}}{\sigma^2_{\mbox{school}} + \sigma^2_\epsilon}
\]
The ICC measures the relative contribution of the random school effects and random error to the variance in the data.
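For instance, if the between-school variance were 2 and the within-school (error) variance were 6 (made-up numbers, purely to illustrate the formula), the ICC would be $2/(2+6) = 0.25$: a quarter of the variance in math scores would be attributable to differences between schools.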
```{r}
## First, perform a classical one-way ANOVA
classical <- aov(math ~ as.factor(school),
                 data = data1)
summary(classical)
## intra-class correlation
multilevel::ICC1(classical)
```
It seems that the intraclass correlation coefficient is moderately high. We should include school in our model.
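For reference, the ICC(1) reported by `multilevel::ICC1()` can be approximated by hand from the ANOVA mean squares. The sketch below uses the simple average group size; the package uses an adjusted average group size for unequal groups, so the result may differ slightly.
```{r eval = FALSE}
## approximate ICC(1) by hand from the one-way ANOVA table
ms  <- summary(classical)[[1]][["Mean Sq"]]  ## between- and within-school mean squares
msb <- ms[1]
msw <- ms[2]
k   <- mean(table(data1$school))             ## (unadjusted) average students per school
(msb - msw) / (msb + (k - 1) * msw)
```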
### Part 1: Fitting the hierarchical model
We will fit the hierarchical model defined in `oneway2.bug`, listed below:
```
```{r echo=FALSE, results='asis'}
cat(paste(readLines('oneway2.bug'), collapse = "\n"))
```
```
What are the variables?

Variable name | Type | Length | Description
--------------|------------|--------|--------------
`math` | data | Number of students | the math score of the `i`th student
`Nstudents` | data | 1 | Total number of students
`alpha` | parameter | Number of schools | the effect of being in the `p`th school
`j` | index | Number of students | an index telling us which school the `i`th student attends
`tau.eps` | parameter | 1 | Precision (inverse of variance) of math scores within schools
`J` | data | 1 | Total number of schools
`mu0` | parameter | 1 | Grand mean math performance across schools
`tau.alpha` | parameter | 1 | Precision of mean math scores across schools
`sigma.eps` | parameter | 1 | Standard deviation of math scores within schools
`sigma.alpha` | parameter | 1 | Standard deviation of mean math scores across schools
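To see concretely what the nested index `alpha[j[i]]` in the BUGS code does, here is a small R analogy (the numbers are made up and are not part of the model):
```{r eval = FALSE}
## R analogy for nested indexing: student i gets the mean of school j[i]
j     <- c(1, 1, 2, 3, 3)   ## hypothetical school index for 5 students
alpha <- c(10, 12, 14)      ## hypothetical school means
alpha[j]                    ## per-student means: 10 10 12 14 14
```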
```{r results = 'hide'}
## Create the data to pass to Nimble
forNimble <- list()
forNimble$math <- data1$math
forNimble$Nstudents = length(data1$math)
forNimble$j <- match(data1$school,   ## indexes which school we're in
                     unique(data1$school))
forNimble$J <- max(forNimble$j)      ## number of schools
## initialize the Gibbs sampler with starting values
## this is not necessary most of the time.
inits <- list(mu0=mean(forNimble$math),
              alpha=tapply(forNimble$math,
                           forNimble$j,mean) ## school-specific means
)
nimble.model <- nimble::readBUGSmodel(
  model="oneway2.bug",
  data=forNimble,
  inits=inits)
## compile the model and configure the MCMC
compiled.model = compileNimble(nimble.model)
conf <- configureMCMC(nimble.model)
conf$addMonitors(c("mu0", "sigma.alpha", "sigma.eps", "alpha"))
# Build and compile the MCMC
Rmcmc <- buildMCMC(conf)
Cmcmc <- compileNimble(Rmcmc, project = compiled.model)
## Run the sampler for 4000 iterations and convert to a coda mcmc object
runMCMC(Cmcmc, niter = 4000) |>
  as.mcmc() -> samples
```
Let's first check one of our convergence statistics.
```{r eval = FALSE}
## Geweke diagnostic
gd = geweke.diag(samples)
## Extract the z scores from the object
geweke.z.scores = gd$z
## plot the cumulative distribution of the z scores
plot(ecdf(geweke.z.scores), xlab = "Geweke diagnostic z score")
## Add the theoretical line (standard normal)
curve( pnorm(x), xlim = par()$usr[1:2], add = TRUE, lwd = 2, col="red")
```
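Another quick check we might add (a sketch; `effectiveSize()` is also from the coda package) is the effective sample size, which tells us roughly how many independent draws the 4000 autocorrelated iterations are worth.
```{r eval = FALSE}
## effective sample size for the hyperparameters
effectiveSize(samples[, c("mu0", "sigma.alpha", "sigma.eps")])
```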
We now make a PDF file containing plots of all the chains. We use a PDF because it is easier to page through a PDF than through many plots in the R graphics device.
```{r eval = FALSE}
## see summaries for all of the output
summary(samples)
## Dump a PDF with all chains
pdf('chains.pdf', version = "1.4")
plot(samples)
dev.off()
```
Compare the hierarchical estimates of the school means with the nonhierarchical school means (use `aggregate()` or `tapply()` to find the group means). What do you expect to see? Is this what you find?
```{r child='solution_R/exercise_1/compare_hier.Rmd', include = solutions, eval = solutions}
```
### Part 2: Assessing the variance in schools
Compute a Bayesian posterior for the intraclass correlation coefficient using the values from the MCMC chain.

1. Use reparametrization to compute the two variances of interest.
2. Compute a chain of ICC values using the definition of the ICC above.
3. Compute the posterior mean and 95% credible interval.
4. Plot the posterior.

Does the Bayesian ICC agree with the classical ICC?
```{r child='solution_R/exercise_2/bayes_ICC.Rmd', include = solutions, eval = solutions}
```
### Part 3: Adding a school-level covariate
Choose a school-level covariate to add to the model. The school-level covariates are contained in the data frame `data2`, which has the following structure. Each row is a school.
```{r echo=FALSE}
DT::datatable(data2, options = list(pageLength = 5))
```
(The solution uses school size.)
```{r child='solution_R/exercise_3/school_size.Rmd', include = solutions, eval = solutions}
```
File: `oneway2.bug`
model{
  ## loop over the student-level data
  for(i in 1:Nstudents){
    ## j is a variable (nested indexing)
    math[i] ~ dnorm(alpha[j[i]], tau.eps)
  }
  ## loop over the J schools (level 2)
  for(p in 1:J){
    alpha[p] ~ dnorm(mu0, tau.alpha)
  }
  ## priors on hyperparameters
  mu0 ~ dnorm(0, .0001)
  ## uniform priors on standard deviations
  sigma.eps ~ dunif(0, 10)
  sigma.alpha ~ dunif(0, 10)
  ## convert the standard deviations to precisions
  tau.eps <- pow(sigma.eps, -2)
  tau.alpha <- pow(sigma.alpha, -2)
}
File: `solution_bug/oneway3.bug`
model{
  ## loop over the student-level data
  for(i in 1:Nstudents){
    ## j is a variable (nested indexing)
    math[i] ~ dnorm(alpha[j[i]], tau.eps)
  }
  ## loop over the J schools (level 2)
  for(p in 1:J){
    alpha.pred[p] <- mu0 + beta.size * size[p]
    alpha[p] ~ dnorm(alpha.pred[p], tau.alpha)
  }
  ## priors on hyperparameters
  mu0 ~ dnorm(0, .0001)
  beta.size ~ dnorm(0, .0001)
  ## uniform priors on standard deviations
  sigma.eps ~ dunif(0, 10)
  sigma.alpha ~ dunif(0, 10)
  ## convert the standard deviations to precisions
  tau.eps <- pow(sigma.eps, -2)
  tau.alpha <- pow(sigma.alpha, -2)
}
File: `solution_R/exercise_3/school_size.Rmd` (solution to Part 3)
#### Solution
I chose to add the size of the school as a covariate. It makes sense to take the logarithm (base 2) of school size and then "center" it, which amounts to working with the log of each school's size relative to the typical (geometric mean) school size. The slope can then be interpreted as the change in mean math score for every doubling of school size.
This addresses the question of *whether school size has an effect on math scores.*
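As a small illustration of the doubling interpretation (made-up numbers, just to show the arithmetic): a school twice the size of another differs by exactly 1 on the log2 scale, so its expected mean math score differs by one unit of the slope.
```{r eval = FALSE}
## doubling a school's size adds exactly 1 on the log2 scale ...
log2(800) - log2(400)
## ... so the expected difference in mean math score is 1 * beta.size
```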
Our school-level covariates are in `data2`. We construct our centered school size covariate called `size`...
```{r}
size = log2(data2$size) - mean(log2(data2$size))
```
...and then include it in the object we will pass to Nimble:
```{r}
## Create the data to pass to Nimble
forNimble <- list()
forNimble$math <- data1$math
forNimble$Nstudents = length(data1$math)
forNimble$j <- match(data1$school,   ## indexes which school we're in
                     unique(data1$school))
forNimble$J <- max(forNimble$j)      ## number of schools
forNimble$size <- size
```
The BUGS file is shown below.
```
```{r echo=FALSE, results = 'asis'}
cat(paste(readLines("../../solution_bug/oneway3.bug"),collapse = "\n"))
```
```
We now have all the information we need to fit the model. Note that we need to include the new parameter `beta.size` in the initialization object.
```{r}
## initialize the Gibbs sampler with starting values
## this is not necessary most of the time.
inits <- list(mu0 = mean(forNimble$math),
              alpha = tapply(forNimble$math,
                             forNimble$j,mean), ## school-specific means
              beta.size = 0)
```
We compile and sample from the model.
```{r eval = FALSE}
## compile the Nimble model
nimble.model <- nimble::readBUGSmodel(
  model="solution_bug/oneway3.bug",
  data=forNimble,
  inits=inits)
```
```{r echo=FALSE}
## compile the Nimble model
nimble.model <- nimble::readBUGSmodel(
  model="../../solution_bug/oneway3.bug",
  data=forNimble,
  inits=inits)
```
```{r results = 'hide'}
## compile the model and configure the MCMC
compiled.model = compileNimble(nimble.model)
conf <- configureMCMC(nimble.model)
conf$addMonitors(c("mu0", "sigma.alpha", "sigma.eps", "alpha", "beta.size"))
# Build and compile the MCMC
Rmcmc <- buildMCMC(conf)
Cmcmc <- compileNimble(Rmcmc, project = compiled.model)
## Run the sampler for 4000 iterations and convert to a coda mcmc object
runMCMC(Cmcmc, niter = 4000) |>
  as.mcmc() -> samples
```
Note that we now include `beta.size` among the parameters to monitor.
We can now summarise the posterior distribution of the `beta.size` parameter...
```{r}
summary(samples[,"beta.size"])
```
...and plot the marginal posterior:
```{r}
plot(samples[,"beta.size"])
```
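If we want a one-number summary of the direction of the effect, a quick optional sketch using the same chain is to compute a central 95% credible interval and the posterior probability that the slope is positive.
```{r eval = FALSE}
## central 95% credible interval for the school-size slope
quantile(samples[, "beta.size"], probs = c(.025, .975))
## posterior probability that the slope is positive
mean(samples[, "beta.size"] > 0)
```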
Recall that we computed the observed school means in Part 1, so we can use them to check our answer. We will plot the observed means against the school size variable, and then add a regression line based on the Bayesian estimates.
```{r}
plot(size, observed.means$math,
     ylab = "Math score", xlab = "(Centered) log size",
     pch = 19, col = rgb(0,0,0,.3))
## Compute the Bayesian point estimates
intercept = mean(samples[,"mu0"])
slope = mean(samples[,"beta.size"])
## classical (non-hierarchical) regression line in red
abline(lm(observed.means$math~size), col = "red", lwd=2, lty=2)
## Bayesian regression line in blue
abline(intercept, slope, col = "blue", lwd=2, lty=1)
```
The results agree. The difference between the two is that the hierarchical model estimates "proper" standard errors, accounting for uncertainty at all levels (student and school), whereas the classical estimates are based on aggregation across students and hence cannot take the student-level uncertainty into account.
```{r}
## Posterior standard deviation of the intercept
sd(samples[,"mu0"])
## Posterior standard deviation of beta
sd(samples[,"beta.size"])
## Classical (non-hierarchical) regression
summary(lm(observed.means$math~size))
```
To be fair, though, the results are nearly identical in this particular case, owing to the moderately high, and roughly equal, number of observations in each school.