#### Solution
We first compute the two variances needed for the Bayesian ICC. Remember that the model has two parameters representing standard deviations; squaring them gives the variances.
```{r}
## Bayesian ICC
sigma2.eps = samples[,'sigma.eps']^2
sigma2.alpha = samples[,'sigma.alpha']^2
```
Remember that `sigma2.eps` and `sigma2.alpha` are both vectors of values; they are MCMC chains.
We can use the formula for the ICC to compute a new vector (MCMC chain) of values that represent posterior samples of the ICC.
```{r}
## Create a chain for the Bayesian estimate of the ICC
bayes.icc = sigma2.alpha / (sigma2.alpha + sigma2.eps)
## Convert the vector to an mcmc object for convenience
bayes.icc = mcmc(bayes.icc)
```
Now that we have a vector of values, it is easy to compute posterior quantities of interest.
```{r}
summary(bayes.icc)
```
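Individual quantities can also be pulled directly from the chain; for example, a quick sketch of the posterior mean and a central 95% credible interval:
```{r}
## Posterior mean of the ICC
mean(bayes.icc)
## Central 95% credible interval from the chain quantiles
quantile(bayes.icc, p = c(.025, .975))
```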
We can also plot the posterior of the Bayesian ICC and add the classical ICC for comparison.
```{r tidy = FALSE}
## Plot the posterior
hist(bayes.icc, col = "lightgray",
     border = "darkgray", freq = FALSE,
     xlab = "Bayesian ICC")
## Add the classical value to the plot, in green
abline(v = multilevel::ICC1(classical), col = "darkgreen", lwd = 2)
text(multilevel::ICC1(classical), par()$usr[3],
     "Classical ICC", adj = c(-.2, 1.2), srt = 90,
     col = "darkgreen", cex = .8)
```
Notice how easy it was to obtain a credible interval for the Bayesian ICC. For many quantities, it will be quite difficult to obtain a classical confidence interval, yet a Bayesian credible interval will be easily obtainable from the MCMC chain.
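If a highest posterior density (HPD) interval is preferred to a central interval, the `coda` package can compute one from the same chain; a minimal sketch:
```{r}
## 95% highest posterior density interval for the ICC
HPDinterval(bayes.icc, prob = 0.95)
```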
#### Another perspective
Another (complementary) way we can view the problem is to look at the estimates of the $\alpha$ parameters.
Recall that `alphas` contains our chains for the alpha parameters and `alpha.est` contains the posterior means. We now compute the 95% credible intervals for each $\alpha$.
```{r}
## Use apply() to compute central 95% credible intervals
CIs = apply(alphas, 2, function(v) quantile(v, p = c(.025, .975)))
```
We want to plot these estimates and CIs in a way that will enable us to visualize the variance of the schools. A cumulative distribution plot, with error bars, is one good way of doing this.
```{r tidy = FALSE}
## We need to reorder the estimates and CIs together, so we define a
## variable to keep track of the ordering
ord = order(alpha.est)
## Plot the school means, in order from worst to best (using ord)
plot(alpha.est[ord], 1:length(alpha.est),
     xlim = range(CIs),
     pch = 19, ylab = "School (ordered)",
     xlab = "Hierarchical school mean")
## Add the corresponding credible intervals
segments(CIs[1, ord], 1:length(alpha.est),
         CIs[2, ord], 1:length(alpha.est),
         col = rgb(0, 0, 0, .2))
```
As can be seen, the variability of the schools is substantial compared to the credible intervals.
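To make "substantial" concrete, we can compare the spread of the hierarchical school means to the typical width of a single school's credible interval; a quick sketch using the objects above:
```{r}
## Spread of the hierarchical school means...
sd(alpha.est)
## ...versus the average width of a school's 95% credible interval
mean(CIs[2, ] - CIs[1, ])
```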
#### Solution
We first gather the observed means and the point estimates of the $\alpha$ parameters, the hierarchical means of the schools.
```{r}
## Use aggregate to compute the observed mean math score per school
observed.means = aggregate(math ~ school, data = data1, mean)
## Compute the posterior means of the alpha parameters
## to use as point estimates. (The first 160 columns of
## `samples` are the chains for alpha[1] to alpha[160].)
alphas = samples[, 1:160]
alpha.est = colMeans(alphas)
```
We can now plot the two estimates against one another.
```{r tidy = FALSE}
plot(observed.means$math, alpha.est,
     ylab = "Hierarchical estimate",
     xlab = "Observed school mean",
     col = rgb(0, 0, 0, .3), pch = 19)
abline(0, 1, lty = 2, col = "red")
```
Notice the moderate hierarchical shrinkage.
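One rough way to quantify the shrinkage is to regress the hierarchical estimates on the observed means; a slope below 1 indicates shrinkage toward the grand mean. A quick sketch:
```{r}
## Slope below 1 indicates shrinkage toward the grand mean
coef(lm(alpha.est ~ observed.means$math))
```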
---
title: "Bayesian Statistics: A practical introduction"
subtitle: Lab 2
author: "Richard D. Morey, Cardiff University (<a href='mailto:[email protected]'>[email protected]</a>)"
framework: bootstrap
mode: selfcontained
highlighter: prettify
hitheme: twitter-bootstrap
output:
  html_document:
    toc: true
    toc_float: false
assets:
  css:
    - "http://fonts.googleapis.com/css?family=Raleway:300"
    - "http://fonts.googleapis.com/css?family=Oxygen"
always_allow_html: true
editor_options:
  chunk_output_type: console
---
<style>
body {
  font-family: 'Oxygen', sans-serif;
  font-size: 16px;
  line-height: 24px;
}
h1, h2, h3, h4 {
  font-family: 'Raleway', sans-serif;
}
.container { width: 1000px; }
h3 {
  background-color: #D4DAEC;
  text-indent: 50px;
}
h4 {
  text-indent: 100px;
}
g-table-intro h4 {
  text-indent: 0px;
}
* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}
table {
  text-align: left;
  line-height: 40px;
  border-collapse: separate;
  border-spacing: 0;
  border: 2px solid #ed1c40;
  width: 500px;
  margin: 50px auto;
  border-radius: .25rem;
}
thead tr:first-child {
  background: #ed1c40;
  color: #fff;
  border: none;
}
th:first-child,
td:first-child {
  padding: 0 15px 0 20px;
}
thead tr:last-child th {
  border-bottom: 3px solid #ddd;
}
tbody tr:hover {
  background-color: rgba(237, 28, 64, .1);
  cursor: default;
}
tbody tr:last-child td {
  border: none;
}
tbody td {
  border-bottom: 1px solid #ddd;
}
td:last-child {
  text-align: left;
  padding-right: 10px;
}
</style>
Each example contains R code for you to run by copying and pasting the code from the document into R. The output should look similar to the output in the document below.
Before you start, you should change your R working directory to whatever directory you have stored this lab file in. This can be done through the `File` menu on Windows, or on all platforms using the `setwd()` function.
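For example (the path below is a placeholder; substitute the directory holding this lab's files):
```{r eval = FALSE}
## Hypothetical path -- replace with your own lab directory
setwd("C:/Users/me/bayes-lab2")
```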
Hierarchical modeling
-----------------
### Setup
Jackman (2009) offers the following example (Example 7.6, page 323).
> ANOVA. The 1982 High School and Beyond Survey is a nationally representative
sample of US public and Catholic schools, covering 7185 students in 160 schools. The
chief outcome of interest is a standardized measure of math ability, with a mean of
12.75 and IQR of [7.28, 18.32]. These data figure prominently in the standard reference
on hierarchical linear models (Raudenbush and Bryk 2002), and so are well suited for
our purposes here.
> We fit a one-way ANOVA model (Equation 7.1 in Example 7.1), recognizing that
students are grouped by school. We momentarily ignore school type (public vs Catholic)
and other school-level characteristics...
The code to run the analysis is in `oneWay.R`. Remember to change R's working directory and to load the `nimble` library. See the BUGS code in `oneway2.bug`, and make sure you understand the model.
### Data preparation
First we prepare packages and read in the data.
```{r}
library(multilevel)
library(nimble)
library(coda)  ## for mcmc objects and MCMC diagnostics
# Read in the data using the code in dataPrep.R
source('dataPrep.R')
```
We can now examine the structure of the data. Each row is a student.
```{r echo=TRUE}
DT::datatable(data1, options = list(pageLength = 5))
```
We also have access to school level covariates in the data frame `data2`.
Math scores go from 0 to 25, as we can see by looking at a summary of the scores in the `math` column.
```{r}
## Summarise the column of interest
summary(data1$math)
```
Now we can look at the counts of the students and schools in the data set.
```{r}
## How many students in each school are there?
table(data1$school)
## How many schools?
length(unique(data1$school))
```
To see if it will be important to include school as a covariate, we compute the intra-class correlation to see if `school` adds a substantial amount of variance. In this case, the ICC is an estimate of
\[
\frac{\sigma^2_{\mbox{school}}}{\sigma^2_{\mbox{school}} + \sigma^2_\epsilon}
\]
The ICC measures the relative contribution of the random schools and random error to the variance in the data.
```{r}
## First, perform a classical one-way ANOVA
classical <- aov(math ~ as.factor(school),
                 data = data1)
summary(classical)
## intra-class correlation
multilevel::ICC1(classical)
```
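To see where `ICC1()`'s number comes from, the ICC can also be computed by hand from the ANOVA mean squares. The sketch below approximates the group size `k` by the average number of students per school; `multilevel::ICC1()` uses an ANOVA-adjusted group size, so its result may differ slightly when groups are unequal.
```{r}
## Hand computation of the ICC from the ANOVA table
## (approximation: k taken as the average group size)
ms  <- summary(classical)[[1]][["Mean Sq"]]
MSB <- ms[1]  # between-school mean square
MSW <- ms[2]  # within-school mean square
k   <- mean(table(data1$school))
(MSB - MSW) / (MSB + (k - 1) * MSW)
```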
It seems that the intraclass correlation coefficient is moderately high. We should include school in our model.
### Part 1: Fitting the hierarchical model
We will fit the hierarchical model built in `oneway2.bug`, listed below:
```
```{r echo=FALSE, results='asis'}
cat(paste(readLines('oneway2.bug'), collapse = "\n"))
```
```
What are the variables?
Variable name | Type | Length | Description
--------------|------------|--------|--------------
`math` | data | Number of students | the math score of the `i`th student
`Nstudents` | data | 1 | Total number of students
`alpha` | parameter | Number of schools | the effect of being in the `p`th school
`j` | index | Number of students | an index telling us which school the `i`th student attends
`tau.eps` | parameter | 1 | Precision (inverse of variance) of math scores within schools
`J` | data | 1 | Total number of schools
`mu0` | parameter | 1 | Grand mean math performance across schools
`tau.alpha` | parameter | 1 | Precision of mean math scores across schools
`sigma.eps` | parameter | 1 | Standard deviation of math scores within schools
`sigma.alpha` | parameter | 1 | Standard deviation of mean math scores across schools
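The nested indexing `alpha[j[i]]` used in the model deserves a pause: `j` maps each student to a school, so indexing `alpha` by `j` expands the school-level means out to student length. A toy illustration (the `*_demo` objects below are made up for this example only):
```{r}
## Toy example of nested indexing (not part of the analysis)
alpha_demo <- c(10, 12, 14)    # means of three hypothetical schools
j_demo <- c(1, 1, 2, 3, 3, 3)  # school membership of six students
alpha_demo[j_demo]             # per-student means: 10 10 12 14 14 14
```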
```{r results = 'hide'}
## Create the data to pass to Nimble
forNimble <- list()
forNimble$math <- data1$math
forNimble$Nstudents <- length(data1$math)
forNimble$j <- match(data1$school, ## indexes which school we're in
                     unique(data1$school))
forNimble$J <- max(forNimble$j) ## number of schools
## initialize the Gibbs sampler with starting values
## (this is not necessary most of the time)
inits <- list(mu0 = mean(forNimble$math),
              alpha = tapply(forNimble$math,
                             forNimble$j, mean) ## school-specific means
)
nimble.model <- nimble::readBUGSmodel(
  model = "oneway2.bug",
  data = forNimble,
  inits = inits)
## Compile the model and configure the MCMC
compiled.model <- compileNimble(nimble.model)
conf <- configureMCMC(nimble.model)
conf$addMonitors(c("mu0", "sigma.alpha", "sigma.eps", "alpha"))
## Build the MCMC
Rmcmc <- buildMCMC(conf)
Cmcmc <- compileNimble(Rmcmc, project = compiled.model)
## Draw 4000 iterations and convert to an mcmc object
runMCMC(Cmcmc, niter = 4000) |>
  as.mcmc() -> samples
```
Let's first check one of our convergence statistics.
```{r eval = FALSE}
## Geweke diagnostic
gd = geweke.diag(samples)
## Extract the z scores from the object
geweke.z.scores = gd$z
## plot the cumulative distribution of the z scores
plot(ecdf(geweke.z.scores), xlab = "Geweke diagnostic z score")
## Add the theoretical line (standard normal)
curve( pnorm(x), xlim = par()$usr[1:2], add = TRUE, lwd = 2, col="red")
```
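As an additional check (not part of the original lab), we can inspect the effective sample size of each chain with `coda`; values far below the 4000 iterations we ran indicate strong autocorrelation:
```{r eval = FALSE}
## Effective sample sizes for all monitored parameters
effectiveSize(samples)
```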
We now make a PDF file containing plots of all the chains; it is easier to page through a PDF than through many R plot windows.
```{r eval = FALSE}
## see summaries for all of the output
summary(samples)
## Dump a PDF with all chains
pdf('chains.pdf', version = "1.4")
plot(samples)
dev.off()
```
Compare the hierarchical estimates of the school means with the nonhierarchical school means (use `aggregate()` or `tapply()` to find the group means). What do you expect to see? Is this what you find?
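For instance, the nonhierarchical (observed) school means can be computed with either of:
```{r eval = FALSE}
## Two ways to get the observed school means
tapply(data1$math, data1$school, mean)
aggregate(math ~ school, data = data1, mean)
```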
```{r child='solution_R/exercise_1/compare_hier.Rmd', include = solutions, eval = solutions}
```
### Part 2: Assessing the variance in schools
Compute a Bayesian posterior for the intraclass correlation coefficient using the values from the MCMC chain.
1. Use reparametrization to compute the two variances of interest.
2. Compute a chain of ICC values using the definition of the ICC above.
3. Compute the posterior mean and 95% credible interval.
4. Plot the posterior.
Does the Bayesian ICC agree with the classical ICC?
```{r child='solution_R/exercise_2/bayes_ICC.Rmd', include = solutions, eval = solutions}
```
### Part 3: Adding a school-level covariate
Choose a school-level covariate to add to the model. The school-level covariates are contained in the data frame `data2`, which has the following structure. Each row is a school.
```{r echo=FALSE}
DT::datatable(data2, options = list(pageLength = 5))
```
(The solution uses school size.)
```{r child='solution_R/exercise_3/school_size.Rmd', include = solutions, eval = solutions}
```
model{
  ## loop over the student-level data
  for(i in 1:Nstudents){
    ## j is a variable (nested indexing)
    math[i] ~ dnorm(alpha[j[i]], tau.eps)
  }
  ## loop over the J schools (level 2)
  for(p in 1:J){
    alpha[p] ~ dnorm(mu0, tau.alpha)
  }
  ## priors on hyperparameters
  mu0 ~ dnorm(0, .0001)
  ## uniform priors on standard deviations
  sigma.eps ~ dunif(0, 10)
  sigma.alpha ~ dunif(0, 10)
  ## convert the standard deviations to precisions
  tau.eps <- pow(sigma.eps, -2)
  tau.alpha <- pow(sigma.alpha, -2)
}
model{
  ## loop over the student-level data
  for(i in 1:Nstudents){
    ## j is a variable (nested indexing)
    math[i] ~ dnorm(alpha[j[i]], tau.eps)
  }
  ## loop over the J schools (level 2)
  for(p in 1:J){
    alpha.pred[p] <- mu0 + beta.size * size[p]
    alpha[p] ~ dnorm(alpha.pred[p], tau.alpha)
  }
  ## priors on hyperparameters
  mu0 ~ dnorm(0, .0001)
  beta.size ~ dnorm(0, .0001)
  ## uniform priors on standard deviations
  sigma.eps ~ dunif(0, 10)
  sigma.alpha ~ dunif(0, 10)
  ## convert the standard deviations to precisions
  tau.eps <- pow(sigma.eps, -2)
  tau.alpha <- pow(sigma.alpha, -2)
}
#### Solution
I chose to add the size of the school as a covariate. It makes sense to "center" the school sizes, creating a "school size ratio" of which we take the logarithm (base 2). The slope can then be interpreted as the change in mean math score for every doubling of the school size.
This answers the question of *whether school size has an effect on math scores.*
Our school-level covariates are in `data2`. We construct our centered school size covariate called `size`...
```{r}
size = log2(data2$size) - mean(log2(data2$size))
```
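As a quick check on the interpretation: a doubling of school size always adds exactly 1 on the `log2` scale, whatever the baseline, so the slope on this covariate is the change per doubling:
```{r}
## A doubling adds exactly 1 on the log2 scale (any baseline works)
log2(2 * 500) - log2(500)
log2(2 * 1200) - log2(1200)
```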
We then include it in the object we will pass to Nimble:
```{r}
## Create the data to pass to Nimble
forNimble <- list()
forNimble$math <- data1$math
forNimble$Nstudents <- length(data1$math)
forNimble$j <- match(data1$school, ## indexes which school we're in
                     unique(data1$school))
forNimble$J <- max(forNimble$j) ## number of schools
forNimble$size <- size
```
The BUGS file is shown below.
```
```{r echo=FALSE, results = 'asis'}
cat(paste(readLines("../../solution_bug/oneway3.bug"),collapse = "\n"))
```
```
We now have all the information we need to fit the model. Note that we need to include the new parameter `beta.size` in the initialization object.
```{r}
## initialize the Gibbs sampler with starting values
## this is not necessary most of the time.
inits <- list(mu0 = mean(forNimble$math),
              alpha = tapply(forNimble$math,
                             forNimble$j, mean), ## school-specific means
              beta.size = 0)
```
We compile and sample from the model.
```{r eval = FALSE}
## compile Nimble model
nimble.model <- nimble::readBUGSmodel(
  model = "solution_bug/oneway3.bug",
  data = forNimble,
  inits = inits)
```
```{r echo=FALSE}
## compile Nimble model
nimble.model <- nimble::readBUGSmodel(
  model = "../../solution_bug/oneway3.bug",
  data = forNimble,
  inits = inits)
```
```{r results = 'hide'}
## Compile the model and configure the MCMC
compiled.model <- compileNimble(nimble.model)
conf <- configureMCMC(nimble.model)
conf$addMonitors(c("mu0", "sigma.alpha", "sigma.eps", "alpha", "beta.size"))
## Build the MCMC
Rmcmc <- buildMCMC(conf)
Cmcmc <- compileNimble(Rmcmc, project = compiled.model)
## Draw 4000 iterations and convert to an mcmc object
runMCMC(Cmcmc, niter = 4000) |>
  as.mcmc() -> samples
```
Note that we now include `beta.size` as a parameter to monitor.
We can now summarise the posterior distribution of the `beta.size` parameter...
```{r}
summary(samples[,"beta.size"])
```
...and plot the marginal posterior:
```{r}
plot(samples[,"beta.size"])
```
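Since the question is whether school size has an effect, a useful one-line summary (not part of the original lab) is the posterior probability that the slope is positive:
```{r}
## Posterior probability that beta.size is positive
mean(samples[, "beta.size"] > 0)
```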
Recall that we estimated the observed school means in part 1, so we can use this to check our answer. We will plot the observed means against the school size variable, and then add a regression line based on the Bayesian estimates.
```{r}
plot(size, observed.means$math,
     ylab = "Math score", xlab = "(Centered) log size",
     pch = 19, col = rgb(0, 0, 0, .3))
## Compute the Bayesian point estimates
intercept = mean(samples[,"mu0"])
slope = mean(samples[,"beta.size"])
## classical (non-hierarchical) regression line in red
abline(lm(observed.means$math~size), col = "red", lwd=2, lty=2)
## Bayesian regression line in blue
abline(intercept, slope, col = "blue", lwd=2, lty=1)
```
The results agree. The difference between the two is that the hierarchical model estimates "proper" standard errors, accounting for uncertainty at all levels (student and school), whereas the classical estimates are based on aggregating across students and hence cannot take the uncertainty at that level into account.
```{r}
## Posterior standard deviation of the intercept
sd(samples[,"mu0"])
## Posterior standard deviation of beta
sd(samples[,"beta.size"])
## Classical (non-hierarchical) regression
summary(lm(observed.means$math~size))
```
Though, to be fair, the results are nearly identical in this particular case, owing to the moderately high and roughly equal number of observations in each school.