#### Solution
We first compute the two variances needed for the Bayesian ICC. Remember that the model has two parameters representing standard deviations; squaring them gives the variances.
```{r}
## Bayesian ICC
sigma2.eps = samples[,'sigma.eps']^2
sigma2.alpha = samples[,'sigma.alpha']^2
```
Remember that `sigma2.eps` and `sigma2.alpha` are both vectors of values; they are MCMC chains.
We can use the formula for the ICC to compute a new vector (MCMC chain) of values that represent posterior samples of the ICC.
```{r}
## Create a chain for the Bayesian estimate of the ICC
bayes.icc = sigma2.alpha / (sigma2.alpha + sigma2.eps)
## Convert the vector to an mcmc object for convenience
bayes.icc = mcmc(bayes.icc)
```
Now that we have a vector of values, it is easy to compute posterior quantities of interest.
```{r}
summary(bayes.icc)
```
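Individual quantities can also be pulled directly from the chain; for example, a quick sketch of the posterior mean and a central 95% credible interval:
```{r}
## Posterior mean of the ICC
mean(bayes.icc)
## Central 95% credible interval from the chain quantiles
quantile(bayes.icc, p = c(.025, .975))
```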
We can also plot the posterior of the Bayesian ICC and add the classical ICC for comparison.
```{r tidy = FALSE}
## Plot the posterior
hist(bayes.icc, col = "lightgray",
     border = "darkgray", freq = FALSE,
     xlab = "Bayesian ICC")
## Add the classical value to the plot, in green
abline(v = multilevel::ICC1(classical), col = "darkgreen", lwd = 2)
text(multilevel::ICC1(classical), par()$usr[3],
     "Classical ICC", adj = c(-.2, 1.2), srt = 90,
     col = "darkgreen", cex = .8)
```
Notice how easy it was to obtain a credible interval for the Bayesian ICC. For many quantities, it will be quite difficult to obtain a classical confidence interval, yet a Bayesian credible interval will be easily obtainable from the MCMC chain.
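If a highest posterior density (HPD) interval is preferred to a central interval, the `coda` package can compute one from the same chain; a minimal sketch:
```{r}
## 95% highest posterior density interval for the ICC
HPDinterval(bayes.icc, prob = 0.95)
```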
#### Another perspective
Another (complementary) way we can view the problem is to look at the estimates of the $\alpha$ parameters.
Recall that `alphas` contains our chains for the alpha parameters and `alpha.est` contains the posterior means. We now compute the 95% credible intervals for each $\alpha$.
```{r}
## Use apply() to compute central 95% credible intervals
CIs = apply(alphas, 2, function(v) quantile(v, p = c(.025, .975)))
```
We want to plot these estimates and CIs in a way that will enable us to visualize the variance of the schools. A cumulative distribution plot, with error bars, is one good way of doing this.
```{r tidy = FALSE}
## We need to reorder the estimates and CIs together, so we define a
## variable to keep track of the ordering
ord = order(alpha.est)
## Plot the school means, in order from worst to best (using ord)
plot(alpha.est[ord], 1:length(alpha.est),
     xlim = range(CIs),
     pch = 19, ylab = "School (ordered)",
     xlab = "Hierarchical school mean")
## Add the corresponding credible intervals
segments(CIs[1, ord], 1:length(alpha.est),
         CIs[2, ord], 1:length(alpha.est),
         col = rgb(0, 0, 0, .2))
```
As can be seen, the variability of the schools is substantial compared to the credible intervals.
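To make "substantial" concrete, we can compare the spread of the hierarchical school means to the typical width of a single school's credible interval; a quick sketch using the objects above:
```{r}
## Spread of the hierarchical school means...
sd(alpha.est)
## ...versus the average width of a school's 95% credible interval
mean(CIs[2, ] - CIs[1, ])
```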
#### Solution
We first gather the observed means and the point estimates of the $\alpha$ parameters, the hierarchical means of the schools.
```{r}
## Use aggregate to compute the observed mean math score per school
observed.means = aggregate(math ~ school, data = data1, mean)
## Compute the posterior means of the alpha parameters
## to use as point estimates. (The first 160 columns of
## `samples` are the chains for alpha[1] to alpha[160].)
alphas = samples[, 1:160]
alpha.est = colMeans(alphas)
```
We can now plot the two estimates against one another.
```{r tidy = FALSE}
plot(observed.means$math, alpha.est,
     ylab = "Hierarchical estimate",
     xlab = "Observed school mean",
     col = rgb(0, 0, 0, .3), pch = 19)
abline(0, 1, lty = 2, col = "red")
```
Notice the moderate hierarchical shrinkage.
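One rough way to quantify the shrinkage is to regress the hierarchical estimates on the observed means; a slope below 1 indicates shrinkage toward the grand mean. A quick sketch:
```{r}
## Slope below 1 indicates shrinkage toward the grand mean
coef(lm(alpha.est ~ observed.means$math))
```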
---
title: "Bayesian Statistics: A practical introduction"
subtitle: Lab 2
author: "Richard D. Morey, Cardiff University (<a href='mailto:[email protected]'>[email protected]</a>)"
framework: bootstrap
mode: selfcontained
highlighter: prettify
hitheme: twitter-bootstrap
output:
  html_document:
    toc: true
    toc_float: false
assets:
  css:
    - "http://fonts.googleapis.com/css?family=Raleway:300"
    - "http://fonts.googleapis.com/css?family=Oxygen"
always_allow_html: true
editor_options:
  chunk_output_type: console
---
<style>
body {
  font-family: 'Oxygen', sans-serif;
  font-size: 16px;
  line-height: 24px;
}
h1, h2, h3, h4 {
  font-family: 'Raleway', sans-serif;
}
.container { width: 1000px; }
h3 {
  background-color: #D4DAEC;
  text-indent: 50px;
}
h4 {
  text-indent: 100px;
}
g-table-intro h4 {
  text-indent: 0px;
}
* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}
table {
  text-align: left;
  line-height: 40px;
  border-collapse: separate;
  border-spacing: 0;
  border: 2px solid #ed1c40;
  width: 500px;
  margin: 50px auto;
  border-radius: .25rem;
}
thead tr:first-child {
  background: #ed1c40;
  color: #fff;
  border: none;
}
th:first-child,
td:first-child {
  padding: 0 15px 0 20px;
}
thead tr:last-child th {
  border-bottom: 3px solid #ddd;
}
tbody tr:hover {
  background-color: rgba(237, 28, 64, .1);
  cursor: default;
}
tbody tr:last-child td {
  border: none;
}
tbody td {
  border-bottom: 1px solid #ddd;
}
td:last-child {
  text-align: left;
  padding-right: 10px;
}
</style>
Each example contains R code for you to run by copying and pasting the code from the document into R. The output should look similar to the output in the document below.
Before you start, you should change your R working directory to whatever directory you have stored this lab file in. This can be done through the `File` menu on Windows, or on all platforms using the `setwd()` function.
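For example (the path below is a placeholder; substitute the directory holding this lab's files):
```{r eval = FALSE}
## Hypothetical path -- replace with your own lab directory
setwd("C:/Users/me/bayes-lab2")
```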
Hierarchical modeling
-----------------
### Setup
Jackman (2009) offers the following example (Example 7.6, page 323).
> ANOVA. The 1982 High School and Beyond Survey is a nationally representative
sample of US public and Catholic schools, covering 7185 students in 160 schools. The
chief outcome of interest is a standardized measure of math ability, with a mean of
12.75 and IQR of [7.28, 18.32]. These data figure prominently in the standard reference
on hierarchical linear models (Raudenbush and Bryk 2002), and so are well suited for
our purposes here.
> We fit a one-way ANOVA model (Equation 7.1 in Example 7.1), recognizing that
students are grouped by school. We momentarily ignore school type (public vs Catholic)
and other school-level characteristics...
The code to run the analysis is in `oneWay.R`. Remember to change R's working directory and to load the `nimble` library. See the BUGS code in `oneway2.bug`, and make sure you understand the model.
### Data preparation
First we prepare packages and read in the data.
```{r}
library(multilevel)
library(nimble)
library(coda)  ## for mcmc objects and MCMC diagnostics
# Read in the data using the code in dataPrep.R
source('dataPrep.R')
```
We can now examine the structure of the data. Each row is a student.
```{r echo=TRUE}
DT::datatable(data1, options = list(pageLength = 5))
```
We also have access to school level covariates in the data frame `data2`.
Math scores go from 0 to 25, as we can see by looking at a summary of the scores in the `math` column.
```{r}
## Summarise the column of interest
summary(data1$math)
```
Now we can look at the counts of the students and schools in the data set.
```{r}
## How many students in each school are there?
table(data1$school)
## How many schools?
length(unique(data1$school))
```
To see if it will be important to include school as a covariate, we compute the intra-class correlation to see if `school` adds a substantial amount of variance. In this case, the ICC is an estimate of
\[
\frac{\sigma^2_{\mbox{school}}}{\sigma^2_{\mbox{school}} + \sigma^2_\epsilon}
\]
The ICC measures the relative contribution of the random schools and random error to the variance in the data.
```{r}
## First, perform a classical one-way ANOVA
classical <- aov(math ~ as.factor(school),
                 data = data1)
summary(classical)
## intra-class correlation
multilevel::ICC1(classical)
```
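To see where `ICC1()`'s number comes from, the ICC can also be computed by hand from the ANOVA mean squares. The sketch below approximates the group size `k` by the average number of students per school; `multilevel::ICC1()` uses an ANOVA-adjusted group size, so its result may differ slightly when groups are unequal.
```{r}
## Hand computation of the ICC from the ANOVA table
## (approximation: k taken as the average group size)
ms  <- summary(classical)[[1]][["Mean Sq"]]
MSB <- ms[1]  # between-school mean square
MSW <- ms[2]  # within-school mean square
k   <- mean(table(data1$school))
(MSB - MSW) / (MSB + (k - 1) * MSW)
```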
It seems that the intraclass correlation coefficient is moderately high. We should include school in our model.
### Part 1: Fitting the hierarchical model
We will fit the hierarchical model built in `oneway2.bug`, listed below:
```
```{r echo=FALSE, results='asis'}
cat(paste(readLines('oneway2.bug'), collapse = "\n"))
```
```
What are the variables?
Variable name | Type | Length | Description
--------------|------------|--------|--------------
`math` | data | Number of students | the math score of the `i`th student
`Nstudents` | data | 1 | Total number of students
`alpha` | parameter | Number of schools | the effect of being in the `p`th school
`j` | index | Number of students | an index telling us which school the `i`th student attends
`tau.eps` | parameter | 1 | Precision (inverse of variance) of math scores within schools
`J` | data | 1 | Total number of schools
`mu0` | parameter | 1 | Grand mean math performance across schools
`tau.alpha` | parameter | 1 | Precision of mean math scores across schools
`sigma.eps` | parameter | 1 | Standard deviation of math scores within schools
`sigma.alpha` | parameter | 1 | Standard deviation of mean math scores across schools
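The nested indexing `alpha[j[i]]` used in the model deserves a pause: `j` maps each student to a school, so indexing `alpha` by `j` expands the school-level means out to student length. A toy illustration (the `*_demo` objects below are made up for this example only):
```{r}
## Toy example of nested indexing (not part of the analysis)
alpha_demo <- c(10, 12, 14)    # means of three hypothetical schools
j_demo <- c(1, 1, 2, 3, 3, 3)  # school membership of six students
alpha_demo[j_demo]             # per-student means: 10 10 12 14 14 14
```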
```{r results = 'hide'}
## Create the data to pass to Nimble
forNimble <- list()
forNimble$math <- data1$math
forNimble$Nstudents <- length(data1$math)
forNimble$j <- match(data1$school, ## indexes which school we're in
                     unique(data1$school))
forNimble$J <- max(forNimble$j) ## number of schools
## initialize the Gibbs sampler with starting values
## (this is not necessary most of the time)
inits <- list(mu0 = mean(forNimble$math),
              alpha = tapply(forNimble$math,
                             forNimble$j, mean) ## school-specific means
)
nimble.model <- nimble::readBUGSmodel(
  model = "oneway2.bug",
  data = forNimble,
  inits = inits)
## Compile the model and configure the MCMC
compiled.model <- compileNimble(nimble.model)
conf <- configureMCMC(nimble.model)
conf$addMonitors(c("mu0", "sigma.alpha", "sigma.eps", "alpha"))
## Build the MCMC
Rmcmc <- buildMCMC(conf)
Cmcmc <- compileNimble(Rmcmc, project = compiled.model)
## Draw 4000 iterations and convert to an mcmc object
runMCMC(Cmcmc, niter = 4000) |>
  as.mcmc() -> samples
```
Let's first check one of our convergence statistics.
```{r eval = FALSE}
## Geweke diagnostic
gd = geweke.diag(samples)
## Extract the z scores from the object
geweke.z.scores = gd$z
## plot the cumulative distribution of the z scores
plot(ecdf(geweke.z.scores), xlab = "Geweke diagnostic z score")
## Add the theoretical line (standard normal)
curve( pnorm(x), xlim = par()$usr[1:2], add = TRUE, lwd = 2, col="red")
```
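As an additional check (not part of the original lab), we can inspect the effective sample size of each chain with `coda`; values far below the 4000 iterations we ran indicate strong autocorrelation:
```{r eval = FALSE}
## Effective sample sizes for all monitored parameters
effectiveSize(samples)
```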
We now make a PDF file containing plots of all the chains; it is easier to page through a PDF than through many R plot windows.
```{r eval = FALSE}
## see summaries for all of the output
summary(samples)
## Dump a PDF with all chains
pdf('chains.pdf', version = "1.4")
plot(samples)
dev.off()
```
Compare the hierarchical estimates of the school means with the nonhierarchical school means (use `aggregate()` or `tapply()` to find the group means). What do you expect to see? Is this what you find?
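For instance, the nonhierarchical (observed) school means can be computed with either of:
```{r eval = FALSE}
## Two ways to get the observed school means
tapply(data1$math, data1$school, mean)
aggregate(math ~ school, data = data1, mean)
```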
```{r child='solution_R/exercise_1/compare_hier.Rmd', include = solutions, eval = solutions}
```
### Part 2: Assessing the variance in schools
Compute a Bayesian posterior for the intraclass correlation coefficient using the values from the MCMC chain.
1. Use reparametrization to compute the two variances of interest.
2. Compute a chain of ICC values using the definition of the ICC above.
3. Compute the posterior mean and 95% credible interval.
4. Plot the posterior.
Does the Bayesian ICC agree with the classical ICC?
```{r child='solution_R/exercise_2/bayes_ICC.Rmd', include = solutions, eval = solutions}
```
### Part 3: Adding a school-level covariate
Choose a school-level covariate to add to the model. The school-level covariates are contained in the data frame `data2`, which has the following structure. Each row is a school.
```{r echo=FALSE}
DT::datatable(data2, options = list(pageLength = 5))
```
(The solution uses school size.)
```{r child='solution_R/exercise_3/school_size.Rmd', include = solutions, eval = solutions}
```
model{
  ## loop over the student-level data
  for(i in 1:Nstudents){
    ## j is a variable (nested indexing)
    math[i] ~ dnorm(alpha[j[i]], tau.eps)
  }
  ## loop over the J schools (level 2)
  for(p in 1:J){
    alpha[p] ~ dnorm(mu0, tau.alpha)
  }
  ## priors on hyperparameters
  mu0 ~ dnorm(0, .0001)
  ## uniform priors on standard deviations
  sigma.eps ~ dunif(0, 10)
  sigma.alpha ~ dunif(0, 10)
  ## convert the standard deviations to precisions
  tau.eps <- pow(sigma.eps, -2)
  tau.alpha <- pow(sigma.alpha, -2)
}
model{
  ## loop over the student-level data
  for(i in 1:Nstudents){
    ## j is a variable (nested indexing)
    math[i] ~ dnorm(alpha[j[i]], tau.eps)
  }
  ## loop over the J schools (level 2)
  for(p in 1:J){
    alpha.pred[p] <- mu0 + beta.size * size[p]
    alpha[p] ~ dnorm(alpha.pred[p], tau.alpha)
  }
  ## priors on hyperparameters
  mu0 ~ dnorm(0, .0001)
  beta.size ~ dnorm(0, .0001)
  ## uniform priors on standard deviations
  sigma.eps ~ dunif(0, 10)
  sigma.alpha ~ dunif(0, 10)
  ## convert the standard deviations to precisions
  tau.eps <- pow(sigma.eps, -2)
  tau.alpha <- pow(sigma.alpha, -2)
}
#### Solution
I chose to add the size of the school as a covariate. It makes sense to "center" the school sizes, creating a "school size ratio" of which we take the logarithm (base 2). The slope can then be interpreted as the change in mean math score for every doubling of the school size.
This answers the question of *whether school size has an effect on math scores.*
Our school-level covariates are in `data2`. We construct our centered school size covariate called `size`...
```{r}
size = log2(data2$size) - mean(log2(data2$size))
```
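As a quick check on the interpretation: a doubling of school size always adds exactly 1 on the `log2` scale, whatever the baseline, so the slope on this covariate is the change per doubling:
```{r}
## A doubling adds exactly 1 on the log2 scale (any baseline works)
log2(2 * 500) - log2(500)
log2(2 * 1200) - log2(1200)
```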
We then include it in the object we will pass to Nimble:
```{r}
## Create the data to pass to Nimble
forNimble <- list()
forNimble$math <- data1$math
forNimble$Nstudents <- length(data1$math)
forNimble$j <- match(data1$school, ## indexes which school we're in
                     unique(data1$school))
forNimble$J <- max(forNimble$j) ## number of schools
forNimble$size <- size
```
The BUGS file is shown below.
```
```{r echo=FALSE, results = 'asis'}
cat(paste(readLines("../../solution_bug/oneway3.bug"),collapse = "\n"))
```
```
We now have all the information we need to fit the model. Note that we need to include the new parameter `beta.size` in the initialization object.
```{r}
## initialize the Gibbs sampler with starting values
## this is not necessary most of the time.
inits <- list(mu0 = mean(forNimble$math),
              alpha = tapply(forNimble$math,
                             forNimble$j, mean), ## school-specific means
              beta.size = 0)
```
We compile and sample from the model.
```{r eval = FALSE}
## compile Nimble model
nimble.model <- nimble::readBUGSmodel(
  model = "solution_bug/oneway3.bug",
  data = forNimble,
  inits = inits)
```
```{r echo=FALSE}
## compile Nimble model
nimble.model <- nimble::readBUGSmodel(
  model = "../../solution_bug/oneway3.bug",
  data = forNimble,
  inits = inits)
```
```{r results = 'hide'}
## Compile the model and configure the MCMC
compiled.model <- compileNimble(nimble.model)
conf <- configureMCMC(nimble.model)
conf$addMonitors(c("mu0", "sigma.alpha", "sigma.eps", "alpha", "beta.size"))
## Build the MCMC
Rmcmc <- buildMCMC(conf)
Cmcmc <- compileNimble(Rmcmc, project = compiled.model)
## Draw 4000 iterations and convert to an mcmc object
runMCMC(Cmcmc, niter = 4000) |>
  as.mcmc() -> samples
```
Note that we now include `beta.size` as a parameter to monitor.
We can now summarise the posterior distribution of the `beta.size` parameter...
```{r}
summary(samples[,"beta.size"])
```
...and plot the marginal posterior:
```{r}
plot(samples[,"beta.size"])
```
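Since the question is whether school size has an effect, a useful one-line summary (not part of the original lab) is the posterior probability that the slope is positive:
```{r}
## Posterior probability that beta.size is positive
mean(samples[, "beta.size"] > 0)
```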
Recall that we estimated the observed school means in part 1, so we can use this to check our answer. We will plot the observed means against the school size variable, and then add a regression line based on the Bayesian estimates.
```{r}
plot(size, observed.means$math,
     ylab = "Math score", xlab = "(Centered) log size",
     pch = 19, col = rgb(0, 0, 0, .3))
## Compute the Bayesian point estimates
intercept = mean(samples[,"mu0"])
slope = mean(samples[,"beta.size"])
## classical (non-hierarchical) regression line in red
abline(lm(observed.means$math~size), col = "red", lwd=2, lty=2)
## Bayesian regression line in blue
abline(intercept, slope, col = "blue", lwd=2, lty=1)
```
The results agree. The difference between the two is that the hierarchical model estimates "proper" standard errors, accounting for uncertainty at all levels (student and school), whereas the classical estimates are based on aggregating across students and hence cannot take the uncertainty at that level into account.
```{r}
## Posterior standard deviation of the intercept
sd(samples[,"mu0"])
## Posterior standard deviation of beta
sd(samples[,"beta.size"])
## Classical (non-hierarchical) regression
summary(lm(observed.means$math~size))
```
Though, to be fair, the results are nearly identical in this particular case, owing to the moderately high and roughly equal number of observations in each school.