Skip to content

Instantly share code, notes, and snippets.

@lluang
Last active May 24, 2020 00:52
Show Gist options
  • Save lluang/0c1b2e093f3f5fb972e02b3077f5ac4c to your computer and use it in GitHub Desktop.
Save lluang/0c1b2e093f3f5fb972e02b3077f5ac4c to your computer and use it in GitHub Desktop.
S-I-R Simulation model using R
---
title: 'S-I-R Simulation Models'
author: "IE 0015 Information Systems"
date: "Spring, 2020"
output:
pdf_document: default
html_document:
df_print: paged
---
This was assigned as a homework assignment in my "Information Systems Engineering" course for undergraduate engineering in the Department of Industrial Engineering. The course is taught as a data analysis course, and this was a large assignment that went with a module on simulation models. The students had already been working with COVID-19 data for data analysis examples and labs throughout the semester (we started when COVID-19 was something you heard about in the news about China), so they had many such assignments.
```{r loadlibraries, echo=FALSE, message=FALSE, warning=FALSE}
library(dplyr)
library(tidyr)
library(magrittr)
library(ggplot2)
```
The standard model of spread of infectious disease spread is the S-I-R compartmental model: susceptible, infected, removed. The population is divided into three compartments, and the model tracks the change in the size of each compartment over time. The standard depiction of the model is as a system of differential equations.
$$dS = -\beta *S*I$$
$$dI = \beta*S*I - \nu*I$$
$$dR = \nu*I $$
Where:
- $\beta$ is the product of the contact per unit time and the probability of transmission. It is the product of the probability of contact of an infected individual and a susceptible individual and the probability that a contact results in the susceptible person being infected.
- $\nu$ is the recovery rate
- S is the number of susceptible individuals. At the beginning this is the entire population. dS is the change in number of suspectible individuals in a given unit time
- I is the number of infected individuals. Typically, in a model the initial value is 1. dI is the change in number of infected individuals in a given unit time
- dR is the change in number of removed individuals in a given unit time. This can either be due to recovery with future immunity, or through death of the individual.
This homework assignment will develop this model as a simulation. We will look at an artificial population of 1000 people, with one being infected.
```{r setup}
nperiods = 365
population = 1000
averagecontacts = 15
pinfect = 0.02
recoveryrate = 1/14 # 1 / average recovery time
```
1. [5] Calculate the probability that two individuals meet. Save it to a variable named `pcontact`.
```{r}
pcontact <- 0 # calculate pcontact here
```
```{r}
pcontact = averagecontacts/population
```
2. [10] This assignment will model the spread of the disease as a mixing model, meaning that every individual has an equal probability of being in contact with any other individual. For every time period (a day) write a function named `checkinfection(nS, nI)` that determines given the the starting number of susceptible and infected:
(1) how many instances an infected individual is in contact with a susceptible individual
(2) How many of those contacts result in infection.
(3) Run this code for the first day (I0 = 1, S0 = 999, R0= 0)
```{r}
I0 = 1
S0 = population - I0
R0 = 0
checkinfection <- function(nS, nI){
newlyinfected <- c()
if( (nS > 0) & (nI > 0)){
checkcontacted <- matrix(runif(nS*nI), ncol=nI)
contacts <- ifelse(checkcontacted < pcontact, 1,0)
checkinfected <- matrix(runif(nS*nI), ncol=nI)
infectedifcontacted <- ifelse(checkinfected < pinfect, 1, 0)
infected <- contacts * infectedifcontacted
infectednew <- ifelse(rowSums(infected)>=1,1,0) #you can only get infected once
newlyinfected <- c(newlyinfected, sum(infectednew))
dI <- min(sum(newlyinfected), nS)
} else {
dI <- 0
}
dI
}
dI = checkinfection(S0, I0)
I1 = I0 + dI
s1 = S0 - dI
print(c(I1, s1))
```
3. [15] For those who are infected, assume that in each time period, the probability of recovery is 1 / (average recovery time). Write a function named `checkrecover(nI)` that calculates the number of recovered given the number of infected. Run the function with 100 infected.
1. Similar to the infected function. Check each infected person individually in each time period using either vectorized functions or a for loop. (Do not use number recovered = number infected * recoveryrate)
```{r}
Istarting = 100
checkrecover <- function(nI){
checkrecovered <- runif(nI)
#print(checkrecovered)
recovered = ifelse(checkrecovered < recoveryrate, 1,0)
#print(recovered)
dr <- sum(recovered)
}
dr <- checkrecover(Istarting)
print(dr)
```
4. [10] Simulate one year.
a. Use the functions you just created to run a simulation of one year with a starting population of 1000 (S0=999, I0 = 1, R0 = 0). You will need to run 365 days of simulation and save each day's S, I and R (hint: create a vector for each). The output of the simulation will be the maximum number infected and the number of days that the number of infected is greater than **20**.
b. Plot Susceptible, Infected, and Recovered vs time (days).
```{r}
# One day
onedayepidemic <- function(nS, nI, nR){
dI <- checkinfection(nS, nI)
dR <- checkrecover(nI)
S1 = nS - dI
I1 = nI + dI - dR
R1 = nR + dR
return = c(S1, I1, R1)
}
oneyearepidemic <- function(S0 = 999, I0 = 1, R0 = 0){
oneyear <- 365
nS = c(S0)
nI = c(I0)
nR = c(R0)
for (day in 1:(oneyear-1)){
nextday <- onedayepidemic(nS[day], nI[day], nR[day])
nS <- c(nS, nextday[1])
nI <- c(nI, nextday[2])
nR <- c(nR, nextday[3])
}
simulationresults <- data.frame(day = 1:oneyear,
S = nS,
I = nI,
R = nR
)
simulationresults
}
```
```{r}
oneyearresults <- oneyearepidemic(S0, I0, R0)
ggplot(oneyearresults) +
geom_point(aes(day, S), color = "green") +
geom_point(aes(day, I), color = "red") +
geom_point(aes(day, R), color = "blue")
```
```{r}
infectedstatistic <- oneyearresults %>%
mutate(isinfected = (I>=20)) %>%
filter(isinfected == TRUE) %>%
group_by(isinfected) %>%
summarize(peak=max(I),
startday = min(day),
endday = max(day)) %>%
mutate(duration = (endday - startday))
infectedstatistic
```
5. [10] Now run 100 replications (each replication is 365 days).
a. Represent all 100 replications on a single graph (note: there are many possible answers). Keep in mind that from the representation of any single replication, it should be possible to determine the maximimum infected and the number of days the number of infected is greater than 20, but your final graph you are more interested in the distributions of the outputs, not the individual values.
b. Repeat 100 replications, but change the average number of contacts per person to 10 from 15. Discuss the difference in results.
```{r}
epidemicsimulation <- function(S0 = 999, I0 = 1, R0 = 0){
oneyearresults <- oneyearepidemic(S0, I0, R0)
infectedstatistic <- oneyearresults %>%
mutate(isinfected = (I>=20)) %>%
filter(isinfected == TRUE) %>%
group_by(isinfected) %>%
summarize(peak=max(I),
startday = min(day),
endday = max(day)) %>%
mutate(duration = (endday - startday))
data.frame(infectedstatistic$peak, infectedstatistic$duration)
}
infectedresults <- lapply(rep(999, 100), epidemicsimulation)
epidemicresults <- as.data.frame(do.call(rbind, infectedresults))
names(epidemicresults) <- c("peak", "duration")
```
```{r}
ggplot(epidemicresults) +
geom_point(aes(peak, duration)) +
ggtitle("Epidemic peak and duration for 100 simulations") +
xlim(1,750) + ylim(1,150)
```
```{r}
averagecontacts = 10
pcontact = averagecontacts/population
infectedresults10 <- lapply(rep(999, 100), epidemicsimulation)
epidemicresults10 <- as.data.frame(do.call(rbind, infectedresults))
names(epidemicresults10) <- c("peak", "duration")
```
```{r}
ggplot(epidemicresults10) +
geom_point(aes(peak, duration)) +
ggtitle("Epidemic peak and duration for 100 simulations where average contacts = 10") +
xlim(1,750) + ylim(1,150)
```
# Postscript
In real life, after a simulation like this was used, you would change parameters to reflect potential policy changes. For example, social distancing changes the variable `averagecontacts`. Protective equipment such as non-medical masks changes the variable `pinfect`. Treatment methods changes `recoveryrate`. And you would see how the results shift as you change any of them. (e.g. reducing `averagecontacts` reduces the peak and increases the duration).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment