Last active
May 24, 2020 00:52
-
-
Save lluang/0c1b2e093f3f5fb972e02b3077f5ac4c to your computer and use it in GitHub Desktop.
S-I-R Simulation model using R
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: 'S-I-R Simulation Models' | |
author: "IE 0015 Information Systems" | |
date: "Spring, 2020" | |
output: | |
pdf_document: default | |
html_document: | |
df_print: paged | |
--- | |
This was assigned as a homework assignment in my "Information Systems Engineering" course for undergraduate engineering in the Department of Industrial Engineering. The course is taught as a data analysis course, and this was a large assignment that went with a module on simulation models. The students had already been working with COVID-19 data for data analysis examples and labs throughout the semester (we started when COVID-19 was something you heard about in the news about China), so they had many such assignments. | |
```{r loadlibraries, echo=FALSE, message=FALSE, warning=FALSE} | |
library(dplyr) | |
library(tidyr) | |
library(magrittr) | |
library(ggplot2) | |
``` | |
The standard model of spread of infectious disease spread is the S-I-R compartmental model: susceptible, infected, removed. The population is divided into three compartments, and the model tracks the change in the size of each compartment over time. The standard depiction of the model is as a system of differential equations. | |
$$dS = -\beta *S*I$$ | |
$$dI = \beta*S*I - \nu*I$$ | |
$$dR = \nu*I $$ | |
Where: | |
- $\beta$ is the product of the contact per unit time and the probability of transmission. It is the product of the probability of contact of an infected individual and a susceptible individual and the probability that a contact results in the susceptible person being infected. | |
- $\nu$ is the recovery rate | |
- S is the number of susceptible individuals. At the beginning this is the entire population. dS is the change in number of suspectible individuals in a given unit time | |
- I is the number of infected individuals. Typically, in a model the initial value is 1. dI is the change in number of infected individuals in a given unit time | |
- dR is the change in number of removed individuals in a given unit time. This can either be due to recovery with future immunity, or through death of the individual. | |
This homework assignment will develop this model as a simulation. We will look at an artificial population of 1000 people, with one being infected. | |
```{r setup} | |
nperiods = 365 | |
population = 1000 | |
averagecontacts = 15 | |
pinfect = 0.02 | |
recoveryrate = 1/14 # 1 / average recovery time | |
``` | |
1. [5] Calculate the probability that two individuals meet. Save it to a variable named `pcontact`. | |
```{r} | |
pcontact <- 0 # calculate pcontact here | |
``` | |
```{r} | |
pcontact = averagecontacts/population | |
``` | |
2. [10] This assignment will model the spread of the disease as a mixing model, meaning that every individual has an equal probability of being in contact with any other individual. For every time period (a day) write a function named `checkinfection(nS, nI)` that determines given the the starting number of susceptible and infected: | |
(1) how many instances an infected individual is in contact with a susceptible individual | |
(2) How many of those contacts result in infection. | |
(3) Run this code for the first day (I0 = 1, S0 = 999, R0= 0) | |
```{r} | |
I0 = 1 | |
S0 = population - I0 | |
R0 = 0 | |
checkinfection <- function(nS, nI){ | |
newlyinfected <- c() | |
if( (nS > 0) & (nI > 0)){ | |
checkcontacted <- matrix(runif(nS*nI), ncol=nI) | |
contacts <- ifelse(checkcontacted < pcontact, 1,0) | |
checkinfected <- matrix(runif(nS*nI), ncol=nI) | |
infectedifcontacted <- ifelse(checkinfected < pinfect, 1, 0) | |
infected <- contacts * infectedifcontacted | |
infectednew <- ifelse(rowSums(infected)>=1,1,0) #you can only get infected once | |
newlyinfected <- c(newlyinfected, sum(infectednew)) | |
dI <- min(sum(newlyinfected), nS) | |
} else { | |
dI <- 0 | |
} | |
dI | |
} | |
dI = checkinfection(S0, I0) | |
I1 = I0 + dI | |
s1 = S0 - dI | |
print(c(I1, s1)) | |
``` | |
3. [15] For those who are infected, assume that in each time period, the probability of recovery is 1 / (average recovery time). Write a function named `checkrecover(nI)` that calculates the number of recovered given the number of infected. Run the function with 100 infected. | |
1. Similar to the infected function. Check each infected person individually in each time period using either vectorized functions or a for loop. (Do not use number recovered = number infected * recoveryrate) | |
```{r} | |
Istarting = 100 | |
checkrecover <- function(nI){ | |
checkrecovered <- runif(nI) | |
#print(checkrecovered) | |
recovered = ifelse(checkrecovered < recoveryrate, 1,0) | |
#print(recovered) | |
dr <- sum(recovered) | |
} | |
dr <- checkrecover(Istarting) | |
print(dr) | |
``` | |
4. [10] Simulate one year. | |
a. Use the functions you just created to run a simulation of one year with a starting population of 1000 (S0=999, I0 = 1, R0 = 0). You will need to run 365 days of simulation and save each day's S, I and R (hint: create a vector for each). The output of the simulation will be the maximum number infected and the number of days that the number of infected is greater than **20**. | |
b. Plot Susceptible, Infected, and Recovered vs time (days). | |
```{r} | |
# One day | |
onedayepidemic <- function(nS, nI, nR){ | |
dI <- checkinfection(nS, nI) | |
dR <- checkrecover(nI) | |
S1 = nS - dI | |
I1 = nI + dI - dR | |
R1 = nR + dR | |
return = c(S1, I1, R1) | |
} | |
oneyearepidemic <- function(S0 = 999, I0 = 1, R0 = 0){ | |
oneyear <- 365 | |
nS = c(S0) | |
nI = c(I0) | |
nR = c(R0) | |
for (day in 1:(oneyear-1)){ | |
nextday <- onedayepidemic(nS[day], nI[day], nR[day]) | |
nS <- c(nS, nextday[1]) | |
nI <- c(nI, nextday[2]) | |
nR <- c(nR, nextday[3]) | |
} | |
simulationresults <- data.frame(day = 1:oneyear, | |
S = nS, | |
I = nI, | |
R = nR | |
) | |
simulationresults | |
} | |
``` | |
```{r} | |
oneyearresults <- oneyearepidemic(S0, I0, R0) | |
ggplot(oneyearresults) + | |
geom_point(aes(day, S), color = "green") + | |
geom_point(aes(day, I), color = "red") + | |
geom_point(aes(day, R), color = "blue") | |
``` | |
```{r} | |
infectedstatistic <- oneyearresults %>% | |
mutate(isinfected = (I>=20)) %>% | |
filter(isinfected == TRUE) %>% | |
group_by(isinfected) %>% | |
summarize(peak=max(I), | |
startday = min(day), | |
endday = max(day)) %>% | |
mutate(duration = (endday - startday)) | |
infectedstatistic | |
``` | |
5. [10] Now run 100 replications (each replication is 365 days). | |
a. Represent all 100 replications on a single graph (note: there are many possible answers). Keep in mind that from the representation of any single replication, it should be possible to determine the maximimum infected and the number of days the number of infected is greater than 20, but your final graph you are more interested in the distributions of the outputs, not the individual values. | |
b. Repeat 100 replications, but change the average number of contacts per person to 10 from 15. Discuss the difference in results. | |
```{r} | |
epidemicsimulation <- function(S0 = 999, I0 = 1, R0 = 0){ | |
oneyearresults <- oneyearepidemic(S0, I0, R0) | |
infectedstatistic <- oneyearresults %>% | |
mutate(isinfected = (I>=20)) %>% | |
filter(isinfected == TRUE) %>% | |
group_by(isinfected) %>% | |
summarize(peak=max(I), | |
startday = min(day), | |
endday = max(day)) %>% | |
mutate(duration = (endday - startday)) | |
data.frame(infectedstatistic$peak, infectedstatistic$duration) | |
} | |
infectedresults <- lapply(rep(999, 100), epidemicsimulation) | |
epidemicresults <- as.data.frame(do.call(rbind, infectedresults)) | |
names(epidemicresults) <- c("peak", "duration") | |
``` | |
```{r} | |
ggplot(epidemicresults) + | |
geom_point(aes(peak, duration)) + | |
ggtitle("Epidemic peak and duration for 100 simulations") + | |
xlim(1,750) + ylim(1,150) | |
``` | |
```{r} | |
averagecontacts = 10 | |
pcontact = averagecontacts/population | |
infectedresults10 <- lapply(rep(999, 100), epidemicsimulation) | |
epidemicresults10 <- as.data.frame(do.call(rbind, infectedresults)) | |
names(epidemicresults10) <- c("peak", "duration") | |
``` | |
```{r} | |
ggplot(epidemicresults10) + | |
geom_point(aes(peak, duration)) + | |
ggtitle("Epidemic peak and duration for 100 simulations where average contacts = 10") + | |
xlim(1,750) + ylim(1,150) | |
``` | |
# Postscript | |
In real life, after a simulation like this was used, you would change parameters to reflect potential policy changes. For example, social distancing changes the variable `averagecontacts`. Protective equipment such as non-medical masks changes the variable `pinfect`. Treatment methods changes `recoveryrate`. And you would see how the results shift as you change any of them. (e.g. reducing `averagecontacts` reduces the peak and increases the duration). |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment