diff --git a/module3/exo2/exercice_en.Rmd b/module3/exo2/exercice_en.Rmd
index 241cd1cafd11b53945d135ee2c195c740619bd5a..23c39a85984a642eadda0304858a1ebf545c1377 100644
--- a/module3/exo2/exercice_en.Rmd
+++ b/module3/exo2/exercice_en.Rmd
@@ -1,13 +1,13 @@
 ---
-title: "Incidence of influenza-like illness in France"
+title: "Incidence of Chickenpox in France"
 author: "Jhouben Cuesta Ramirez"
 date: "07/06/2021"
 output:
-  pdf_document:
-    toc: true
   html_document:
     toc: true
     theme: journal
+  pdf_document:
+    toc: true
 documentclass: article
 classoption: a4paper
 header-includes:
@@ -22,15 +22,16 @@ knitr::opts_chunk$set(echo = TRUE)
 
 ## Data preprocessing
 
-The data on the incidence of influenza-like illness are available from the Web site of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1984 and ending with a recent week, is available for download. The URL is:
+The data on the incidence of chickenpox are available from the Web site of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1991 and ending with a recent week, is available for download. The URL is:
 
 ```{r}
-data_url = "http://www.sentiweb.fr/datasets/incidence-PAY-3.csv"
+data_url = "https://www.sentiweb.fr/datasets/incidence-PAY-7.csv"
 ```
 
+To preserve the reproducibility of this report, we keep a local copy of the original data, downloaded on 07/06/2021, without adding or deleting any information.
 ```{r}
 #The idea if to have a local backup of the file, in the case of the website being down or cease to exist.
-data_csv = "grippal.csv"
+data_csv = "chickenpox.csv"
 if (!file.exists(data_csv)) {
     download.file(data_url, data_csv, method="auto")
 }
@@ -131,15 +132,15 @@ with(tail(data, 200), plot(date, inc, type="l", xlab="Date", ylab="Weekly incide
 
 ### Computation
 
-Since the peaks of the epidemic happen in winter, near the transition between calendar years, we define the reference period for the annual incidence from August 1st of year $N$ to August 1st of year $N+1$. We label this period as year $N+1$ because the peak is always located in year $N+1$. The very low incidence in summer ensures that the arbitrariness of the choice of reference period has no impact on our conclusions.
+As requested in the exercise, we define the reference period for the annual incidence from September 1st of year $N$ to September 1st of year $N+1$.
 
-The argument `na.rm=True` in the sum indicates that missing data points are removed. This is a reasonable choice since there is only one missing point, whose impact cannot be very strong.
+Since this dataset has no missing data points, the `na.rm=TRUE` argument previously passed to `sum` has been removed.
 ```{r}
 yearly_peak = function(year) {
-      debut = paste0(year-1,"-08-01")
-      fin = paste0(year,"-08-01")
+      debut = paste0(year-1,"-09-01")
+      fin = paste0(year,"-09-01")
       semaines = data$date > debut & data$date <= fin
-      sum(data$inc[semaines], na.rm=TRUE)
+      sum(data$inc[semaines])
       }
 ```
 
@@ -169,6 +170,12 @@ A list sorted by decreasing annual incidence makes it easy to find the most impo
 head(annnual_inc[order(-annnual_inc$incidence),])
 ```
 
+### Identification of the weakest epidemics
+
+A list sorted by increasing annual incidence makes it easy to find the least important ones:
+```{r}
+head(annnual_inc[order(annnual_inc$incidence),])
+```
 Finally, a histogram clearly shows the few very strong epidemics, which affect about 10% of the French population, but are rare: there were three of them in the course of 35 years. The typical epidemic affects only half as many people.
 ```{r}
 hist(annnual_inc$incidence, breaks=10, xlab="Annual incidence", ylab="Number of observations", main="")
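
Review note: dropping `na.rm=TRUE` is only safe while the series really has no missing values, because a single `NA` makes `sum` return `NA` for the whole year. The sketch below is not part of the patch; it illustrates the September-to-September window on a made-up weekly series. The names `toy` and `yearly_sum` are hypothetical, and the values are synthetic, not Sentinelles data.

```r
# Toy weekly series (synthetic values, for illustration only).
toy <- data.frame(
  date = seq(as.Date("2000-01-03"), by = "week", length.out = 160),
  inc  = rep(c(10, 20, 30, 40), times = 40)
)

# Guard: without na.rm, one NA would turn a yearly sum into NA.
stopifnot(!anyNA(toy$inc))

# Same windowing rule as yearly_peak in the patch:
# weeks dated after Sept 1 of year N-1, up to and including Sept 1 of year N.
yearly_sum <- function(df, year) {
  start <- as.Date(paste0(year - 1, "-09-01"))
  end   <- as.Date(paste0(year, "-09-01"))
  in_window <- df$date > start & df$date <= end
  sum(df$inc[in_window])
}

yearly_sum(toy, 2001)
```

A check such as `stopifnot(!anyNA(data$inc))` placed before the yearly sums would make the "no missing data" assumption explicit if the dataset is ever re-downloaded.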