From 55d4ba2e157442531bd2fd009ffd218b4346f85f Mon Sep 17 00:00:00 2001
From: Julie Gullstrand
Date: Mon, 27 Apr 2020 15:01:10 +0200
Subject: [PATCH] exo 2 finit

---
 module3/exo1/influenza-like-illness-analysis.Rmd | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/module3/exo1/influenza-like-illness-analysis.Rmd b/module3/exo1/influenza-like-illness-analysis.Rmd
index 8047fa0..628927c 100644
--- a/module3/exo1/influenza-like-illness-analysis.Rmd
+++ b/module3/exo1/influenza-like-illness-analysis.Rmd
@@ -23,7 +23,7 @@ knitr::opts_chunk$set(echo = TRUE)
 
 The data on the incidence of influenza-like illness are available from the Web site of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1984 and ending with a recent week, is available for download. The URL is:
 ```{r}
-data_url = "http://www.sentiweb.fr/datasets/incidence-PAY-3.csv"
+data_url = "http://www.sentiweb.fr/datasets/incidence-PAY-7.csv"
 ```
 
 This is the documentation of the data from [the download site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):
@@ -121,21 +121,21 @@ with(tail(data, 200), plot(date, inc, type="l", xlab="Date", ylab="Weekly incide
 
 ### Computation
 
-Since the peaks of the epidemic happen in winter, near the transition between calendar years, we define the reference period for the annual incidence from August 1st of year $N$ to August 1st of year $N+1$. We label this period as year $N+1$ because the peak is always located in year $N+1$. The very low incidence in summer ensures that the arbitrariness of the choice of reference period has no impact on our conclusions.
+Since the peaks of the epidemic happen in winter, near the transition between calendar years, we define the reference period for the annual incidence from September 1st of year $N$ to September 1st of year $N+1$. We label this period as year $N+1$ because the peak is always located in year $N+1$. The very low incidence in summer ensures that the arbitrariness of the choice of reference period has no impact on our conclusions.
 
 The argument `na.rm=True` in the sum indicates that missing data points are removed. This is a reasonable choice since there is only one missing point, whose impact cannot be very strong.
 ```{r}
 yearly_peak = function(year) {
-  debut = paste0(year-1,"-08-01")
-  fin = paste0(year,"-08-01")
+  debut = paste0(year-1,"-09-01")
+  fin = paste0(year,"-09-01")
   semaines = data$date > debut & data$date <= fin
   sum(data$inc[semaines], na.rm=TRUE)
 }
 ```
 
-We must also be careful with the first and last years of the dataset. The data start in October 1984, meaning that we don't have all the data for the peak attributed to the year 1985. We therefore exclude it from the analysis. For the same reason, we define 2018 as the final year. We can increase this value to 2019 only when all data up to July 2019 is available.
+We must also be careful with the first and last years of the dataset. The data start in October 1984, meaning that we don't have all the data for the peak attributed to the year 1985. We therefore exclude it from the analysis. For the same reason, we define 2019 as the final year. We can increase this value to 2020 only when all data up to August 2020 is available.
 ```{r}
-years = 1986:2018
+years = 1991:2019
 ```
 
 We make a new data frame for the annual incidence, applying the function `yearly_peak` to each year:
-- 
2.18.1