{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Incidence of influenza-like illness in France"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import isoweek"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data on the incidence of influenza-like illness are available from the Web site of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1984 and ending with a recent week, is available for download."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-3.csv\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the documentation of the data from [the download site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n",
"\n",
"| Column name | Description |\n",
"|--------------|---------------------------------------------------------------------------------------------------------------------------|\n",
"| `week` | ISO8601 Yearweek number as numeric (year times 100 + week nubmer) |\n",
"| `indicator` | Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json |\n",
"| `inc` | Estimated incidence value for the time step, in the geographic level |\n",
"| `inc_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n",
"| `inc_up` | Upper bound of the estimated incidence 95% Confidence Interval |\n",
"| `inc100` | Estimated rate incidence per 100,000 inhabitants |\n",
"| `inc100_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n",
"| `inc100_up` | Upper bound of the estimated rate incidence 95% Confidence Interval |\n",
"| `geo_insee` | Identifier of the geographic area, from INSEE https://www.insee.fr |\n",
"| `geo_name` | Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading |\n",
"\n",
"The first line of the CSV file is a comment, which we ignore with `skip=1`."
]
},
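{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what `skiprows=1` does, the example below parses a made-up CSV text held in memory. The text and the names `sample_csv` and `sample` are hypothetical stand-ins for the real file, which is assumed here to have the same layout: one comment line, one header line, then one row per week.\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Made-up file content: a comment line, a header line, two data rows.\n",
"sample_csv = '''# comment line added by the data provider\n",
"week,indicator,inc\n",
"202407,3,163687\n",
"202406,3,191550\n",
"'''\n",
"\n",
"# skiprows=1 discards the first line, so the second line becomes the header.\n",
"sample = pd.read_csv(io.StringIO(sample_csv), skiprows=1)\n",
"print(sample.columns.tolist())\n",
"print(len(sample))\n",
"```\n",
"\n",
"Without `skiprows=1`, pandas would treat the comment line as the header row."
]
},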
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" week | \n",
" indicator | \n",
" inc | \n",
" inc_low | \n",
" inc_up | \n",
" inc100 | \n",
" inc100_low | \n",
" inc100_up | \n",
" geo_insee | \n",
" geo_name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 202407 | \n",
" 3 | \n",
" 163687 | \n",
" 148996.0 | \n",
" 178378.0 | \n",
" 245 | \n",
" 223.0 | \n",
" 267.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 1 | \n",
" 202406 | \n",
" 3 | \n",
" 191550 | \n",
" 179309.0 | \n",
" 203791.0 | \n",
" 287 | \n",
" 269.0 | \n",
" 305.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2 | \n",
" 202405 | \n",
" 3 | \n",
" 216237 | \n",
" 203595.0 | \n",
" 228879.0 | \n",
" 324 | \n",
" 305.0 | \n",
" 343.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 3 | \n",
" 202404 | \n",
" 3 | \n",
" 213196 | \n",
" 200547.0 | \n",
" 225845.0 | \n",
" 320 | \n",
" 301.0 | \n",
" 339.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 4 | \n",
" 202403 | \n",
" 3 | \n",
" 163457 | \n",
" 152276.0 | \n",
" 174638.0 | \n",
" 245 | \n",
" 228.0 | \n",
" 262.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" week indicator inc inc_low inc_up inc100 inc100_low \\\n",
"0 202407 3 163687 148996.0 178378.0 245 223.0 \n",
"1 202406 3 191550 179309.0 203791.0 287 269.0 \n",
"2 202405 3 216237 203595.0 228879.0 324 305.0 \n",
"3 202404 3 213196 200547.0 225845.0 320 301.0 \n",
"4 202403 3 163457 152276.0 174638.0 245 228.0 \n",
"\n",
" inc100_up geo_insee geo_name \n",
"0 267.0 FR France \n",
"1 305.0 FR France \n",
"2 343.0 FR France \n",
"3 339.0 FR France \n",
"4 262.0 FR France "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_data = pd.read_csv(data_file, skiprows=1)\n",
"raw_data.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Are there missing data points? Yes, week 19 of year 1989 does not have any observed values."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" week | \n",
" indicator | \n",
" inc | \n",
" inc_low | \n",
" inc_up | \n",
" inc100 | \n",
" inc100_low | \n",
" inc100_up | \n",
" geo_insee | \n",
" geo_name | \n",
"
\n",
" \n",
" \n",
" \n",
" 1814 | \n",
" 198919 | \n",
" 3 | \n",
" - | \n",
" NaN | \n",
" NaN | \n",
" - | \n",
" NaN | \n",
" NaN | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" week indicator inc inc_low inc_up inc100 inc100_low inc100_up \\\n",
"1814 198919 3 - NaN NaN - NaN NaN \n",
"\n",
" geo_insee geo_name \n",
"1814 FR France "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_data[raw_data.isnull().any(axis=1)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We delete this point, which does not have big consequence for our rather simple analysis."
]
},
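{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `-` entries in the row shown above suggest a side effect worth checking: a column that contains such a placeholder cannot be parsed as numbers, so pandas is expected to keep all of its values as strings. The sketch below reproduces this on a made-up three-row table (`sample_csv`, `sample`, `incomplete` and `clean` are hypothetical names) and shows one way to drop the incomplete row and convert the remaining values to integers.\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Made-up miniature of the dataset: the middle week has no observation.\n",
"sample_csv = '''week,inc,inc_low\n",
"198920,5000,4000\n",
"198919,-,\n",
"198918,7000,6000\n",
"'''\n",
"sample = pd.read_csv(io.StringIO(sample_csv))\n",
"\n",
"# Rows with at least one missing value (here: the empty inc_low field).\n",
"incomplete = sample[sample.isnull().any(axis=1)]\n",
"print(incomplete)\n",
"\n",
"# '-' is not a number, so the whole 'inc' column is read as text.\n",
"print(sample['inc'].dtype)\n",
"\n",
"# Drop the incomplete row, then convert the remaining text values to integers.\n",
"clean = sample.dropna().copy()\n",
"clean['inc'] = clean['inc'].astype(int)\n",
"print(clean['inc'].sum())\n",
"```\n",
"\n",
"Only after the conversion can the column be summed or plotted as numbers."
]
},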
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to protect us in case the Réseau Sentinelles Web server disappears or is modified, we make a local copy of this dataset that we store together with our analysis. It is unnecessary and even risky to download the data at each execution, because in case of a malfunction we might be replacing our file by a corrupted version. Therefore we download the data only if no local copy exists."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"data_file = \"syndrome-grippal.csv\"\n",
"\n",
"import os\n",
"import urllib.request\n",
"if not os.path.exists(data_file):\n",
" urllib.request.urlretrieve(data_url, data_file)"
]
},
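{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible hardening of this download step, sketched under assumptions rather than taken from the analysis: write the download to a temporary file first and rename it only on success, so that an interrupted transfer cannot leave a truncated file under the final name. The helper name `download_once` and its `fetch` parameter are hypothetical; `fetch` defaults to `urllib.request.urlretrieve` and exists only so that the logic can be tried without network access.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"import urllib.request\n",
"\n",
"def download_once(url, path, fetch=urllib.request.urlretrieve):\n",
"    # Hypothetical helper: returns True if a download happened, False otherwise.\n",
"    if os.path.exists(path):\n",
"        return False  # keep the existing local copy untouched\n",
"    directory = os.path.dirname(os.path.abspath(path))\n",
"    # Temporary file in the same directory, so the final rename stays on one filesystem.\n",
"    fd, tmp_path = tempfile.mkstemp(dir=directory)\n",
"    os.close(fd)\n",
"    try:\n",
"        fetch(url, tmp_path)\n",
"        os.replace(tmp_path, path)  # move into place only after a complete download\n",
"    finally:\n",
"        if os.path.exists(tmp_path):\n",
"            os.remove(tmp_path)  # clean up after a failed download\n",
"    return True\n",
"```\n",
"\n",
"A call such as `download_once(data_url, data_file)` would then play the role of the `os.path.exists` test."
]
},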
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" week | \n",
" indicator | \n",
" inc | \n",
" inc_low | \n",
" inc_up | \n",
" inc100 | \n",
" inc100_low | \n",
" inc100_up | \n",
" geo_insee | \n",
" geo_name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 202407 | \n",
" 3 | \n",
" 163687 | \n",
" 148996.0 | \n",
" 178378.0 | \n",
" 245 | \n",
" 223.0 | \n",
" 267.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 1 | \n",
" 202406 | \n",
" 3 | \n",
" 191550 | \n",
" 179309.0 | \n",
" 203791.0 | \n",
" 287 | \n",
" 269.0 | \n",
" 305.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2 | \n",
" 202405 | \n",
" 3 | \n",
" 216237 | \n",
" 203595.0 | \n",
" 228879.0 | \n",
" 324 | \n",
" 305.0 | \n",
" 343.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 3 | \n",
" 202404 | \n",
" 3 | \n",
" 213196 | \n",
" 200547.0 | \n",
" 225845.0 | \n",
" 320 | \n",
" 301.0 | \n",
" 339.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 4 | \n",
" 202403 | \n",
" 3 | \n",
" 163457 | \n",
" 152276.0 | \n",
" 174638.0 | \n",
" 245 | \n",
" 228.0 | \n",
" 262.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 5 | \n",
" 202402 | \n",
" 3 | \n",
" 129436 | \n",
" 119453.0 | \n",
" 139419.0 | \n",
" 194 | \n",
" 179.0 | \n",
" 209.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 6 | \n",
" 202401 | \n",
" 3 | \n",
" 120769 | \n",
" 109452.0 | \n",
" 132086.0 | \n",
" 181 | \n",
" 164.0 | \n",
" 198.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 7 | \n",
" 202352 | \n",
" 3 | \n",
" 115446 | \n",
" 103738.0 | \n",
" 127154.0 | \n",
" 174 | \n",
" 156.0 | \n",
" 192.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 8 | \n",
" 202351 | \n",
" 3 | \n",
" 148755 | \n",
" 136546.0 | \n",
" 160964.0 | \n",
" 224 | \n",
" 206.0 | \n",
" 242.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 9 | \n",
" 202350 | \n",
" 3 | \n",
" 147971 | \n",
" 136787.0 | \n",
" 159155.0 | \n",
" 223 | \n",
" 206.0 | \n",
" 240.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 10 | \n",
" 202349 | \n",
" 3 | \n",
" 147552 | \n",
" 136422.0 | \n",
" 158682.0 | \n",
" 222 | \n",
" 205.0 | \n",
" 239.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 11 | \n",
" 202348 | \n",
" 3 | \n",
" 124204 | \n",
" 113479.0 | \n",
" 134929.0 | \n",
" 187 | \n",
" 171.0 | \n",
" 203.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 12 | \n",
" 202347 | \n",
" 3 | \n",
" 110910 | \n",
" 100658.0 | \n",
" 121162.0 | \n",
" 167 | \n",
" 152.0 | \n",
" 182.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 13 | \n",
" 202346 | \n",
" 3 | \n",
" 83853 | \n",
" 75096.0 | \n",
" 92610.0 | \n",
" 126 | \n",
" 113.0 | \n",
" 139.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 14 | \n",
" 202345 | \n",
" 3 | \n",
" 72003 | \n",
" 63178.0 | \n",
" 80828.0 | \n",
" 108 | \n",
" 95.0 | \n",
" 121.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 15 | \n",
" 202344 | \n",
" 3 | \n",
" 49952 | \n",
" 42813.0 | \n",
" 57091.0 | \n",
" 75 | \n",
" 64.0 | \n",
" 86.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 16 | \n",
" 202343 | \n",
" 3 | \n",
" 44982 | \n",
" 38170.0 | \n",
" 51794.0 | \n",
" 68 | \n",
" 58.0 | \n",
" 78.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 17 | \n",
" 202342 | \n",
" 3 | \n",
" 56842 | \n",
" 49277.0 | \n",
" 64407.0 | \n",
" 86 | \n",
" 75.0 | \n",
" 97.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 18 | \n",
" 202341 | \n",
" 3 | \n",
" 58357 | \n",
" 51032.0 | \n",
" 65682.0 | \n",
" 88 | \n",
" 77.0 | \n",
" 99.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 19 | \n",
" 202340 | \n",
" 3 | \n",
" 68894 | \n",
" 60069.0 | \n",
" 77719.0 | \n",
" 104 | \n",
" 91.0 | \n",
" 117.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 20 | \n",
" 202339 | \n",
" 3 | \n",
" 72003 | \n",
" 63452.0 | \n",
" 80554.0 | \n",
" 108 | \n",
" 95.0 | \n",
" 121.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 21 | \n",
" 202338 | \n",
" 3 | \n",
" 63218 | \n",
" 55227.0 | \n",
" 71209.0 | \n",
" 95 | \n",
" 83.0 | \n",
" 107.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 22 | \n",
" 202337 | \n",
" 3 | \n",
" 49085 | \n",
" 42079.0 | \n",
" 56091.0 | \n",
" 74 | \n",
" 63.0 | \n",
" 85.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 23 | \n",
" 202336 | \n",
" 3 | \n",
" 38247 | \n",
" 32237.0 | \n",
" 44257.0 | \n",
" 58 | \n",
" 49.0 | \n",
" 67.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 24 | \n",
" 202335 | \n",
" 3 | \n",
" 31695 | \n",
" 26013.0 | \n",
" 37377.0 | \n",
" 48 | \n",
" 39.0 | \n",
" 57.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 25 | \n",
" 202334 | \n",
" 3 | \n",
" 26663 | \n",
" 21057.0 | \n",
" 32269.0 | \n",
" 40 | \n",
" 32.0 | \n",
" 48.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 26 | \n",
" 202333 | \n",
" 3 | \n",
" 19144 | \n",
" 13161.0 | \n",
" 25127.0 | \n",
" 29 | \n",
" 20.0 | \n",
" 38.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 27 | \n",
" 202332 | \n",
" 3 | \n",
" 14641 | \n",
" 10285.0 | \n",
" 18997.0 | \n",
" 22 | \n",
" 15.0 | \n",
" 29.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 28 | \n",
" 202331 | \n",
" 3 | \n",
" 15286 | \n",
" 10705.0 | \n",
" 19867.0 | \n",
" 23 | \n",
" 16.0 | \n",
" 30.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 29 | \n",
" 202330 | \n",
" 3 | \n",
" 13205 | \n",
" 8647.0 | \n",
" 17763.0 | \n",
" 20 | \n",
" 13.0 | \n",
" 27.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 2021 | \n",
" 198521 | \n",
" 3 | \n",
" 26096 | \n",
" 19621.0 | \n",
" 32571.0 | \n",
" 47 | \n",
" 35.0 | \n",
" 59.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2022 | \n",
" 198520 | \n",
" 3 | \n",
" 27896 | \n",
" 20885.0 | \n",
" 34907.0 | \n",
" 51 | \n",
" 38.0 | \n",
" 64.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2023 | \n",
" 198519 | \n",
" 3 | \n",
" 43154 | \n",
" 32821.0 | \n",
" 53487.0 | \n",
" 78 | \n",
" 59.0 | \n",
" 97.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2024 | \n",
" 198518 | \n",
" 3 | \n",
" 40555 | \n",
" 29935.0 | \n",
" 51175.0 | \n",
" 74 | \n",
" 55.0 | \n",
" 93.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2025 | \n",
" 198517 | \n",
" 3 | \n",
" 34053 | \n",
" 24366.0 | \n",
" 43740.0 | \n",
" 62 | \n",
" 44.0 | \n",
" 80.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2026 | \n",
" 198516 | \n",
" 3 | \n",
" 50362 | \n",
" 36451.0 | \n",
" 64273.0 | \n",
" 91 | \n",
" 66.0 | \n",
" 116.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2027 | \n",
" 198515 | \n",
" 3 | \n",
" 63881 | \n",
" 45538.0 | \n",
" 82224.0 | \n",
" 116 | \n",
" 83.0 | \n",
" 149.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2028 | \n",
" 198514 | \n",
" 3 | \n",
" 134545 | \n",
" 114400.0 | \n",
" 154690.0 | \n",
" 244 | \n",
" 207.0 | \n",
" 281.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2029 | \n",
" 198513 | \n",
" 3 | \n",
" 197206 | \n",
" 176080.0 | \n",
" 218332.0 | \n",
" 357 | \n",
" 319.0 | \n",
" 395.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2030 | \n",
" 198512 | \n",
" 3 | \n",
" 245240 | \n",
" 223304.0 | \n",
" 267176.0 | \n",
" 445 | \n",
" 405.0 | \n",
" 485.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2031 | \n",
" 198511 | \n",
" 3 | \n",
" 276205 | \n",
" 252399.0 | \n",
" 300011.0 | \n",
" 501 | \n",
" 458.0 | \n",
" 544.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2032 | \n",
" 198510 | \n",
" 3 | \n",
" 353231 | \n",
" 326279.0 | \n",
" 380183.0 | \n",
" 640 | \n",
" 591.0 | \n",
" 689.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2033 | \n",
" 198509 | \n",
" 3 | \n",
" 369895 | \n",
" 341109.0 | \n",
" 398681.0 | \n",
" 670 | \n",
" 618.0 | \n",
" 722.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2034 | \n",
" 198508 | \n",
" 3 | \n",
" 389886 | \n",
" 359529.0 | \n",
" 420243.0 | \n",
" 707 | \n",
" 652.0 | \n",
" 762.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2035 | \n",
" 198507 | \n",
" 3 | \n",
" 471852 | \n",
" 432599.0 | \n",
" 511105.0 | \n",
" 855 | \n",
" 784.0 | \n",
" 926.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2036 | \n",
" 198506 | \n",
" 3 | \n",
" 565825 | \n",
" 518011.0 | \n",
" 613639.0 | \n",
" 1026 | \n",
" 939.0 | \n",
" 1113.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2037 | \n",
" 198505 | \n",
" 3 | \n",
" 637302 | \n",
" 592795.0 | \n",
" 681809.0 | \n",
" 1155 | \n",
" 1074.0 | \n",
" 1236.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2038 | \n",
" 198504 | \n",
" 3 | \n",
" 424937 | \n",
" 390794.0 | \n",
" 459080.0 | \n",
" 770 | \n",
" 708.0 | \n",
" 832.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2039 | \n",
" 198503 | \n",
" 3 | \n",
" 213901 | \n",
" 174689.0 | \n",
" 253113.0 | \n",
" 388 | \n",
" 317.0 | \n",
" 459.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2040 | \n",
" 198502 | \n",
" 3 | \n",
" 97586 | \n",
" 80949.0 | \n",
" 114223.0 | \n",
" 177 | \n",
" 147.0 | \n",
" 207.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2041 | \n",
" 198501 | \n",
" 3 | \n",
" 85489 | \n",
" 65918.0 | \n",
" 105060.0 | \n",
" 155 | \n",
" 120.0 | \n",
" 190.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2042 | \n",
" 198452 | \n",
" 3 | \n",
" 84830 | \n",
" 60602.0 | \n",
" 109058.0 | \n",
" 154 | \n",
" 110.0 | \n",
" 198.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2043 | \n",
" 198451 | \n",
" 3 | \n",
" 101726 | \n",
" 80242.0 | \n",
" 123210.0 | \n",
" 185 | \n",
" 146.0 | \n",
" 224.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2044 | \n",
" 198450 | \n",
" 3 | \n",
" 123680 | \n",
" 101401.0 | \n",
" 145959.0 | \n",
" 225 | \n",
" 184.0 | \n",
" 266.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2045 | \n",
" 198449 | \n",
" 3 | \n",
" 101073 | \n",
" 81684.0 | \n",
" 120462.0 | \n",
" 184 | \n",
" 149.0 | \n",
" 219.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2046 | \n",
" 198448 | \n",
" 3 | \n",
" 78620 | \n",
" 60634.0 | \n",
" 96606.0 | \n",
" 143 | \n",
" 110.0 | \n",
" 176.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2047 | \n",
" 198447 | \n",
" 3 | \n",
" 72029 | \n",
" 54274.0 | \n",
" 89784.0 | \n",
" 131 | \n",
" 99.0 | \n",
" 163.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2048 | \n",
" 198446 | \n",
" 3 | \n",
" 87330 | \n",
" 67686.0 | \n",
" 106974.0 | \n",
" 159 | \n",
" 123.0 | \n",
" 195.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2049 | \n",
" 198445 | \n",
" 3 | \n",
" 135223 | \n",
" 101414.0 | \n",
" 169032.0 | \n",
" 246 | \n",
" 184.0 | \n",
" 308.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
" 2050 | \n",
" 198444 | \n",
" 3 | \n",
" 68422 | \n",
" 20056.0 | \n",
" 116788.0 | \n",
" 125 | \n",
" 37.0 | \n",
" 213.0 | \n",
" FR | \n",
" France | \n",
"
\n",
" \n",
"
\n",
"
2050 rows × 10 columns
\n",
"
"
],
"text/plain": [
" week indicator inc inc_low inc_up inc100 inc100_low \\\n",
"0 202407 3 163687 148996.0 178378.0 245 223.0 \n",
"1 202406 3 191550 179309.0 203791.0 287 269.0 \n",
"2 202405 3 216237 203595.0 228879.0 324 305.0 \n",
"3 202404 3 213196 200547.0 225845.0 320 301.0 \n",
"4 202403 3 163457 152276.0 174638.0 245 228.0 \n",
"5 202402 3 129436 119453.0 139419.0 194 179.0 \n",
"6 202401 3 120769 109452.0 132086.0 181 164.0 \n",
"7 202352 3 115446 103738.0 127154.0 174 156.0 \n",
"8 202351 3 148755 136546.0 160964.0 224 206.0 \n",
"9 202350 3 147971 136787.0 159155.0 223 206.0 \n",
"10 202349 3 147552 136422.0 158682.0 222 205.0 \n",
"11 202348 3 124204 113479.0 134929.0 187 171.0 \n",
"12 202347 3 110910 100658.0 121162.0 167 152.0 \n",
"13 202346 3 83853 75096.0 92610.0 126 113.0 \n",
"14 202345 3 72003 63178.0 80828.0 108 95.0 \n",
"15 202344 3 49952 42813.0 57091.0 75 64.0 \n",
"16 202343 3 44982 38170.0 51794.0 68 58.0 \n",
"17 202342 3 56842 49277.0 64407.0 86 75.0 \n",
"18 202341 3 58357 51032.0 65682.0 88 77.0 \n",
"19 202340 3 68894 60069.0 77719.0 104 91.0 \n",
"20 202339 3 72003 63452.0 80554.0 108 95.0 \n",
"21 202338 3 63218 55227.0 71209.0 95 83.0 \n",
"22 202337 3 49085 42079.0 56091.0 74 63.0 \n",
"23 202336 3 38247 32237.0 44257.0 58 49.0 \n",
"24 202335 3 31695 26013.0 37377.0 48 39.0 \n",
"25 202334 3 26663 21057.0 32269.0 40 32.0 \n",
"26 202333 3 19144 13161.0 25127.0 29 20.0 \n",
"27 202332 3 14641 10285.0 18997.0 22 15.0 \n",
"28 202331 3 15286 10705.0 19867.0 23 16.0 \n",
"29 202330 3 13205 8647.0 17763.0 20 13.0 \n",
"... ... ... ... ... ... ... ... \n",
"2021 198521 3 26096 19621.0 32571.0 47 35.0 \n",
"2022 198520 3 27896 20885.0 34907.0 51 38.0 \n",
"2023 198519 3 43154 32821.0 53487.0 78 59.0 \n",
"2024 198518 3 40555 29935.0 51175.0 74 55.0 \n",
"2025 198517 3 34053 24366.0 43740.0 62 44.0 \n",
"2026 198516 3 50362 36451.0 64273.0 91 66.0 \n",
"2027 198515 3 63881 45538.0 82224.0 116 83.0 \n",
"2028 198514 3 134545 114400.0 154690.0 244 207.0 \n",
"2029 198513 3 197206 176080.0 218332.0 357 319.0 \n",
"2030 198512 3 245240 223304.0 267176.0 445 405.0 \n",
"2031 198511 3 276205 252399.0 300011.0 501 458.0 \n",
"2032 198510 3 353231 326279.0 380183.0 640 591.0 \n",
"2033 198509 3 369895 341109.0 398681.0 670 618.0 \n",
"2034 198508 3 389886 359529.0 420243.0 707 652.0 \n",
"2035 198507 3 471852 432599.0 511105.0 855 784.0 \n",
"2036 198506 3 565825 518011.0 613639.0 1026 939.0 \n",
"2037 198505 3 637302 592795.0 681809.0 1155 1074.0 \n",
"2038 198504 3 424937 390794.0 459080.0 770 708.0 \n",
"2039 198503 3 213901 174689.0 253113.0 388 317.0 \n",
"2040 198502 3 97586 80949.0 114223.0 177 147.0 \n",
"2041 198501 3 85489 65918.0 105060.0 155 120.0 \n",
"2042 198452 3 84830 60602.0 109058.0 154 110.0 \n",
"2043 198451 3 101726 80242.0 123210.0 185 146.0 \n",
"2044 198450 3 123680 101401.0 145959.0 225 184.0 \n",
"2045 198449 3 101073 81684.0 120462.0 184 149.0 \n",
"2046 198448 3 78620 60634.0 96606.0 143 110.0 \n",
"2047 198447 3 72029 54274.0 89784.0 131 99.0 \n",
"2048 198446 3 87330 67686.0 106974.0 159 123.0 \n",
"2049 198445 3 135223 101414.0 169032.0 246 184.0 \n",
"2050 198444 3 68422 20056.0 116788.0 125 37.0 \n",
"\n",
" inc100_up geo_insee geo_name \n",
"0 267.0 FR France \n",
"1 305.0 FR France \n",
"2 343.0 FR France \n",
"3 339.0 FR France \n",
"4 262.0 FR France \n",
"5 209.0 FR France \n",
"6 198.0 FR France \n",
"7 192.0 FR France \n",
"8 242.0 FR France \n",
"9 240.0 FR France \n",
"10 239.0 FR France \n",
"11 203.0 FR France \n",
"12 182.0 FR France \n",
"13 139.0 FR France \n",
"14 121.0 FR France \n",
"15 86.0 FR France \n",
"16 78.0 FR France \n",
"17 97.0 FR France \n",
"18 99.0 FR France \n",
"19 117.0 FR France \n",
"20 121.0 FR France \n",
"21 107.0 FR France \n",
"22 85.0 FR France \n",
"23 67.0 FR France \n",
"24 57.0 FR France \n",
"25 48.0 FR France \n",
"26 38.0 FR France \n",
"27 29.0 FR France \n",
"28 30.0 FR France \n",
"29 27.0 FR France \n",
"... ... ... ... \n",
"2021 59.0 FR France \n",
"2022 64.0 FR France \n",
"2023 97.0 FR France \n",
"2024 93.0 FR France \n",
"2025 80.0 FR France \n",
"2026 116.0 FR France \n",
"2027 149.0 FR France \n",
"2028 281.0 FR France \n",
"2029 395.0 FR France \n",
"2030 485.0 FR France \n",
"2031 544.0 FR France \n",
"2032 689.0 FR France \n",
"2033 722.0 FR France \n",
"2034 762.0 FR France \n",
"2035 926.0 FR France \n",
"2036 1113.0 FR France \n",
"2037 1236.0 FR France \n",
"2038 832.0 FR France \n",
"2039 459.0 FR France \n",
"2040 207.0 FR France \n",
"2041 190.0 FR France \n",
"2042 198.0 FR France \n",
"2043 224.0 FR France \n",
"2044 266.0 FR France \n",
"2045 219.0 FR France \n",
"2046 176.0 FR France \n",
"2047 163.0 FR France \n",
"2048 195.0 FR France \n",
"2049 308.0 FR France \n",
"2050 213.0 FR France \n",
"\n",
"[2050 rows x 10 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = raw_data.dropna().copy()\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our dataset uses an uncommon encoding; the week number is attached\n",
"to the year number, leaving the impression of a six-digit integer.\n",
"That is how Pandas interprets it.\n",
"\n",
"A second problem is that Pandas does not know about week numbers.\n",
"It needs to be given the dates of the beginning and end of the week.\n",
"We use the library `isoweek` for that.\n",
"\n",
"Since the conversion is a bit lengthy, we write a small Python \n",
"function for doing it. Then we apply it to all points in our dataset. \n",
"The results go into a new column 'period'."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def convert_week(year_and_week_int):\n",
" year_and_week_str = str(year_and_week_int)\n",
" year = int(year_and_week_str[:4])\n",
" week = int(year_and_week_str[4:])\n",
" w = isoweek.Week(year, week)\n",
" return pd.Period(w.day(0), 'W')\n",
"\n",
"data['period'] = [convert_week(yw) for yw in data['week']]"
]
},
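{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, the same conversion can be sketched with the standard library alone. This assumes Python 3.8 or later, where `datetime.date.fromisocalendar` is available; the function name `convert_week_stdlib` is hypothetical and not part of the analysis.\n",
"\n",
"```python\n",
"import datetime\n",
"import pandas as pd\n",
"\n",
"def convert_week_stdlib(year_and_week_int):\n",
"    # Split e.g. 198919 into year 1989 and ISO week 19.\n",
"    year, week = divmod(int(year_and_week_int), 100)\n",
"    # Monday (ISO weekday 1) of that ISO week; requires Python 3.8 or later.\n",
"    monday = datetime.date.fromisocalendar(year, week, 1)\n",
"    # Weekly periods with frequency 'W' run from Monday to Sunday, like ISO weeks.\n",
"    return pd.Period(monday, 'W')\n",
"\n",
"p = convert_week_stdlib(198919)\n",
"print(p.start_time.date(), p.end_time.date())\n",
"```\n",
"\n",
"Using integer arithmetic instead of string slicing also avoids relying on the number being exactly six digits long."
]
},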
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two more small changes to make.\n",
"\n",
"First, we define the observation periods as the new index of\n",
"our dataset. That turns it into a time series, which will be\n",
"convenient later on.\n",
"\n",
"Second, we sort the points chronologically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sorted_data = data.set_index('period').sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We check the consistency of the data. Between the end of a period and\n",
"the beginning of the next one, the difference should be zero, or very small.\n",
"We tolerate an error of one second.\n",
"\n",
"This is OK except for one pair of consecutive periods between which\n",
"a whole week is missing.\n",
"\n",
"We recognize the dates: it's the week without observations that we\n",
"have deleted earlier!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"periods = sorted_data.index\n",
"for p1, p2 in zip(periods[:-1], periods[1:]):\n",
" delta = p2.to_timestamp() - p1.end_time\n",
" if delta > pd.Timedelta('1s'):\n",
" print(p1, p2)"
]
},
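{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same consistency check can be sketched without an explicit loop over pairs, using the `start_time` and `end_time` attributes of a pandas `PeriodIndex`. The index below is made up for illustration (five consecutive weeks with the third one removed), and the names `weeks`, `gaps` and `is_gap` are hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up index: five consecutive weeks, with the third one removed.\n",
"weeks = pd.period_range('1989-05-01', periods=5, freq='W')\n",
"periods = weeks.delete(2)\n",
"\n",
"# Time between the end of each period and the start of the following one.\n",
"gaps = periods[1:].start_time - periods[:-1].end_time\n",
"\n",
"# Consecutive weeks are separated by far less than a second; anything longer is a gap.\n",
"is_gap = gaps > pd.Timedelta('1s')\n",
"for p1, p2 in zip(periods[:-1][is_gap], periods[1:][is_gap]):\n",
"    print(p1, p2)\n",
"```\n",
"\n",
"Only the pair of periods surrounding the removed week is printed."
]
},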
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A first look at the data!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sorted_data['inc'].plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A zoom on the last few years shows more clearly that the peaks are situated in winter."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sorted_data['inc'][-200:].plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Study of the annual incidence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the peaks of the epidemic happen in winter, near the transition\n",
"between calendar years, we define the reference period for the annual\n",
"incidence from August 1st of year $N$ to August 1st of year $N+1$. We\n",
"label this period as year $N+1$ because the peak is always located in\n",
"year $N+1$. The very low incidence in summer ensures that the arbitrariness\n",
"of the choice of reference period has no impact on our conclusions.\n",
"\n",
"Our task is a bit complicated by the fact that a year does not have an\n",
"integer number of weeks. Therefore we modify our reference period a bit:\n",
"instead of August 1st, we use the first day of the week containing August 1st.\n",
"\n",
"A final detail: the dataset starts in October 1984, the first peak is thus\n",
"incomplete, We start the analysis with the first full peak."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n",
" for y in range(1985,\n",
" sorted_data.index[-1].year)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Starting from this list of weeks that contain August 1st, we obtain intervals of approximately one year as the periods between two adjacent weeks in this list. We compute the sums of weekly incidences for all these periods.\n",
"\n",
"We also check that our periods contain between 51 and 52 weeks, as a safeguard against potential mistakes in our code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"year = []\n",
"yearly_incidence = []\n",
"for week1, week2 in zip(first_august_week[:-1],\n",
" first_august_week[1:]):\n",
" one_year = sorted_data['inc'][week1:week2-1]\n",
" assert abs(len(one_year)-52) < 2\n",
" yearly_incidence.append(one_year.sum())\n",
" year.append(week2.year)\n",
"yearly_incidence = pd.Series(data=yearly_incidence, index=year)"
]
},
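{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why the length check allows some slack, the sketch below counts the weekly periods between consecutive reference weeks for a few sample years. The names `august_weeks` and `n_weeks` are hypothetical. Each span between two reference weeks is 52 or 53 weeks long; in the real dataset a span can also be one week shorter when an observation has been dropped.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Weeks (Monday to Sunday) containing August 1st, for a few sample years.\n",
"august_weeks = [pd.Period(pd.Timestamp(y, 8, 1), 'W') for y in range(1985, 1991)]\n",
"\n",
"# Number of weekly periods from one reference week up to, but excluding, the next one.\n",
"for w1, w2 in zip(august_weeks[:-1], august_weeks[1:]):\n",
"    n_weeks = w2.ordinal - w1.ordinal\n",
"    print(w1.start_time.date(), n_weeks)\n",
"```\n",
"\n",
"The difference of the `ordinal` attributes counts periods directly, which avoids converting to days and dividing by seven."
]
},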
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here are the annual incidences."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"yearly_incidence.plot(style='*')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sorted list makes it easier to find the highest values (at the end)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"yearly_incidence.sort_values()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, a histogram clearly shows the few very strong epidemics, which affect about 10% of the French population,\n",
"but are rare: there were three of them in the course of 35 years. The typical epidemic affects only half as many people."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"yearly_incidence.hist(xrot=20)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}