{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of influenza-like illness in France" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import os\n", "import isoweek\n", "import requests\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data on the incidence of influenza-like illness are available from the website of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1984 and ending with a recent week, is available for download." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-3.csv\"\n", "local_file_path = \"./incidence-PAY-3.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the documentation of the data from [the download site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Description |\n", "|--------------|---------------------------------------------------------------------------------------------------------------------------|\n", "| `week` | ISO8601 Yearweek number as numeric (year times 100 + week number) |\n", "| `indicator` | Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json |\n", "| `inc` | Estimated incidence value for the time step, in the geographic level |\n", "| `inc_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc_up` | Upper bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100` | Estimated rate incidence per 100,000 inhabitants |\n", "| `inc100_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100_up` | Upper bound of the estimated 
rate incidence 95% Confidence Interval |\n", "| `geo_insee` | Identifier of the geographic area, from INSEE https://www.insee.fr |\n", "| `geo_name` | Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading |\n", "\n", "The first line of the CSV file is a comment, which we skip with `skiprows=1`.\n", "We download the `.csv` file only if a local copy does not already exist." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202131 3 23752 16292.0 31212.0 36 25.0 \n", "1 202130 3 14142 9763.0 18521.0 21 14.0 \n", "2 202129 3 13626 9618.0 17634.0 21 15.0 \n", "3 202128 3 8636 5430.0 11842.0 13 8.0 \n", "4 202127 3 10693 6838.0 14548.0 16 10.0 \n", "5 202126 3 7086 4109.0 10063.0 11 6.0 \n", "6 202125 3 7942 5540.0 10344.0 12 8.0 \n", "7 202124 3 4855 3011.0 6699.0 7 4.0 \n", "8 202123 3 6710 4455.0 8965.0 10 7.0 \n", "9 202122 3 7879 5495.0 10263.0 12 8.0 \n", "10 202121 3 7827 5403.0 10251.0 12 8.0 \n", "11 202120 3 10278 7540.0 13016.0 16 12.0 \n", "12 202119 3 9539 6860.0 12218.0 14 10.0 \n", "13 202118 3 12135 9165.0 15105.0 18 14.0 \n", "14 202117 3 12058 8891.0 15225.0 18 13.0 \n", "15 202116 3 16505 12735.0 20275.0 25 19.0 \n", "16 202115 3 19306 15398.0 23214.0 29 23.0 \n", "17 202114 3 21073 17099.0 25047.0 32 26.0 \n", "18 202113 3 26413 22094.0 30732.0 40 33.0 \n", "19 202112 3 30658 25919.0 35397.0 46 39.0 \n", "20 202111 3 24988 20718.0 29258.0 38 32.0 \n", "21 202110 3 19539 15951.0 23127.0 30 25.0 \n", "22 202109 3 17572 13926.0 21218.0 27 21.0 \n", "23 202108 3 20882 16907.0 24857.0 32 26.0 \n", "24 202107 3 22393 18303.0 26483.0 34 28.0 \n", "25 202106 3 23183 19134.0 27232.0 35 29.0 \n", "26 202105 3 22426 18445.0 26407.0 34 28.0 \n", "27 202104 3 25804 21491.0 30117.0 39 32.0 \n", "28 202103 3 21810 17894.0 25726.0 33 27.0 \n", "29 202102 3 17320 13906.0 20734.0 26 21.0 \n", "... ... ... ... ... ... ... ... 
\n", "1889 198521 3 26096 19621.0 32571.0 47 35.0 \n", "1890 198520 3 27896 20885.0 34907.0 51 38.0 \n", "1891 198519 3 43154 32821.0 53487.0 78 59.0 \n", "1892 198518 3 40555 29935.0 51175.0 74 55.0 \n", "1893 198517 3 34053 24366.0 43740.0 62 44.0 \n", "1894 198516 3 50362 36451.0 64273.0 91 66.0 \n", "1895 198515 3 63881 45538.0 82224.0 116 83.0 \n", "1896 198514 3 134545 114400.0 154690.0 244 207.0 \n", "1897 198513 3 197206 176080.0 218332.0 357 319.0 \n", "1898 198512 3 245240 223304.0 267176.0 445 405.0 \n", "1899 198511 3 276205 252399.0 300011.0 501 458.0 \n", "1900 198510 3 353231 326279.0 380183.0 640 591.0 \n", "1901 198509 3 369895 341109.0 398681.0 670 618.0 \n", "1902 198508 3 389886 359529.0 420243.0 707 652.0 \n", "1903 198507 3 471852 432599.0 511105.0 855 784.0 \n", "1904 198506 3 565825 518011.0 613639.0 1026 939.0 \n", "1905 198505 3 637302 592795.0 681809.0 1155 1074.0 \n", "1906 198504 3 424937 390794.0 459080.0 770 708.0 \n", "1907 198503 3 213901 174689.0 253113.0 388 317.0 \n", "1908 198502 3 97586 80949.0 114223.0 177 147.0 \n", "1909 198501 3 85489 65918.0 105060.0 155 120.0 \n", "1910 198452 3 84830 60602.0 109058.0 154 110.0 \n", "1911 198451 3 101726 80242.0 123210.0 185 146.0 \n", "1912 198450 3 123680 101401.0 145959.0 225 184.0 \n", "1913 198449 3 101073 81684.0 120462.0 184 149.0 \n", "1914 198448 3 78620 60634.0 96606.0 143 110.0 \n", "1915 198447 3 72029 54274.0 89784.0 131 99.0 \n", "1916 198446 3 87330 67686.0 106974.0 159 123.0 \n", "1917 198445 3 135223 101414.0 169032.0 246 184.0 \n", "1918 198444 3 68422 20056.0 116788.0 125 37.0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 47.0 FR France \n", "1 28.0 FR France \n", "2 27.0 FR France \n", "3 18.0 FR France \n", "4 22.0 FR France \n", "5 16.0 FR France \n", "6 16.0 FR France \n", "7 10.0 FR France \n", "8 13.0 FR France \n", "9 16.0 FR France \n", "10 16.0 FR France \n", "11 20.0 FR France \n", "12 18.0 FR France \n", "13 22.0 FR France \n", "14 23.0 FR France \n", "15 
31.0 FR France \n", "16 35.0 FR France \n", "17 38.0 FR France \n", "18 47.0 FR France \n", "19 53.0 FR France \n", "20 44.0 FR France \n", "21 35.0 FR France \n", "22 33.0 FR France \n", "23 38.0 FR France \n", "24 40.0 FR France \n", "25 41.0 FR France \n", "26 40.0 FR France \n", "27 46.0 FR France \n", "28 39.0 FR France \n", "29 31.0 FR France \n", "... ... ... ... \n", "1889 59.0 FR France \n", "1890 64.0 FR France \n", "1891 97.0 FR France \n", "1892 93.0 FR France \n", "1893 80.0 FR France \n", "1894 116.0 FR France \n", "1895 149.0 FR France \n", "1896 281.0 FR France \n", "1897 395.0 FR France \n", "1898 485.0 FR France \n", "1899 544.0 FR France \n", "1900 689.0 FR France \n", "1901 722.0 FR France \n", "1902 762.0 FR France \n", "1903 926.0 FR France \n", "1904 1113.0 FR France \n", "1905 1236.0 FR France \n", "1906 832.0 FR France \n", "1907 459.0 FR France \n", "1908 207.0 FR France \n", "1909 190.0 FR France \n", "1910 198.0 FR France \n", "1911 224.0 FR France \n", "1912 266.0 FR France \n", "1913 219.0 FR France \n", "1914 176.0 FR France \n", "1915 163.0 FR France \n", "1916 195.0 FR France \n", "1917 308.0 FR France \n", "1918 213.0 FR France \n", "\n", "[1919 rows x 10 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "if not os.path.isfile(local_file_path):\n", "    # Download once and keep a local copy so that later runs reuse the same data.\n", "    r = requests.get(data_url, allow_redirects=True)\n", "    r.raise_for_status()  # stop here if the download failed\n", "    with open(local_file_path, 'wb') as f:\n", "        f.write(r.content)\n", "\n", "raw_data = pd.read_csv(local_file_path, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing data points? Yes, week 19 of year 1989 does not have any observed values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We delete this point, which has no significant consequence for our rather simple analysis." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = raw_data.dropna().copy()\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dataset uses an uncommon encoding: the week number is appended\n", "to the year number, which makes the value look like a six-digit integer.\n", "That is how pandas interprets it.\n", "\n", "A second problem is that pandas does not know about week numbers.\n", "It needs to be given the dates of the beginning and end of the week.\n", "We use the library `isoweek` for that.\n", "\n", "Since the conversion is a bit lengthy, we write a small Python\n", "function for doing it. Then we apply it to all points in our dataset.\n", "The results go into a new column `period`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split a value such as 198501 into the year (1985) and the ISO week (1).\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that week; 'W' builds the weekly period containing it.\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two more small changes to make.\n", "\n", "First, we define the observation periods as the new index of\n", "our dataset. That turns it into a time series, which will be\n", "convenient later on.\n", "\n", "Second, we sort the points chronologically." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. 
Between the end of a period and\n", "the beginning of the next one, the difference should be zero, or very small.\n", "We tolerate an error of one second.\n", "\n", "This is OK except for one pair of consecutive periods between which\n", "a whole week is missing.\n", "\n", "We recognize the dates: this is the week without observations that we\n", "deleted earlier!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    # Gap between the end of one week and the start of the next one.\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A zoom on the last few years shows more clearly that the peaks are situated in winter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the peaks of the epidemic happen in winter, near the transition\n", "between calendar years, we define the reference period for the annual\n", "incidence from August 1st of year $N$ to August 1st of year $N+1$. We\n", "label this period as year $N+1$ because the peak is always located in\n", "year $N+1$. The very low incidence in summer ensures that the arbitrariness\n", "of the choice of reference period has no impact on our conclusions.\n", "\n", "Our task is slightly complicated by the fact that a year does not have an\n", "integer number of weeks. 
Therefore we modify our reference period a bit:\n", "instead of August 1st, we use the first day of the week containing August 1st.\n", "\n", "A final detail: the dataset starts in October 1984, so the first peak is\n", "incomplete. We start the analysis with the first full peak." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", "                     for y in range(1985,\n", "                                    sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain August 1st, we obtain intervals of approximately one year as the periods between two adjacent weeks in this list. We compute the sums of weekly incidences for all these periods.\n", "\n", "We also check that our periods contain between 51 and 53 weeks, as a safeguard against potential mistakes in our code." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", "                        first_august_week[1:]):\n", "    # Weeks from the one containing August 1st up to the week before the next one.\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the annual incidences." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to find the highest values (at the end)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram clearly shows the few very strong epidemics, which affect about 10% of the French population,\n", "but are rare: there were three of them in the course of 35 years. The typical epidemic affects only half as many people." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }