{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of influenza-like illness in France" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data on the incidence of influenza-like illness are available from the Web site of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1984 and ending with a recent week, is available for download." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-3.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the documentation of the data from [the download site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Description |\n", "|--------------|---------------------------------------------------------------------------------------------------------------------------|\n", "| `week` | ISO8601 Yearweek number as numeric (year times 100 + week number) |\n", "| `indicator` | Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json |\n", "| `inc` | Estimated incidence value for the time step, in the geographic level |\n", "| `inc_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc_up` | Upper bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100` | Estimated rate incidence per 100,000 inhabitants |\n", "| `inc100_low` | Lower bound of the estimated rate incidence 95% Confidence Interval |\n", "| `inc100_up` | Upper bound of the estimated rate incidence 95% Confidence Interval |\n", "| `geo_insee` | Identifier of the geographic area, from INSEE https://www.insee.fr |\n", "| `geo_name` | Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading |\n", "\n", "The first line of the CSV file is a comment, which we ignore with `skiprows=1`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To protect ourselves in case the Réseau Sentinelles Web server disappears or is modified, we make a local copy of this dataset that we store together with our analysis. It is unnecessary and even risky to download the data at each execution, because a malfunction might replace our file with a corrupted version. Therefore we download the data only if no local copy exists." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "data_file = \"syndrome-grippal.csv\"\n", "\n", "import os\n", "import urllib.request\n", "if not os.path.exists(data_file):\n", " urllib.request.urlretrieve(data_url, data_file)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202328 3 10829 5995.0 15663.0 16 9.0 \n", "1 202327 3 9197 5864.0 12530.0 14 9.0 \n", "2 202326 3 9023 5934.0 12112.0 14 9.0 \n", "3 202325 3 10090 6739.0 13441.0 15 10.0 \n", "4 202324 3 11308 7639.0 14977.0 17 11.0 \n", "5 202323 3 14300 10661.0 17939.0 22 17.0 \n", "6 202322 3 18303 13822.0 22784.0 28 21.0 \n", "7 202321 3 16460 12188.0 20732.0 25 19.0 \n", "8 202320 3 16162 11963.0 20361.0 24 18.0 \n", "9 202319 3 16901 12577.0 21225.0 25 18.0 \n", "10 202318 3 19929 15402.0 24456.0 30 23.0 \n", "11 202317 3 27007 21779.0 32235.0 41 33.0 \n", "12 202316 3 27875 22767.0 32983.0 42 34.0 \n", "13 202315 3 37455 30993.0 43917.0 56 46.0 \n", "14 202314 3 48060 40671.0 55449.0 72 61.0 \n", "15 202313 3 64859 56800.0 72918.0 98 86.0 \n", "16 202312 3 72750 64499.0 81001.0 109 97.0 \n", "17 202311 3 74638 66420.0 82856.0 112 100.0 \n", "18 202310 3 76368 68243.0 84493.0 115 103.0 \n", "19 202309 3 62062 54778.0 69346.0 93 82.0 \n", "20 202308 3 76391 68065.0 84717.0 115 102.0 \n", "21 202307 3 89851 80397.0 99305.0 135 121.0 \n", "22 202306 3 97368 87636.0 107100.0 146 131.0 \n", "23 202305 3 95469 86268.0 104670.0 144 130.0 \n", "24 202304 3 74901 66916.0 82886.0 113 101.0 \n", "25 202303 3 69570 61893.0 77247.0 105 93.0 \n", "26 202302 3 78260 70090.0 86430.0 118 106.0 \n", "27 202301 3 121773 111024.0 132522.0 183 167.0 \n", "28 202252 3 155371 142004.0 168738.0 234 214.0 \n", "29 202251 3 248319 232128.0 264510.0 374 350.0 \n", "... ... ... ... ... ... ... ... 
\n", "1990 198521 3 26096 19621.0 32571.0 47 35.0 \n", "1991 198520 3 27896 20885.0 34907.0 51 38.0 \n", "1992 198519 3 43154 32821.0 53487.0 78 59.0 \n", "1993 198518 3 40555 29935.0 51175.0 74 55.0 \n", "1994 198517 3 34053 24366.0 43740.0 62 44.0 \n", "1995 198516 3 50362 36451.0 64273.0 91 66.0 \n", "1996 198515 3 63881 45538.0 82224.0 116 83.0 \n", "1997 198514 3 134545 114400.0 154690.0 244 207.0 \n", "1998 198513 3 197206 176080.0 218332.0 357 319.0 \n", "1999 198512 3 245240 223304.0 267176.0 445 405.0 \n", "2000 198511 3 276205 252399.0 300011.0 501 458.0 \n", "2001 198510 3 353231 326279.0 380183.0 640 591.0 \n", "2002 198509 3 369895 341109.0 398681.0 670 618.0 \n", "2003 198508 3 389886 359529.0 420243.0 707 652.0 \n", "2004 198507 3 471852 432599.0 511105.0 855 784.0 \n", "2005 198506 3 565825 518011.0 613639.0 1026 939.0 \n", "2006 198505 3 637302 592795.0 681809.0 1155 1074.0 \n", "2007 198504 3 424937 390794.0 459080.0 770 708.0 \n", "2008 198503 3 213901 174689.0 253113.0 388 317.0 \n", "2009 198502 3 97586 80949.0 114223.0 177 147.0 \n", "2010 198501 3 85489 65918.0 105060.0 155 120.0 \n", "2011 198452 3 84830 60602.0 109058.0 154 110.0 \n", "2012 198451 3 101726 80242.0 123210.0 185 146.0 \n", "2013 198450 3 123680 101401.0 145959.0 225 184.0 \n", "2014 198449 3 101073 81684.0 120462.0 184 149.0 \n", "2015 198448 3 78620 60634.0 96606.0 143 110.0 \n", "2016 198447 3 72029 54274.0 89784.0 131 99.0 \n", "2017 198446 3 87330 67686.0 106974.0 159 123.0 \n", "2018 198445 3 135223 101414.0 169032.0 246 184.0 \n", "2019 198444 3 68422 20056.0 116788.0 125 37.0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 23.0 FR France \n", "1 19.0 FR France \n", "2 19.0 FR France \n", "3 20.0 FR France \n", "4 23.0 FR France \n", "5 27.0 FR France \n", "6 35.0 FR France \n", "7 31.0 FR France \n", "8 30.0 FR France \n", "9 32.0 FR France \n", "10 37.0 FR France \n", "11 49.0 FR France \n", "12 50.0 FR France \n", "13 66.0 FR France \n", "14 83.0 FR France \n", "15 
110.0 FR France \n", "16 121.0 FR France \n", "17 124.0 FR France \n", "18 127.0 FR France \n", "19 104.0 FR France \n", "20 128.0 FR France \n", "21 149.0 FR France \n", "22 161.0 FR France \n", "23 158.0 FR France \n", "24 125.0 FR France \n", "25 117.0 FR France \n", "26 130.0 FR France \n", "27 199.0 FR France \n", "28 254.0 FR France \n", "29 398.0 FR France \n", "... ... ... ... \n", "1990 59.0 FR France \n", "1991 64.0 FR France \n", "1992 97.0 FR France \n", "1993 93.0 FR France \n", "1994 80.0 FR France \n", "1995 116.0 FR France \n", "1996 149.0 FR France \n", "1997 281.0 FR France \n", "1998 395.0 FR France \n", "1999 485.0 FR France \n", "2000 544.0 FR France \n", "2001 689.0 FR France \n", "2002 722.0 FR France \n", "2003 762.0 FR France \n", "2004 926.0 FR France \n", "2005 1113.0 FR France \n", "2006 1236.0 FR France \n", "2007 832.0 FR France \n", "2008 459.0 FR France \n", "2009 207.0 FR France \n", "2010 190.0 FR France \n", "2011 198.0 FR France \n", "2012 224.0 FR France \n", "2013 266.0 FR France \n", "2014 219.0 FR France \n", "2015 176.0 FR France \n", "2016 163.0 FR France \n", "2017 195.0 FR France \n", "2018 308.0 FR France \n", "2019 213.0 FR France \n", "\n", "[2020 rows x 10 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_file, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing data points? Yes, week 19 of year 1989 does not have any observed values." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low inc100_up \\\n", "1783 198919 3 - NaN NaN - NaN NaN \n", "\n", " geo_insee geo_name \n", "1783 FR France " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We delete this point, which has little consequence for our rather simple analysis." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202328 3 10829 5995.0 15663.0 16 9.0 \n", "1 202327 3 9197 5864.0 12530.0 14 9.0 \n", "2 202326 3 9023 5934.0 12112.0 14 9.0 \n", "3 202325 3 10090 6739.0 13441.0 15 10.0 \n", "4 202324 3 11308 7639.0 14977.0 17 11.0 \n", "5 202323 3 14300 10661.0 17939.0 22 17.0 \n", "6 202322 3 18303 13822.0 22784.0 28 21.0 \n", "7 202321 3 16460 12188.0 20732.0 25 19.0 \n", "8 202320 3 16162 11963.0 20361.0 24 18.0 \n", "9 202319 3 16901 12577.0 21225.0 25 18.0 \n", "10 202318 3 19929 15402.0 24456.0 30 23.0 \n", "11 202317 3 27007 21779.0 32235.0 41 33.0 \n", "12 202316 3 27875 22767.0 32983.0 42 34.0 \n", "13 202315 3 37455 30993.0 43917.0 56 46.0 \n", "14 202314 3 48060 40671.0 55449.0 72 61.0 \n", "15 202313 3 64859 56800.0 72918.0 98 86.0 \n", "16 202312 3 72750 64499.0 81001.0 109 97.0 \n", "17 202311 3 74638 66420.0 82856.0 112 100.0 \n", "18 202310 3 76368 68243.0 84493.0 115 103.0 \n", "19 202309 3 62062 54778.0 69346.0 93 82.0 \n", "20 202308 3 76391 68065.0 84717.0 115 102.0 \n", "21 202307 3 89851 80397.0 99305.0 135 121.0 \n", "22 202306 3 97368 87636.0 107100.0 146 131.0 \n", "23 202305 3 95469 86268.0 104670.0 144 130.0 \n", "24 202304 3 74901 66916.0 82886.0 113 101.0 \n", "25 202303 3 69570 61893.0 77247.0 105 93.0 \n", "26 202302 3 78260 70090.0 86430.0 118 106.0 \n", "27 202301 3 121773 111024.0 132522.0 183 167.0 \n", "28 202252 3 155371 142004.0 168738.0 234 214.0 \n", "29 202251 3 248319 232128.0 264510.0 374 350.0 \n", "... ... ... ... ... ... ... ... 
\n", "1990 198521 3 26096 19621.0 32571.0 47 35.0 \n", "1991 198520 3 27896 20885.0 34907.0 51 38.0 \n", "1992 198519 3 43154 32821.0 53487.0 78 59.0 \n", "1993 198518 3 40555 29935.0 51175.0 74 55.0 \n", "1994 198517 3 34053 24366.0 43740.0 62 44.0 \n", "1995 198516 3 50362 36451.0 64273.0 91 66.0 \n", "1996 198515 3 63881 45538.0 82224.0 116 83.0 \n", "1997 198514 3 134545 114400.0 154690.0 244 207.0 \n", "1998 198513 3 197206 176080.0 218332.0 357 319.0 \n", "1999 198512 3 245240 223304.0 267176.0 445 405.0 \n", "2000 198511 3 276205 252399.0 300011.0 501 458.0 \n", "2001 198510 3 353231 326279.0 380183.0 640 591.0 \n", "2002 198509 3 369895 341109.0 398681.0 670 618.0 \n", "2003 198508 3 389886 359529.0 420243.0 707 652.0 \n", "2004 198507 3 471852 432599.0 511105.0 855 784.0 \n", "2005 198506 3 565825 518011.0 613639.0 1026 939.0 \n", "2006 198505 3 637302 592795.0 681809.0 1155 1074.0 \n", "2007 198504 3 424937 390794.0 459080.0 770 708.0 \n", "2008 198503 3 213901 174689.0 253113.0 388 317.0 \n", "2009 198502 3 97586 80949.0 114223.0 177 147.0 \n", "2010 198501 3 85489 65918.0 105060.0 155 120.0 \n", "2011 198452 3 84830 60602.0 109058.0 154 110.0 \n", "2012 198451 3 101726 80242.0 123210.0 185 146.0 \n", "2013 198450 3 123680 101401.0 145959.0 225 184.0 \n", "2014 198449 3 101073 81684.0 120462.0 184 149.0 \n", "2015 198448 3 78620 60634.0 96606.0 143 110.0 \n", "2016 198447 3 72029 54274.0 89784.0 131 99.0 \n", "2017 198446 3 87330 67686.0 106974.0 159 123.0 \n", "2018 198445 3 135223 101414.0 169032.0 246 184.0 \n", "2019 198444 3 68422 20056.0 116788.0 125 37.0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 23.0 FR France \n", "1 19.0 FR France \n", "2 19.0 FR France \n", "3 20.0 FR France \n", "4 23.0 FR France \n", "5 27.0 FR France \n", "6 35.0 FR France \n", "7 31.0 FR France \n", "8 30.0 FR France \n", "9 32.0 FR France \n", "10 37.0 FR France \n", "11 49.0 FR France \n", "12 50.0 FR France \n", "13 66.0 FR France \n", "14 83.0 FR France \n", "15 
110.0 FR France \n", "16 121.0 FR France \n", "17 124.0 FR France \n", "18 127.0 FR France \n", "19 104.0 FR France \n", "20 128.0 FR France \n", "21 149.0 FR France \n", "22 161.0 FR France \n", "23 158.0 FR France \n", "24 125.0 FR France \n", "25 117.0 FR France \n", "26 130.0 FR France \n", "27 199.0 FR France \n", "28 254.0 FR France \n", "29 398.0 FR France \n", "... ... ... ... \n", "1990 59.0 FR France \n", "1991 64.0 FR France \n", "1992 97.0 FR France \n", "1993 93.0 FR France \n", "1994 80.0 FR France \n", "1995 116.0 FR France \n", "1996 149.0 FR France \n", "1997 281.0 FR France \n", "1998 395.0 FR France \n", "1999 485.0 FR France \n", "2000 544.0 FR France \n", "2001 689.0 FR France \n", "2002 722.0 FR France \n", "2003 762.0 FR France \n", "2004 926.0 FR France \n", "2005 1113.0 FR France \n", "2006 1236.0 FR France \n", "2007 832.0 FR France \n", "2008 459.0 FR France \n", "2009 207.0 FR France \n", "2010 190.0 FR France \n", "2011 198.0 FR France \n", "2012 224.0 FR France \n", "2013 266.0 FR France \n", "2014 219.0 FR France \n", "2015 176.0 FR France \n", "2016 163.0 FR France \n", "2017 195.0 FR France \n", "2018 308.0 FR France \n", "2019 213.0 FR France \n", "\n", "[2019 rows x 10 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = raw_data.dropna().copy()\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dataset uses an uncommon encoding; the week number is attached\n", "to the year number, leaving the impression of a six-digit integer.\n", "That is how Pandas interprets it.\n", "\n", "A second problem is that Pandas does not know about week numbers.\n", "It needs to be given the dates of the beginning and end of the week.\n", "We use the library `isoweek` for that.\n", "\n", "Since the conversion is a bit lengthy, we write a small Python \n", "function for doing it. Then we apply it to all points in our dataset. 
\n", "The results go into a new column `period`." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split the six-digit value (e.g. 198521) into year and ISO week number\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that ISO week\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are three more small changes to make.\n", "\n", "First, we define the observation periods as the new index of\n", "our dataset. That turns it into a time series, which will be\n", "convenient later on.\n", "\n", "Second, we sort the points chronologically.\n", "\n", "Third, we make sure that the `inc` column is numeric. If Pandas has read\n", "it as text, it can neither plot nor sum the values; entries that cannot\n", "be converted become missing values." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()\n", "sorted_data['inc'] = pd.to_numeric(sorted_data['inc'], errors='coerce')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of a period and\n", "the beginning of the next one, the difference should be zero, or very small.\n", "We tolerate an error of one second.\n", "\n", "This is OK except for one pair of consecutive periods between which\n", "a whole week is missing.\n", "\n", "We recognize the dates: it's the week without observations that we\n", "have deleted earlier!" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1989-05-01/1989-05-07 1989-05-15/1989-05-21\n" ] } ], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!"
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "Empty 'DataFrame': no numeric data to plot", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msorted_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'inc'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)\u001b[0m\n\u001b[1;32m 2501\u001b[0m \u001b[0mcolormap\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcolormap\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0myerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0myerr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2502\u001b[0m \u001b[0mxerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mxerr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msecondary_y\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msecondary_y\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2503\u001b[0;31m **kwds)\n\u001b[0m\u001b[1;32m 2504\u001b[0m \u001b[0m__call__\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mplot_series\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2505\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36mplot_series\u001b[0;34m(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)\u001b[0m\n\u001b[1;32m 1925\u001b[0m \u001b[0myerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0myerr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mxerr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1926\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msecondary_y\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msecondary_y\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1927\u001b[0;31m **kwds)\n\u001b[0m\u001b[1;32m 1928\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1929\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m_plot\u001b[0;34m(data, x, y, subplots, ax, kind, **kwds)\u001b[0m\n\u001b[1;32m 1727\u001b[0m \u001b[0mplot_obj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mklass\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msubplots\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msubplots\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0max\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0max\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkind\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkind\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1728\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1729\u001b[0;31m \u001b[0mplot_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgenerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1730\u001b[0m 
\u001b[0mplot_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdraw\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1731\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mplot_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36mgenerate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 248\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mgenerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 249\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_args_adjust\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 250\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_compute_plot_data\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 251\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_setup_subplots\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 252\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_plot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m_compute_plot_data\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 363\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mis_empty\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 364\u001b[0m raise TypeError('Empty {0!r}: no numeric data to '\n\u001b[0;32m--> 365\u001b[0;31m 'plot'.format(numeric_data.__class__.__name__))\n\u001b[0m\u001b[1;32m 366\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 367\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mnumeric_data\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: Empty 'DataFrame': no numeric data to plot" ] } ], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A zoom on the last few years shows more clearly that the peaks are situated in winter." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "Empty 'DataFrame': no numeric data to plot", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msorted_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'inc'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m200\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)\u001b[0m\n\u001b[1;32m 2501\u001b[0m \u001b[0mcolormap\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcolormap\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0myerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0myerr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2502\u001b[0m \u001b[0mxerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mxerr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0msecondary_y\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msecondary_y\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2503\u001b[0;31m **kwds)\n\u001b[0m\u001b[1;32m 2504\u001b[0m \u001b[0m__call__\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mplot_series\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2505\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36mplot_series\u001b[0;34m(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)\u001b[0m\n\u001b[1;32m 1925\u001b[0m \u001b[0myerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0myerr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mxerr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1926\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msecondary_y\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msecondary_y\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1927\u001b[0;31m **kwds)\n\u001b[0m\u001b[1;32m 1928\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1929\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m_plot\u001b[0;34m(data, x, y, subplots, ax, kind, **kwds)\u001b[0m\n\u001b[1;32m 1727\u001b[0m \u001b[0mplot_obj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mklass\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msubplots\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msubplots\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0max\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0max\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mkind\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkind\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1728\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1729\u001b[0;31m \u001b[0mplot_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgenerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1730\u001b[0m \u001b[0mplot_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdraw\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1731\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mplot_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36mgenerate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 248\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mgenerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 249\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_args_adjust\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 250\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_compute_plot_data\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 251\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_setup_subplots\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 252\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_plot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m_compute_plot_data\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 
363\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mis_empty\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 364\u001b[0m raise TypeError('Empty {0!r}: no numeric data to '\n\u001b[0;32m--> 365\u001b[0;31m 'plot'.format(numeric_data.__class__.__name__))\n\u001b[0m\u001b[1;32m 366\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 367\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnumeric_data\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: Empty 'DataFrame': no numeric data to plot" ] } ], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the peaks of the epidemic happen in winter, near the transition\n", "between calendar years, we define the reference period for the annual\n", "incidence from August 1st of year $N$ to August 1st of year $N+1$. We\n", "label this period as year $N+1$ because the peak is always located in\n", "year $N+1$. The very low incidence in summer ensures that the arbitrariness\n", "of the choice of reference period has no impact on our conclusions.\n", "\n", "Our task is a bit complicated by the fact that a year does not have an\n", "integer number of weeks. Therefore we modify our reference period a bit:\n", "instead of August 1st, we use the first day of the week containing August 1st.\n", "\n", "A final detail: the dataset starts in October 1984, so the first peak is\n", "incomplete. We start the analysis with the first full peak."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", "                     for y in range(1985,\n", "                                    sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain August 1st, we obtain intervals of approximately one year as the periods between two adjacent weeks in this list. We compute the sums of weekly incidences for all these periods.\n", "\n", "We also check that our periods contain between 51 and 53 weeks, as a safeguard against potential mistakes in our code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", "                        first_august_week[1:]):\n", "    # Slice from the week containing August 1st up to the week before the next one\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    # Accept 51, 52, or 53 weeks per reference period\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the annual incidences." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to find the highest values (at the end)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram clearly shows the few very strong epidemics, which affect about 10% of the French population,\n", "but are rare: there were three of them in the course of 35 years. The typical epidemic affects only half as many people."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }