{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of influenza-like illness in France" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data on the incidence of influenza-like illness are available from the website of the [Réseau Sentinelles](https://www.sentiweb.fr/france/en/?). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1984 and ending with a recent week, is available for download.\n", "\n", "The file is downloaded only if no local copy exists. Otherwise the local copy (for example, the one kept in this repository) is used, so the analysis still runs if the online link stops working, and it keeps using the same data even after the online dataset has been updated." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-3.csv\"\n", "\n", "data_file = \"inc-3-PAY.csv\"\n", "\n", "import os\n", "import urllib.request\n", "\n", "# Download the data only if no local copy exists\n", "if not os.path.exists(data_file):\n", "    urllib.request.urlretrieve(data_url, data_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the documentation of the data from [the download site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Description |\n", "|--------------|---------------------------------------------------------------------------------------------------------------------------|\n", "| `week` | ISO8601 Yearweek number as numeric (year times 100 + week number) |\n", "| `indicator` | Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json |\n", "| `inc` | Estimated incidence value for the time step, in the geographic level |\n", "| `inc_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc_up` | Upper bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100` | Estimated rate incidence per 100,000 inhabitants |\n", "| `inc100_low` | Lower bound of the estimated rate incidence 95% Confidence Interval |\n", "| `inc100_up` | Upper bound of the estimated rate incidence 95% Confidence Interval |\n", "| `geo_insee` | Identifier of the geographic area, from INSEE https://www.insee.fr |\n", "| `geo_name` | Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading |\n", "\n", "The first line of the CSV file is a comment, which we skip with `skiprows=1`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202425 3 50347 42176.0 58518.0 75 63.0 \n", "1 202424 3 41414 34928.0 47900.0 62 52.0 \n", "2 202423 3 35875 30610.0 41140.0 54 46.0 \n", "3 202422 3 33772 28274.0 39270.0 51 43.0 \n", "4 202421 3 21963 17556.0 26370.0 33 26.0 \n", "5 202420 3 20057 15780.0 24334.0 30 24.0 \n", "6 202419 3 15375 11274.0 19476.0 23 17.0 \n", "7 202418 3 22409 17653.0 27165.0 34 27.0 \n", "8 202417 3 27042 21410.0 32674.0 41 33.0 \n", "9 202416 3 28882 23305.0 34459.0 43 35.0 \n", "10 202415 3 30229 24648.0 35810.0 45 37.0 \n", "11 202414 3 31813 26529.0 37097.0 48 40.0 \n", "12 202413 3 35090 29607.0 40573.0 53 45.0 \n", "13 202412 3 40639 34582.0 46696.0 61 52.0 \n", "14 202411 3 50268 43331.0 57205.0 75 65.0 \n", "15 202410 3 60107 52623.0 67591.0 90 79.0 \n", "16 202409 3 71121 62920.0 79322.0 107 95.0 \n", "17 202408 3 104566 94520.0 114612.0 157 142.0 \n", "18 202407 3 138078 127050.0 149106.0 207 190.0 \n", "19 202406 3 190062 177955.0 202169.0 285 267.0 \n", "20 202405 3 216237 203595.0 228879.0 324 305.0 \n", "21 202404 3 213196 200547.0 225845.0 320 301.0 \n", "22 202403 3 163457 152276.0 174638.0 245 228.0 \n", "23 202402 3 129436 119453.0 139419.0 194 179.0 \n", "24 202401 3 120769 109452.0 132086.0 181 164.0 \n", "25 202352 3 115446 103738.0 127154.0 174 156.0 \n", "26 202351 3 148755 136546.0 160964.0 224 206.0 \n", "27 202350 3 147971 136787.0 159155.0 223 206.0 \n", "28 202349 3 147552 136422.0 158682.0 222 205.0 \n", "29 202348 3 124204 113479.0 134929.0 187 171.0 \n", "... ... ... ... ... ... ... ... 
\n", "2039 198521 3 26096 19621.0 32571.0 47 35.0 \n", "2040 198520 3 27896 20885.0 34907.0 51 38.0 \n", "2041 198519 3 43154 32821.0 53487.0 78 59.0 \n", "2042 198518 3 40555 29935.0 51175.0 74 55.0 \n", "2043 198517 3 34053 24366.0 43740.0 62 44.0 \n", "2044 198516 3 50362 36451.0 64273.0 91 66.0 \n", "2045 198515 3 63881 45538.0 82224.0 116 83.0 \n", "2046 198514 3 134545 114400.0 154690.0 244 207.0 \n", "2047 198513 3 197206 176080.0 218332.0 357 319.0 \n", "2048 198512 3 245240 223304.0 267176.0 445 405.0 \n", "2049 198511 3 276205 252399.0 300011.0 501 458.0 \n", "2050 198510 3 353231 326279.0 380183.0 640 591.0 \n", "2051 198509 3 369895 341109.0 398681.0 670 618.0 \n", "2052 198508 3 389886 359529.0 420243.0 707 652.0 \n", "2053 198507 3 471852 432599.0 511105.0 855 784.0 \n", "2054 198506 3 565825 518011.0 613639.0 1026 939.0 \n", "2055 198505 3 637302 592795.0 681809.0 1155 1074.0 \n", "2056 198504 3 424937 390794.0 459080.0 770 708.0 \n", "2057 198503 3 213901 174689.0 253113.0 388 317.0 \n", "2058 198502 3 97586 80949.0 114223.0 177 147.0 \n", "2059 198501 3 85489 65918.0 105060.0 155 120.0 \n", "2060 198452 3 84830 60602.0 109058.0 154 110.0 \n", "2061 198451 3 101726 80242.0 123210.0 185 146.0 \n", "2062 198450 3 123680 101401.0 145959.0 225 184.0 \n", "2063 198449 3 101073 81684.0 120462.0 184 149.0 \n", "2064 198448 3 78620 60634.0 96606.0 143 110.0 \n", "2065 198447 3 72029 54274.0 89784.0 131 99.0 \n", "2066 198446 3 87330 67686.0 106974.0 159 123.0 \n", "2067 198445 3 135223 101414.0 169032.0 246 184.0 \n", "2068 198444 3 68422 20056.0 116788.0 125 37.0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 87.0 FR France \n", "1 72.0 FR France \n", "2 62.0 FR France \n", "3 59.0 FR France \n", "4 40.0 FR France \n", "5 36.0 FR France \n", "6 29.0 FR France \n", "7 41.0 FR France \n", "8 49.0 FR France \n", "9 51.0 FR France \n", "10 53.0 FR France \n", "11 56.0 FR France \n", "12 61.0 FR France \n", "13 70.0 FR France \n", "14 85.0 FR France \n", "15 
101.0 FR France \n", "16 119.0 FR France \n", "17 172.0 FR France \n", "18 224.0 FR France \n", "19 303.0 FR France \n", "20 343.0 FR France \n", "21 339.0 FR France \n", "22 262.0 FR France \n", "23 209.0 FR France \n", "24 198.0 FR France \n", "25 192.0 FR France \n", "26 242.0 FR France \n", "27 240.0 FR France \n", "28 239.0 FR France \n", "29 203.0 FR France \n", "... ... ... ... \n", "2039 59.0 FR France \n", "2040 64.0 FR France \n", "2041 97.0 FR France \n", "2042 93.0 FR France \n", "2043 80.0 FR France \n", "2044 116.0 FR France \n", "2045 149.0 FR France \n", "2046 281.0 FR France \n", "2047 395.0 FR France \n", "2048 485.0 FR France \n", "2049 544.0 FR France \n", "2050 689.0 FR France \n", "2051 722.0 FR France \n", "2052 762.0 FR France \n", "2053 926.0 FR France \n", "2054 1113.0 FR France \n", "2055 1236.0 FR France \n", "2056 832.0 FR France \n", "2057 459.0 FR France \n", "2058 207.0 FR France \n", "2059 190.0 FR France \n", "2060 198.0 FR France \n", "2061 224.0 FR France \n", "2062 266.0 FR France \n", "2063 219.0 FR France \n", "2064 176.0 FR France \n", "2065 163.0 FR France \n", "2066 195.0 FR France \n", "2067 308.0 FR France \n", "2068 213.0 FR France \n", "\n", "[2069 rows x 10 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_file, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing data points? Yes, week 19 of year 1989 does not have any observed values." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low inc100_up \\\n", "1832 198919 3 - NaN NaN - NaN NaN \n", "\n", " geo_insee geo_name \n", "1832 FR France " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We delete this point, which has little consequence for our rather simple analysis. Note that its `inc` and `inc100` values are given as `-` rather than left empty, which is why Pandas stores these two columns as text." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202425 3 50347 42176.0 58518.0 75 63.0 \n", "1 202424 3 41414 34928.0 47900.0 62 52.0 \n", "2 202423 3 35875 30610.0 41140.0 54 46.0 \n", "3 202422 3 33772 28274.0 39270.0 51 43.0 \n", "4 202421 3 21963 17556.0 26370.0 33 26.0 \n", "5 202420 3 20057 15780.0 24334.0 30 24.0 \n", "6 202419 3 15375 11274.0 19476.0 23 17.0 \n", "7 202418 3 22409 17653.0 27165.0 34 27.0 \n", "8 202417 3 27042 21410.0 32674.0 41 33.0 \n", "9 202416 3 28882 23305.0 34459.0 43 35.0 \n", "10 202415 3 30229 24648.0 35810.0 45 37.0 \n", "11 202414 3 31813 26529.0 37097.0 48 40.0 \n", "12 202413 3 35090 29607.0 40573.0 53 45.0 \n", "13 202412 3 40639 34582.0 46696.0 61 52.0 \n", "14 202411 3 50268 43331.0 57205.0 75 65.0 \n", "15 202410 3 60107 52623.0 67591.0 90 79.0 \n", "16 202409 3 71121 62920.0 79322.0 107 95.0 \n", "17 202408 3 104566 94520.0 114612.0 157 142.0 \n", "18 202407 3 138078 127050.0 149106.0 207 190.0 \n", "19 202406 3 190062 177955.0 202169.0 285 267.0 \n", "20 202405 3 216237 203595.0 228879.0 324 305.0 \n", "21 202404 3 213196 200547.0 225845.0 320 301.0 \n", "22 202403 3 163457 152276.0 174638.0 245 228.0 \n", "23 202402 3 129436 119453.0 139419.0 194 179.0 \n", "24 202401 3 120769 109452.0 132086.0 181 164.0 \n", "25 202352 3 115446 103738.0 127154.0 174 156.0 \n", "26 202351 3 148755 136546.0 160964.0 224 206.0 \n", "27 202350 3 147971 136787.0 159155.0 223 206.0 \n", "28 202349 3 147552 136422.0 158682.0 222 205.0 \n", "29 202348 3 124204 113479.0 134929.0 187 171.0 \n", "... ... ... ... ... ... ... ... 
\n", "2039 198521 3 26096 19621.0 32571.0 47 35.0 \n", "2040 198520 3 27896 20885.0 34907.0 51 38.0 \n", "2041 198519 3 43154 32821.0 53487.0 78 59.0 \n", "2042 198518 3 40555 29935.0 51175.0 74 55.0 \n", "2043 198517 3 34053 24366.0 43740.0 62 44.0 \n", "2044 198516 3 50362 36451.0 64273.0 91 66.0 \n", "2045 198515 3 63881 45538.0 82224.0 116 83.0 \n", "2046 198514 3 134545 114400.0 154690.0 244 207.0 \n", "2047 198513 3 197206 176080.0 218332.0 357 319.0 \n", "2048 198512 3 245240 223304.0 267176.0 445 405.0 \n", "2049 198511 3 276205 252399.0 300011.0 501 458.0 \n", "2050 198510 3 353231 326279.0 380183.0 640 591.0 \n", "2051 198509 3 369895 341109.0 398681.0 670 618.0 \n", "2052 198508 3 389886 359529.0 420243.0 707 652.0 \n", "2053 198507 3 471852 432599.0 511105.0 855 784.0 \n", "2054 198506 3 565825 518011.0 613639.0 1026 939.0 \n", "2055 198505 3 637302 592795.0 681809.0 1155 1074.0 \n", "2056 198504 3 424937 390794.0 459080.0 770 708.0 \n", "2057 198503 3 213901 174689.0 253113.0 388 317.0 \n", "2058 198502 3 97586 80949.0 114223.0 177 147.0 \n", "2059 198501 3 85489 65918.0 105060.0 155 120.0 \n", "2060 198452 3 84830 60602.0 109058.0 154 110.0 \n", "2061 198451 3 101726 80242.0 123210.0 185 146.0 \n", "2062 198450 3 123680 101401.0 145959.0 225 184.0 \n", "2063 198449 3 101073 81684.0 120462.0 184 149.0 \n", "2064 198448 3 78620 60634.0 96606.0 143 110.0 \n", "2065 198447 3 72029 54274.0 89784.0 131 99.0 \n", "2066 198446 3 87330 67686.0 106974.0 159 123.0 \n", "2067 198445 3 135223 101414.0 169032.0 246 184.0 \n", "2068 198444 3 68422 20056.0 116788.0 125 37.0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 87.0 FR France \n", "1 72.0 FR France \n", "2 62.0 FR France \n", "3 59.0 FR France \n", "4 40.0 FR France \n", "5 36.0 FR France \n", "6 29.0 FR France \n", "7 41.0 FR France \n", "8 49.0 FR France \n", "9 51.0 FR France \n", "10 53.0 FR France \n", "11 56.0 FR France \n", "12 61.0 FR France \n", "13 70.0 FR France \n", "14 85.0 FR France \n", "15 
101.0 FR France \n", "16 119.0 FR France \n", "17 172.0 FR France \n", "18 224.0 FR France \n", "19 303.0 FR France \n", "20 343.0 FR France \n", "21 339.0 FR France \n", "22 262.0 FR France \n", "23 209.0 FR France \n", "24 198.0 FR France \n", "25 192.0 FR France \n", "26 242.0 FR France \n", "27 240.0 FR France \n", "28 239.0 FR France \n", "29 203.0 FR France \n", "... ... ... ... \n", "2039 59.0 FR France \n", "2040 64.0 FR France \n", "2041 97.0 FR France \n", "2042 93.0 FR France \n", "2043 80.0 FR France \n", "2044 116.0 FR France \n", "2045 149.0 FR France \n", "2046 281.0 FR France \n", "2047 395.0 FR France \n", "2048 485.0 FR France \n", "2049 544.0 FR France \n", "2050 689.0 FR France \n", "2051 722.0 FR France \n", "2052 762.0 FR France \n", "2053 926.0 FR France \n", "2054 1113.0 FR France \n", "2055 1236.0 FR France \n", "2056 832.0 FR France \n", "2057 459.0 FR France \n", "2058 207.0 FR France \n", "2059 190.0 FR France \n", "2060 198.0 FR France \n", "2061 224.0 FR France \n", "2062 266.0 FR France \n", "2063 219.0 FR France \n", "2064 176.0 FR France \n", "2065 163.0 FR France \n", "2066 195.0 FR France \n", "2067 308.0 FR France \n", "2068 213.0 FR France \n", "\n", "[2068 rows x 10 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = raw_data.dropna().copy()\n", "# The missing week was encoded as '-', so 'inc' and 'inc100' were read as text;\n", "# now that it is gone, convert both columns to integers for plotting and analysis\n", "data['inc'] = data['inc'].astype(int)\n", "data['inc100'] = data['inc100'].astype(int)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dataset uses an uncommon encoding: the week number is attached\n", "to the year number, leaving the impression of a six-digit integer.\n", "That is how Pandas interprets it.\n", "\n", "A second problem is that Pandas does not turn these year-week codes into dates by itself.\n", "It needs to be given the dates of the beginning and end of each week.\n", "We use the library `isoweek` for that.\n", "\n", "Since the conversion takes a few steps, we write a small Python\n", "function for it and apply it to all points in our dataset.\n", "The results go into a new column `period`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split the six-digit code YYYYWW into year and ISO week number\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    # isoweek handles ISO 8601 weeks; day(0) is the Monday of that week\n", "    w = isoweek.Week(year, week)\n", "    # Weekly period containing that Monday (default 'W' periods run Monday to Sunday)\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two more small changes to make.\n", "\n", "First, we define the observation periods as the new index of\n", "our dataset. That turns it into a time series, which will be\n", "convenient later on.\n", "\n", "Second, we sort the points chronologically." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of a period and\n", "the beginning of the next one, the difference should be zero, or very small.\n", "We tolerate an error of one second.\n", "\n", "The check passes except for one pair of consecutive periods between which\n", "a whole week is missing.\n", "\n", "We recognize the dates: it is the week without observations that we\n", "deleted earlier!" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1989-05-01/1989-05-07 1989-05-15/1989-05-21\n" ] } ], "source": [ "periods = sorted_data.index\n", "# Compare the end of each period with the start of the following one\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    # A gap larger than one second means at least one week is missing\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!"
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "Empty 'DataFrame': no numeric data to plot", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msorted_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'inc'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)\u001b[0m\n\u001b[1;32m 2501\u001b[0m \u001b[0mcolormap\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcolormap\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0myerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0myerr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2502\u001b[0m \u001b[0mxerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mxerr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msecondary_y\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msecondary_y\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2503\u001b[0;31m **kwds)\n\u001b[0m\u001b[1;32m 2504\u001b[0m \u001b[0m__call__\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mplot_series\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2505\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36mplot_series\u001b[0;34m(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)\u001b[0m\n\u001b[1;32m 1925\u001b[0m \u001b[0myerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0myerr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxerr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mxerr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1926\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msecondary_y\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msecondary_y\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1927\u001b[0;31m **kwds)\n\u001b[0m\u001b[1;32m 1928\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1929\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m_plot\u001b[0;34m(data, x, y, subplots, ax, kind, **kwds)\u001b[0m\n\u001b[1;32m 1727\u001b[0m \u001b[0mplot_obj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mklass\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msubplots\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msubplots\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0max\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0max\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkind\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkind\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1728\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1729\u001b[0;31m \u001b[0mplot_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgenerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1730\u001b[0m 
\u001b[0mplot_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdraw\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1731\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mplot_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36mgenerate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 248\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mgenerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 249\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_args_adjust\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 250\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_compute_plot_data\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 251\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_setup_subplots\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 252\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_plot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m_compute_plot_data\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 363\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mis_empty\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 364\u001b[0m raise TypeError('Empty {0!r}: no numeric data to '\n\u001b[0;32m--> 365\u001b[0;31m 'plot'.format(numeric_data.__class__.__name__))\n\u001b[0m\u001b[1;32m 366\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 367\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mnumeric_data\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: Empty 'DataFrame': no numeric data to plot" ] } ], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A zoom on the last few years shows more clearly that the peaks occur in winter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the peaks of the epidemic happen in winter, near the transition\n", "between calendar years, we define the reference period for the annual\n", "incidence from August 1st of year $N$ to August 1st of year $N+1$. We\n", "label this period as year $N+1$ because the peak is always located in\n", "year $N+1$. The very low incidence in summer ensures that the arbitrariness\n", "of the choice of reference period has no impact on our conclusions.\n", "\n", "Our task is a bit complicated by the fact that a year does not have an\n", "integer number of weeks. Therefore we modify our reference period a bit:\n", "instead of August 1st, we use the first day of the week containing August 1st.\n", "\n", "A final detail: the dataset starts in October 1984, so the first peak is\n", "incomplete. We therefore start the analysis with the first full peak." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", " for y in range(1985,\n", " sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain August 1st, we obtain intervals of approximately one year as the periods between two adjacent weeks in this list. 
We compute the sums of weekly incidences for all these periods.\n", "\n", "We also check that our periods contain between 51 and 53 weeks, as a safeguard against potential mistakes in our code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", " first_august_week[1:]):\n", " one_year = sorted_data['inc'][week1:week2-1]\n", " assert abs(len(one_year)-52) < 2\n", " yearly_incidence.append(one_year.sum())\n", " year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the annual incidences." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to find the highest values (at the end)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram clearly shows the few very strong epidemics, which affect about 10% of the French population,\n", "but are rare: there were three of them in the course of 35 years. The typical epidemic affects only half as many people." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }