{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of influenza-like illness in France" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data on the incidence of influenza-like illness are available from the Web site of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1984 and ending with a recent week, is available for download.\n", "\n", "In order to protect us in case the Réseau Sentinelles Web server disappears or is modified, we make a local copy of this dataset that we store together with our analysis. It is unnecessary and even risky to download the data at each execution, because in case of a malfunction we might be replacing our file by a corrupted version. Therefore we download the data only if no local copy exists." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-3.csv\"\n", "data_file = \"incidence-PAY-3.csv\"\n", "\n", "import os\n", "import urllib.request\n", "if not os.path.exists(data_file):\n", "    urllib.request.urlretrieve(data_url, data_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the documentation of the data from [the download site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Description |\n", "|--------------|---------------------------------------------------------------------------------------------------------------------------|\n", "| `week` | ISO8601 Yearweek number as numeric (year times 100 + week number) |\n", "| `indicator` | Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json |\n", "| `inc` | Estimated incidence value for the time step, in the geographic level |\n", "| `inc_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc_up` | Upper bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100` | Estimated rate incidence per 100,000 inhabitants |\n", "| `inc100_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100_up` | Upper bound of the estimated rate incidence 95% Confidence Interval |\n", "| `geo_insee` | Identifier of the geographic area, from INSEE https://www.insee.fr |\n", "| `geo_name` | Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading |\n", "\n", "The first line of the CSV file is a comment, which we ignore with `skiprows=1`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202536 3 26164 20247.0 32081.0 39 30.0 \n", "1 202535 3 23101 17698.0 28504.0 34 26.0 \n", "2 202534 3 21429 16177.0 26681.0 32 24.0 \n", "3 202533 3 16766 12022.0 21510.0 25 18.0 \n", "4 202532 3 19900 14303.0 25497.0 30 22.0 \n", "5 202531 3 18470 12625.0 24315.0 28 19.0 \n", "6 202530 3 19166 14283.0 24049.0 29 22.0 \n", "7 202529 3 18673 13815.0 23531.0 28 21.0 \n", "8 202528 3 23285 18131.0 28439.0 35 27.0 \n", "9 202527 3 21453 17129.0 25777.0 32 26.0 \n", "10 202526 3 21945 17422.0 26468.0 33 26.0 \n", "11 202525 3 23323 18546.0 28100.0 35 28.0 \n", "12 202524 3 23154 18577.0 27731.0 35 28.0 \n", "13 202523 3 24391 19307.0 29475.0 36 28.0 \n", "14 202522 3 18755 14333.0 23177.0 28 21.0 \n", "15 202521 3 23760 18671.0 28849.0 35 27.0 \n", "16 202520 3 20265 15814.0 24716.0 30 23.0 \n", "17 202519 3 16264 12394.0 20134.0 24 18.0 \n", "18 202518 3 18115 13975.0 22255.0 27 21.0 \n", "19 202517 3 22150 17291.0 27009.0 33 26.0 \n", "20 202516 3 28564 22550.0 34578.0 43 34.0 \n", "21 202515 3 35721 29592.0 41850.0 53 44.0 \n", "22 202514 3 37579 31232.0 43926.0 56 47.0 \n", "23 202513 3 39673 33686.0 45660.0 59 50.0 \n", "24 202512 3 52543 45627.0 59459.0 78 68.0 \n", "25 202511 3 59469 52154.0 66784.0 89 78.0 \n", "26 202510 3 60334 53048.0 67620.0 90 79.0 \n", "27 202509 3 84531 74994.0 94068.0 126 112.0 \n", "28 202508 3 136020 124824.0 147216.0 203 186.0 \n", "29 202507 3 208952 195988.0 221916.0 312 293.0 \n", "... ... ... ... ... ... ... ... 
\n", "2102 198521 3 26096 19621.0 32571.0 47 35.0 \n", "2103 198520 3 27896 20885.0 34907.0 51 38.0 \n", "2104 198519 3 43154 32821.0 53487.0 78 59.0 \n", "2105 198518 3 40555 29935.0 51175.0 74 55.0 \n", "2106 198517 3 34053 24366.0 43740.0 62 44.0 \n", "2107 198516 3 50362 36451.0 64273.0 91 66.0 \n", "2108 198515 3 63881 45538.0 82224.0 116 83.0 \n", "2109 198514 3 134545 114400.0 154690.0 244 207.0 \n", "2110 198513 3 197206 176080.0 218332.0 357 319.0 \n", "2111 198512 3 245240 223304.0 267176.0 445 405.0 \n", "2112 198511 3 276205 252399.0 300011.0 501 458.0 \n", "2113 198510 3 353231 326279.0 380183.0 640 591.0 \n", "2114 198509 3 369895 341109.0 398681.0 670 618.0 \n", "2115 198508 3 389886 359529.0 420243.0 707 652.0 \n", "2116 198507 3 471852 432599.0 511105.0 855 784.0 \n", "2117 198506 3 565825 518011.0 613639.0 1026 939.0 \n", "2118 198505 3 637302 592795.0 681809.0 1155 1074.0 \n", "2119 198504 3 424937 390794.0 459080.0 770 708.0 \n", "2120 198503 3 213901 174689.0 253113.0 388 317.0 \n", "2121 198502 3 97586 80949.0 114223.0 177 147.0 \n", "2122 198501 3 85489 65918.0 105060.0 155 120.0 \n", "2123 198452 3 84830 60602.0 109058.0 154 110.0 \n", "2124 198451 3 101726 80242.0 123210.0 185 146.0 \n", "2125 198450 3 123680 101401.0 145959.0 225 184.0 \n", "2126 198449 3 101073 81684.0 120462.0 184 149.0 \n", "2127 198448 3 78620 60634.0 96606.0 143 110.0 \n", "2128 198447 3 72029 54274.0 89784.0 131 99.0 \n", "2129 198446 3 87330 67686.0 106974.0 159 123.0 \n", "2130 198445 3 135223 101414.0 169032.0 246 184.0 \n", "2131 198444 3 68422 20056.0 116788.0 125 37.0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 48.0 FR France \n", "1 42.0 FR France \n", "2 40.0 FR France \n", "3 32.0 FR France \n", "4 38.0 FR France \n", "5 37.0 FR France \n", "6 36.0 FR France \n", "7 35.0 FR France \n", "8 43.0 FR France \n", "9 38.0 FR France \n", "10 40.0 FR France \n", "11 42.0 FR France \n", "12 42.0 FR France \n", "13 44.0 FR France \n", "14 35.0 FR France \n", "15 
43.0 FR France \n", "16 37.0 FR France \n", "17 30.0 FR France \n", "18 33.0 FR France \n", "19 40.0 FR France \n", "20 52.0 FR France \n", "21 62.0 FR France \n", "22 65.0 FR France \n", "23 68.0 FR France \n", "24 88.0 FR France \n", "25 100.0 FR France \n", "26 101.0 FR France \n", "27 140.0 FR France \n", "28 220.0 FR France \n", "29 331.0 FR France \n", "... ... ... ... \n", "2102 59.0 FR France \n", "2103 64.0 FR France \n", "2104 97.0 FR France \n", "2105 93.0 FR France \n", "2106 80.0 FR France \n", "2107 116.0 FR France \n", "2108 149.0 FR France \n", "2109 281.0 FR France \n", "2110 395.0 FR France \n", "2111 485.0 FR France \n", "2112 544.0 FR France \n", "2113 689.0 FR France \n", "2114 722.0 FR France \n", "2115 762.0 FR France \n", "2116 926.0 FR France \n", "2117 1113.0 FR France \n", "2118 1236.0 FR France \n", "2119 832.0 FR France \n", "2120 459.0 FR France \n", "2121 207.0 FR France \n", "2122 190.0 FR France \n", "2123 198.0 FR France \n", "2124 224.0 FR France \n", "2125 266.0 FR France \n", "2126 219.0 FR France \n", "2127 176.0 FR France \n", "2128 163.0 FR France \n", "2129 195.0 FR France \n", "2130 308.0 FR France \n", "2131 213.0 FR France \n", "\n", "[2132 rows x 10 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_file, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing data points? Yes, week 19 of year 1989 does not have any observed values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We delete this point, which has no major consequence for our rather simple analysis." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = raw_data.dropna().copy()\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dataset uses an uncommon encoding; the week number is attached\n", "to the year number, leaving the impression of a six-digit integer.\n", "That is how Pandas interprets it.\n", "\n", "A second problem is that Pandas does not know about week numbers.\n", "It needs to be given the dates of the beginning and end of the week.\n", "We use the library `isoweek` for that.\n", "\n", "Since the conversion is a bit lengthy, we write a small Python \n", "function for doing it. Then we apply it to all points in our dataset. \n", "The results go into a new column 'period'." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two more small changes to make.\n", "\n", "First, we define the observation periods as the new index of\n", "our dataset. That turns it into a time series, which will be\n", "convenient later on.\n", "\n", "Second, we sort the points chronologically." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. 
Between the end of a period and\n", "the beginning of the next one, the difference should be zero, or very small.\n", "We tolerate an error of one second.\n", "\n", "This is OK except for one pair of consecutive periods between which\n", "a whole week is missing.\n", "\n", "We recognize the dates: it's the week without observations that we\n", "have deleted earlier!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A zoom on the last few years shows more clearly that the peaks are situated in winter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the peaks of the epidemic happen in winter, near the transition\n", "between calendar years, we define the reference period for the annual\n", "incidence from August 1st of year $N$ to August 1st of year $N+1$. We\n", "label this period as year $N+1$ because the peak is always located in\n", "year $N+1$. The very low incidence in summer ensures that the arbitrariness\n", "of the choice of reference period has no impact on our conclusions.\n", "\n", "Our task is a bit complicated by the fact that a year does not have an\n", "integer number of weeks. 
Therefore we modify our reference period a bit:\n", "instead of August 1st, we use the first day of the week containing August 1st.\n", "\n", "A final detail: the dataset starts in October 1984, so the first peak is\n", "incomplete. We start the analysis with the first full peak." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", "                     for y in range(1985,\n", "                                    sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain August 1st, we obtain intervals of approximately one year as the periods between two adjacent weeks in this list. We compute the sums of weekly incidences for all these periods.\n", "\n", "We also check that our periods contain between 51 and 53 weeks, as a safeguard against potential mistakes in our code." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", "                        first_august_week[1:]):\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the annual incidences." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to find the highest values (at the end)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram clearly shows the few very strong epidemics, which affect about 10% of the French population,\n", "but are rare: there were three of them in the course of 35 years. The typical epidemic affects only half as many people." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }