{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of influenza-like illness in France" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data on the incidence of influenza-like illness are available from the Web site of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1984 and ending with a recent week, is available for download.\n", "\n", "We download the file only if no local copy exists yet, to ensure that the data remain available locally." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-3.csv\"\n", "\n", "data_file = \"syndrome-grippal.csv\"\n", "\n", "import os\n", "import urllib.request\n", "if not os.path.exists(data_file):\n", "    urllib.request.urlretrieve(data_url, data_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the documentation of the data from [the download site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Description |\n", "|--------------|---------------------------------------------------------------------------------------------------------------------------|\n", "| `week` | ISO8601 Yearweek number as numeric (year times 100 + week number) |\n", "| `indicator` | Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json |\n", "| `inc` | Estimated incidence value for the time step, in the geographic level |\n", "| `inc_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc_up` | Upper bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100` | Estimated rate incidence 
per 100,000 inhabitants |\n", "| `inc100_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100_up` | Upper bound of the estimated rate incidence 95% Confidence Interval |\n", "| `geo_insee` | Identifier of the geographic area, from INSEE https://www.insee.fr |\n", "| `geo_name` | Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading |\n", "\n", "The first line of the CSV file is a comment, which we ignore by passing `skiprows=1` to `pd.read_csv`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202236 3 23001 17464.0 28538.0 35 27.0 \n", "1 202235 3 13247 9926.0 16568.0 20 15.0 \n", "2 202234 3 10708 7334.0 14082.0 16 11.0 \n", "3 202233 3 12926 8869.0 16983.0 19 13.0 \n", "4 202232 3 22257 16158.0 28356.0 34 25.0 \n", "5 202231 3 21828 16268.0 27388.0 33 25.0 \n", "6 202230 3 19663 14779.0 24547.0 30 23.0 \n", "7 202229 3 24268 18906.0 29630.0 37 29.0 \n", "8 202228 3 24845 19214.0 30476.0 37 29.0 \n", "9 202227 3 40745 33994.0 47496.0 61 51.0 \n", "10 202226 3 34010 28521.0 39499.0 51 43.0 \n", "11 202225 3 23377 19042.0 27712.0 35 28.0 \n", "12 202224 3 26328 21829.0 30827.0 40 33.0 \n", "13 202223 3 23430 18950.0 27910.0 35 28.0 \n", "14 202222 3 18951 15099.0 22803.0 29 23.0 \n", "15 202221 3 13632 10251.0 17013.0 21 16.0 \n", "16 202220 3 19787 15756.0 23818.0 30 24.0 \n", "17 202219 3 17884 14079.0 21689.0 27 21.0 \n", "18 202218 3 30353 25089.0 35617.0 46 38.0 \n", "19 202217 3 36006 30373.0 41639.0 54 46.0 \n", "20 202216 3 49949 42836.0 57062.0 75 64.0 \n", "21 202215 3 100806 90824.0 110788.0 152 137.0 \n", "22 202214 3 155441 143891.0 166991.0 234 217.0 \n", "23 202213 3 191914 179558.0 204270.0 289 270.0 \n", "24 202212 3 166224 155035.0 177413.0 251 234.0 \n", "25 202211 3 122849 113306.0 132392.0 185 171.0 \n", "26 202210 3 87904 79741.0 96067.0 133 121.0 \n", "27 202209 3 50182 43958.0 56406.0 76 67.0 \n", "28 202208 3 30963 25942.0 35984.0 47 39.0 \n", "29 202207 3 34882 29446.0 40318.0 53 45.0 \n", "... ... ... ... ... ... ... ... 
\n", "1946 198521 3 26096 19621.0 32571.0 47 35.0 \n", "1947 198520 3 27896 20885.0 34907.0 51 38.0 \n", "1948 198519 3 43154 32821.0 53487.0 78 59.0 \n", "1949 198518 3 40555 29935.0 51175.0 74 55.0 \n", "1950 198517 3 34053 24366.0 43740.0 62 44.0 \n", "1951 198516 3 50362 36451.0 64273.0 91 66.0 \n", "1952 198515 3 63881 45538.0 82224.0 116 83.0 \n", "1953 198514 3 134545 114400.0 154690.0 244 207.0 \n", "1954 198513 3 197206 176080.0 218332.0 357 319.0 \n", "1955 198512 3 245240 223304.0 267176.0 445 405.0 \n", "1956 198511 3 276205 252399.0 300011.0 501 458.0 \n", "1957 198510 3 353231 326279.0 380183.0 640 591.0 \n", "1958 198509 3 369895 341109.0 398681.0 670 618.0 \n", "1959 198508 3 389886 359529.0 420243.0 707 652.0 \n", "1960 198507 3 471852 432599.0 511105.0 855 784.0 \n", "1961 198506 3 565825 518011.0 613639.0 1026 939.0 \n", "1962 198505 3 637302 592795.0 681809.0 1155 1074.0 \n", "1963 198504 3 424937 390794.0 459080.0 770 708.0 \n", "1964 198503 3 213901 174689.0 253113.0 388 317.0 \n", "1965 198502 3 97586 80949.0 114223.0 177 147.0 \n", "1966 198501 3 85489 65918.0 105060.0 155 120.0 \n", "1967 198452 3 84830 60602.0 109058.0 154 110.0 \n", "1968 198451 3 101726 80242.0 123210.0 185 146.0 \n", "1969 198450 3 123680 101401.0 145959.0 225 184.0 \n", "1970 198449 3 101073 81684.0 120462.0 184 149.0 \n", "1971 198448 3 78620 60634.0 96606.0 143 110.0 \n", "1972 198447 3 72029 54274.0 89784.0 131 99.0 \n", "1973 198446 3 87330 67686.0 106974.0 159 123.0 \n", "1974 198445 3 135223 101414.0 169032.0 246 184.0 \n", "1975 198444 3 68422 20056.0 116788.0 125 37.0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 43.0 FR France \n", "1 25.0 FR France \n", "2 21.0 FR France \n", "3 25.0 FR France \n", "4 43.0 FR France \n", "5 41.0 FR France \n", "6 37.0 FR France \n", "7 45.0 FR France \n", "8 45.0 FR France \n", "9 71.0 FR France \n", "10 59.0 FR France \n", "11 42.0 FR France \n", "12 47.0 FR France \n", "13 42.0 FR France \n", "14 35.0 FR France \n", "15 
26.0 FR France \n", "16 36.0 FR France \n", "17 33.0 FR France \n", "18 54.0 FR France \n", "19 62.0 FR France \n", "20 86.0 FR France \n", "21 167.0 FR France \n", "22 251.0 FR France \n", "23 308.0 FR France \n", "24 268.0 FR France \n", "25 199.0 FR France \n", "26 145.0 FR France \n", "27 85.0 FR France \n", "28 55.0 FR France \n", "29 61.0 FR France \n", "... ... ... ... \n", "1946 59.0 FR France \n", "1947 64.0 FR France \n", "1948 97.0 FR France \n", "1949 93.0 FR France \n", "1950 80.0 FR France \n", "1951 116.0 FR France \n", "1952 149.0 FR France \n", "1953 281.0 FR France \n", "1954 395.0 FR France \n", "1955 485.0 FR France \n", "1956 544.0 FR France \n", "1957 689.0 FR France \n", "1958 722.0 FR France \n", "1959 762.0 FR France \n", "1960 926.0 FR France \n", "1961 1113.0 FR France \n", "1962 1236.0 FR France \n", "1963 832.0 FR France \n", "1964 459.0 FR France \n", "1965 207.0 FR France \n", "1966 190.0 FR France \n", "1967 198.0 FR France \n", "1968 224.0 FR France \n", "1969 266.0 FR France \n", "1970 219.0 FR France \n", "1971 176.0 FR France \n", "1972 163.0 FR France \n", "1973 195.0 FR France \n", "1974 308.0 FR France \n", "1975 213.0 FR France \n", "\n", "[1976 rows x 10 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_file, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing data points? Yes, week 19 of year 1989 does not have any observed values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We delete this point, which has no significant consequence for our rather simple analysis." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = raw_data.dropna().copy()\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dataset uses an uncommon encoding; the week number is attached\n", "to the year number, leaving the impression of a six-digit integer.\n", "That is how Pandas interprets it.\n", "\n", "A second problem is that Pandas does not know about week numbers.\n", "It needs to be given the dates of the beginning and end of the week.\n", "We use the library `isoweek` for that.\n", "\n", "Since the conversion is a bit lengthy, we write a small Python \n", "function for doing it. Then we apply it to all points in our dataset. \n", "The results go into a new column 'period'." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split the six-digit value YYYYWW into its year and week parts\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that ISO week; build the weekly period containing it\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two more small changes to make.\n", "\n", "First, we define the observation periods as the new index of\n", "our dataset. That turns it into a time series, which will be\n", "convenient later on.\n", "\n", "Second, we sort the points chronologically." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. 
Between the end of a period and\n", "the beginning of the next one, the difference should be zero, or very small.\n", "We tolerate an error of one second.\n", "\n", "This is OK except for one pair of consecutive periods between which\n", "a whole week is missing.\n", "\n", "We recognize the dates: it's the week without observations that we\n", "have deleted earlier!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A zoom on the last few years shows more clearly that the peaks are situated in winter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the peaks of the epidemic happen in winter, near the transition\n", "between calendar years, we define the reference period for the annual\n", "incidence from August 1st of year $N$ to August 1st of year $N+1$. We\n", "label this period as year $N+1$ because the peak is always located in\n", "year $N+1$. The very low incidence in summer ensures that the arbitrariness\n", "of the choice of reference period has no impact on our conclusions.\n", "\n", "Our task is a bit complicated by the fact that a year does not have an\n", "integer number of weeks. 
Therefore we modify our reference period a bit:\n", "instead of August 1st, we use the first day of the week containing August 1st.\n", "\n", "A final detail: the dataset starts in October 1984, so the first peak is\n", "incomplete. We start the analysis with the first full peak." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", "                     for y in range(1985,\n", "                                    sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain August 1st, we obtain intervals of approximately one year as the periods between two adjacent weeks in this list. We compute the sums of weekly incidences for all these periods.\n", "\n", "We also check that our periods contain between 51 and 53 weeks, as a safeguard against potential mistakes in our code." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", "                        first_august_week[1:]):\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the annual incidences." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to find the highest values (at the end)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram clearly shows the few very strong epidemics, which affect about 10% of the French population,\n", "but are rare: there were three of them in the course of 35 years. The typical epidemic affects only half as many people." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }