{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of influenza-like illness in France" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data on the incidence of influenza-like illness are available from the website of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in 1984 and ending with a recent week, is available for download." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "data_url = \"https://www.sentiweb.fr/datasets/incidence-PAY-3.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the documentation of the data from [the download site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Description |\n", "|--------------|---------------------------------------------------------------------------------------------------------------------------|\n", "| `week` | ISO8601 Yearweek number as numeric (year times 100 + week number) |\n", "| `indicator` | Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json |\n", "| `inc` | Estimated incidence value for the time step, in the geographic level |\n", "| `inc_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc_up` | Upper bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100` | Estimated rate incidence per 100,000 inhabitants |\n", "| `inc100_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100_up` | Upper bound of the estimated rate incidence 95% Confidence Interval |\n", "| `geo_insee` | Identifier of the 
geographic area, from INSEE https://www.insee.fr |\n", "| `geo_name` | Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading |\n", "\n", "The first line of the CSV file is a comment, which we ignore with `skiprows=1`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "raw_data = pd.read_csv(data_url, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I took the URL of the CSV file from the Sentinelles website and added it to the notebook, then ran the command that loads and displays the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing data points? Yes, week 19 of year 1989 does not have any observed values." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [
"<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>week</th><th>indicator</th><th>inc</th><th>inc_low</th><th>inc_up</th><th>inc100</th><th>inc100_low</th><th>inc100_up</th><th>geo_insee</th><th>geo_name</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1755</th><td>198919</td><td>3</td><td>0</td><td>NaN</td><td>NaN</td><td>0</td><td>NaN</td><td>NaN</td><td>FR</td><td>France</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low inc100_up \\\n", "1755 198919 3 0 NaN NaN 0 NaN NaN \n", "\n", " geo_insee geo_name \n", "1755 FR France " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We delete this point, which has little consequence for our rather simple analysis." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>week</th><th>indicator</th><th>inc</th><th>inc_low</th><th>inc_up</th><th>inc100</th><th>inc100_low</th><th>inc100_up</th><th>geo_insee</th><th>geo_name</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>202252</td><td>3</td><td>156629</td><td>143097.0</td><td>170161.0</td><td>236</td><td>216.0</td><td>256.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1</th><td>202251</td><td>3</td><td>248311</td><td>232120.0</td><td>264502.0</td><td>374</td><td>350.0</td><td>398.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>2</th><td>202250</td><td>3</td><td>234279</td><td>219533.0</td><td>249025.0</td><td>353</td><td>331.0</td><td>375.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>3</th><td>202249</td><td>3</td><td>163421</td><td>151727.0</td><td>175115.0</td><td>246</td><td>228.0</td><td>264.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>4</th><td>202248</td><td>3</td><td>121884</td><td>111932.0</td><td>131836.0</td><td>184</td><td>169.0</td><td>199.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>5</th><td>202247</td><td>3</td><td>96447</td><td>87259.0</td><td>105635.0</td><td>145</td><td>131.0</td><td>159.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>6</th><td>202246</td><td>3</td><td>67735</td><td>60075.0</td><td>75395.0</td><td>102</td><td>90.0</td><td>114.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>7</th><td>202245</td><td>3</td><td>45306</td><td>38909.0</td><td>51703.0</td><td>68</td><td>58.0</td><td>78.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>8</th><td>202244</td><td>3</td><td>34713</td><td>28880.0</td><td>40546.0</td><td>52</td><td>43.0</td><td>61.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>9</th><td>202243</td><td>3</td><td>44769</td><td>36884.0</td><td>52654.0</td><td>68</td><td>56.0</td><td>80.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>10</th><td>202242</td><td>3</td><td>47462</td><td>40773.0</td><td>54151.0</td><td>72</td><td>62.0</td><td>82.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>11</th><td>202241</td><td>3</td><td>48583</td><td>42388.0</td><td>54778.0</td><td>73</td><td>64.0</td><td>82.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>12</th><td>202240</td><td>3</td><td>41927</td><td>36115.0</td><td>47739.0</td><td>63</td><td>54.0</td><td>72.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>13</th><td>202239</td><td>3</td><td>39902</td><td>34168.0</td><td>45636.0</td><td>60</td><td>51.0</td><td>69.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>14</th><td>202238</td><td>3</td><td>28781</td><td>23733.0</td><td>33829.0</td><td>43</td><td>35.0</td><td>51.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>15</th><td>202237</td><td>3</td><td>21395</td><td>17076.0</td><td>25714.0</td><td>32</td><td>25.0</td><td>39.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>16</th><td>202236</td><td>3</td><td>14120</td><td>10487.0</td><td>17753.0</td><td>21</td><td>16.0</td><td>26.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>17</th><td>202235</td><td>3</td><td>9283</td><td>6485.0</td><td>12081.0</td><td>14</td><td>10.0</td><td>18.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>18</th><td>202234</td><td>3</td><td>7498</td><td>4731.0</td><td>10265.0</td><td>11</td><td>7.0</td><td>15.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>19</th><td>202233</td><td>3</td><td>7586</td><td>4442.0</td><td>10730.0</td><td>11</td><td>6.0</td><td>16.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>20</th><td>202232</td><td>3</td><td>12222</td><td>7749.0</td><td>16695.0</td><td>18</td><td>11.0</td><td>25.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>21</th><td>202231</td><td>3</td><td>13257</td><td>8905.0</td><td>17609.0</td><td>20</td><td>13.0</td><td>27.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>22</th><td>202230</td><td>3</td><td>15006</td><td>10738.0</td><td>19274.0</td><td>23</td><td>17.0</td><td>29.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>23</th><td>202229</td><td>3</td><td>20801</td><td>15829.0</td><td>25773.0</td><td>31</td><td>24.0</td><td>38.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>24</th><td>202228</td><td>3</td><td>23387</td><td>17970.0</td><td>28804.0</td><td>35</td><td>27.0</td><td>43.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>25</th><td>202227</td><td>3</td><td>36015</td><td>29709.0</td><td>42321.0</td><td>54</td><td>44.0</td><td>64.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>26</th><td>202226</td><td>3</td><td>29421</td><td>24314.0</td><td>34528.0</td><td>44</td><td>36.0</td><td>52.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>27</th><td>202225</td><td>3</td><td>22887</td><td>18582.0</td><td>27192.0</td><td>35</td><td>29.0</td><td>41.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>28</th><td>202224</td><td>3</td><td>19294</td><td>15406.0</td><td>23182.0</td><td>29</td><td>23.0</td><td>35.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>29</th><td>202223</td><td>3</td><td>17159</td><td>13450.0</td><td>20868.0</td><td>26</td><td>20.0</td><td>32.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>1962</th><td>198521</td><td>3</td><td>26096</td><td>19621.0</td><td>32571.0</td><td>47</td><td>35.0</td><td>59.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1963</th><td>198520</td><td>3</td><td>27896</td><td>20885.0</td><td>34907.0</td><td>51</td><td>38.0</td><td>64.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1964</th><td>198519</td><td>3</td><td>43154</td><td>32821.0</td><td>53487.0</td><td>78</td><td>59.0</td><td>97.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1965</th><td>198518</td><td>3</td><td>40555</td><td>29935.0</td><td>51175.0</td><td>74</td><td>55.0</td><td>93.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1966</th><td>198517</td><td>3</td><td>34053</td><td>24366.0</td><td>43740.0</td><td>62</td><td>44.0</td><td>80.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1967</th><td>198516</td><td>3</td><td>50362</td><td>36451.0</td><td>64273.0</td><td>91</td><td>66.0</td><td>116.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1968</th><td>198515</td><td>3</td><td>63881</td><td>45538.0</td><td>82224.0</td><td>116</td><td>83.0</td><td>149.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1969</th><td>198514</td><td>3</td><td>134545</td><td>114400.0</td><td>154690.0</td><td>244</td><td>207.0</td><td>281.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1970</th><td>198513</td><td>3</td><td>197206</td><td>176080.0</td><td>218332.0</td><td>357</td><td>319.0</td><td>395.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1971</th><td>198512</td><td>3</td><td>245240</td><td>223304.0</td><td>267176.0</td><td>445</td><td>405.0</td><td>485.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1972</th><td>198511</td><td>3</td><td>276205</td><td>252399.0</td><td>300011.0</td><td>501</td><td>458.0</td><td>544.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1973</th><td>198510</td><td>3</td><td>353231</td><td>326279.0</td><td>380183.0</td><td>640</td><td>591.0</td><td>689.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1974</th><td>198509</td><td>3</td><td>369895</td><td>341109.0</td><td>398681.0</td><td>670</td><td>618.0</td><td>722.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1975</th><td>198508</td><td>3</td><td>389886</td><td>359529.0</td><td>420243.0</td><td>707</td><td>652.0</td><td>762.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1976</th><td>198507</td><td>3</td><td>471852</td><td>432599.0</td><td>511105.0</td><td>855</td><td>784.0</td><td>926.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1977</th><td>198506</td><td>3</td><td>565825</td><td>518011.0</td><td>613639.0</td><td>1026</td><td>939.0</td><td>1113.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1978</th><td>198505</td><td>3</td><td>637302</td><td>592795.0</td><td>681809.0</td><td>1155</td><td>1074.0</td><td>1236.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1979</th><td>198504</td><td>3</td><td>424937</td><td>390794.0</td><td>459080.0</td><td>770</td><td>708.0</td><td>832.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1980</th><td>198503</td><td>3</td><td>213901</td><td>174689.0</td><td>253113.0</td><td>388</td><td>317.0</td><td>459.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1981</th><td>198502</td><td>3</td><td>97586</td><td>80949.0</td><td>114223.0</td><td>177</td><td>147.0</td><td>207.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1982</th><td>198501</td><td>3</td><td>85489</td><td>65918.0</td><td>105060.0</td><td>155</td><td>120.0</td><td>190.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1983</th><td>198452</td><td>3</td><td>84830</td><td>60602.0</td><td>109058.0</td><td>154</td><td>110.0</td><td>198.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1984</th><td>198451</td><td>3</td><td>101726</td><td>80242.0</td><td>123210.0</td><td>185</td><td>146.0</td><td>224.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1985</th><td>198450</td><td>3</td><td>123680</td><td>101401.0</td><td>145959.0</td><td>225</td><td>184.0</td><td>266.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1986</th><td>198449</td><td>3</td><td>101073</td><td>81684.0</td><td>120462.0</td><td>184</td><td>149.0</td><td>219.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1987</th><td>198448</td><td>3</td><td>78620</td><td>60634.0</td><td>96606.0</td><td>143</td><td>110.0</td><td>176.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1988</th><td>198447</td><td>3</td><td>72029</td><td>54274.0</td><td>89784.0</td><td>131</td><td>99.0</td><td>163.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1989</th><td>198446</td><td>3</td><td>87330</td><td>67686.0</td><td>106974.0</td><td>159</td><td>123.0</td><td>195.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1990</th><td>198445</td><td>3</td><td>135223</td><td>101414.0</td><td>169032.0</td><td>246</td><td>184.0</td><td>308.0</td><td>FR</td><td>France</td></tr>\n",
"<tr><th>1991</th><td>198444</td><td>3</td><td>68422</td><td>20056.0</td><td>116788.0</td><td>125</td><td>37.0</td><td>213.0</td><td>FR</td><td>France</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>1991 rows × 10 columns</p>\n",
"</div>
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202252 3 156629 143097.0 170161.0 236 216.0 \n", "1 202251 3 248311 232120.0 264502.0 374 350.0 \n", "2 202250 3 234279 219533.0 249025.0 353 331.0 \n", "3 202249 3 163421 151727.0 175115.0 246 228.0 \n", "4 202248 3 121884 111932.0 131836.0 184 169.0 \n", "5 202247 3 96447 87259.0 105635.0 145 131.0 \n", "6 202246 3 67735 60075.0 75395.0 102 90.0 \n", "7 202245 3 45306 38909.0 51703.0 68 58.0 \n", "8 202244 3 34713 28880.0 40546.0 52 43.0 \n", "9 202243 3 44769 36884.0 52654.0 68 56.0 \n", "10 202242 3 47462 40773.0 54151.0 72 62.0 \n", "11 202241 3 48583 42388.0 54778.0 73 64.0 \n", "12 202240 3 41927 36115.0 47739.0 63 54.0 \n", "13 202239 3 39902 34168.0 45636.0 60 51.0 \n", "14 202238 3 28781 23733.0 33829.0 43 35.0 \n", "15 202237 3 21395 17076.0 25714.0 32 25.0 \n", "16 202236 3 14120 10487.0 17753.0 21 16.0 \n", "17 202235 3 9283 6485.0 12081.0 14 10.0 \n", "18 202234 3 7498 4731.0 10265.0 11 7.0 \n", "19 202233 3 7586 4442.0 10730.0 11 6.0 \n", "20 202232 3 12222 7749.0 16695.0 18 11.0 \n", "21 202231 3 13257 8905.0 17609.0 20 13.0 \n", "22 202230 3 15006 10738.0 19274.0 23 17.0 \n", "23 202229 3 20801 15829.0 25773.0 31 24.0 \n", "24 202228 3 23387 17970.0 28804.0 35 27.0 \n", "25 202227 3 36015 29709.0 42321.0 54 44.0 \n", "26 202226 3 29421 24314.0 34528.0 44 36.0 \n", "27 202225 3 22887 18582.0 27192.0 35 29.0 \n", "28 202224 3 19294 15406.0 23182.0 29 23.0 \n", "29 202223 3 17159 13450.0 20868.0 26 20.0 \n", "... ... ... ... ... ... ... ... 
\n", "1962 198521 3 26096 19621.0 32571.0 47 35.0 \n", "1963 198520 3 27896 20885.0 34907.0 51 38.0 \n", "1964 198519 3 43154 32821.0 53487.0 78 59.0 \n", "1965 198518 3 40555 29935.0 51175.0 74 55.0 \n", "1966 198517 3 34053 24366.0 43740.0 62 44.0 \n", "1967 198516 3 50362 36451.0 64273.0 91 66.0 \n", "1968 198515 3 63881 45538.0 82224.0 116 83.0 \n", "1969 198514 3 134545 114400.0 154690.0 244 207.0 \n", "1970 198513 3 197206 176080.0 218332.0 357 319.0 \n", "1971 198512 3 245240 223304.0 267176.0 445 405.0 \n", "1972 198511 3 276205 252399.0 300011.0 501 458.0 \n", "1973 198510 3 353231 326279.0 380183.0 640 591.0 \n", "1974 198509 3 369895 341109.0 398681.0 670 618.0 \n", "1975 198508 3 389886 359529.0 420243.0 707 652.0 \n", "1976 198507 3 471852 432599.0 511105.0 855 784.0 \n", "1977 198506 3 565825 518011.0 613639.0 1026 939.0 \n", "1978 198505 3 637302 592795.0 681809.0 1155 1074.0 \n", "1979 198504 3 424937 390794.0 459080.0 770 708.0 \n", "1980 198503 3 213901 174689.0 253113.0 388 317.0 \n", "1981 198502 3 97586 80949.0 114223.0 177 147.0 \n", "1982 198501 3 85489 65918.0 105060.0 155 120.0 \n", "1983 198452 3 84830 60602.0 109058.0 154 110.0 \n", "1984 198451 3 101726 80242.0 123210.0 185 146.0 \n", "1985 198450 3 123680 101401.0 145959.0 225 184.0 \n", "1986 198449 3 101073 81684.0 120462.0 184 149.0 \n", "1987 198448 3 78620 60634.0 96606.0 143 110.0 \n", "1988 198447 3 72029 54274.0 89784.0 131 99.0 \n", "1989 198446 3 87330 67686.0 106974.0 159 123.0 \n", "1990 198445 3 135223 101414.0 169032.0 246 184.0 \n", "1991 198444 3 68422 20056.0 116788.0 125 37.0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 256.0 FR France \n", "1 398.0 FR France \n", "2 375.0 FR France \n", "3 264.0 FR France \n", "4 199.0 FR France \n", "5 159.0 FR France \n", "6 114.0 FR France \n", "7 78.0 FR France \n", "8 61.0 FR France \n", "9 80.0 FR France \n", "10 82.0 FR France \n", "11 82.0 FR France \n", "12 72.0 FR France \n", "13 69.0 FR France \n", "14 51.0 FR France 
\n", "15 39.0 FR France \n", "16 26.0 FR France \n", "17 18.0 FR France \n", "18 15.0 FR France \n", "19 16.0 FR France \n", "20 25.0 FR France \n", "21 27.0 FR France \n", "22 29.0 FR France \n", "23 38.0 FR France \n", "24 43.0 FR France \n", "25 64.0 FR France \n", "26 52.0 FR France \n", "27 41.0 FR France \n", "28 35.0 FR France \n", "29 32.0 FR France \n", "... ... ... ... \n", "1962 59.0 FR France \n", "1963 64.0 FR France \n", "1964 97.0 FR France \n", "1965 93.0 FR France \n", "1966 80.0 FR France \n", "1967 116.0 FR France \n", "1968 149.0 FR France \n", "1969 281.0 FR France \n", "1970 395.0 FR France \n", "1971 485.0 FR France \n", "1972 544.0 FR France \n", "1973 689.0 FR France \n", "1974 722.0 FR France \n", "1975 762.0 FR France \n", "1976 926.0 FR France \n", "1977 1113.0 FR France \n", "1978 1236.0 FR France \n", "1979 832.0 FR France \n", "1980 459.0 FR France \n", "1981 207.0 FR France \n", "1982 190.0 FR France \n", "1983 198.0 FR France \n", "1984 224.0 FR France \n", "1985 266.0 FR France \n", "1986 219.0 FR France \n", "1987 176.0 FR France \n", "1988 163.0 FR France \n", "1989 195.0 FR France \n", "1990 308.0 FR France \n", "1991 213.0 FR France \n", "\n", "[1991 rows x 10 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ " data = raw_data.dropna().copy()\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dataset uses an uncommon encoding; the week number is attached\n", "to the year number, leaving the impression of a six-digit integer.\n", "That is how Pandas interprets it.\n", "\n", "A second problem is that Pandas does not know about week numbers.\n", "It needs to be given the dates of the beginning and end of the week.\n", "We use the library `isoweek` for that.\n", "\n", "Since the conversion is a bit lengthy, we write a small Python \n", "function for doing it. Then we apply it to all points in our dataset. \n", "The results go into a new column 'period'." 
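, "

The split of the six-digit code described above can also be sketched with the standard library alone. This is an illustrative alternative and not the conversion this notebook uses: the helper name `week_start` is hypothetical, and `datetime.date.fromisocalendar` is assumed to be available (Python 3.8 or later).

```python
from datetime import date


def week_start(year_and_week):
    '''Return the Monday of the ISO week encoded as year * 100 + week.'''
    # divmod splits e.g. 198919 into (1989, 19)
    year, week = divmod(int(year_and_week), 100)
    # In the ISO calendar, weekday 1 is Monday
    return date.fromisocalendar(year, week, 1)


# Example: the week that has no observations in the dataset
print(week_start(198919))
```

The arithmetic split avoids the round trip through a string, but it relies on the week part always occupying the last two digits.
"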
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split the six-digit code YYYYWW into year and ISO week number\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that week\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two more small changes to make.\n", "\n", "First, we define the observation periods as the new index of\n", "our dataset. That turns it into a time series, which will be\n", "convenient later on.\n", "\n", "Second, we sort the points chronologically." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of a period and\n", "the beginning of the next one, the difference should be zero, or very small.\n", "We tolerate an error of one second.\n", "\n", "This is OK except for one pair of consecutive periods between which\n", "a whole week is missing.\n", "\n", "We recognize the dates: it's the week without observations that we\n", "deleted earlier!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in on the last few years shows more clearly that the peaks occur in winter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the peaks of the epidemic happen in winter, near the transition\n", "between calendar years, we define the reference period for the annual\n", "incidence from August 1st of year $N$ to August 1st of year $N+1$. We\n", "label this period as year $N+1$ because the peak is always located in\n", "year $N+1$. The very low incidence in summer ensures that the arbitrariness\n", "of the choice of reference period has no impact on our conclusions.\n", "\n", "Our task is a bit complicated by the fact that a year does not have an\n", "integer number of weeks. Therefore we modify our reference period a bit:\n", "instead of August 1st, we use the first day of the week containing August 1st.\n", "\n", "A final detail: the dataset starts in October 1984, so the first peak is\n", "incomplete. We start the analysis with the first full peak." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", "                     for y in range(1985,\n", "                                    sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain August 1st, we obtain intervals of approximately one year as the periods between two adjacent weeks in this list. 
We compute the sums of weekly incidences for all these periods.\n", "\n", "We also check that each period contains 52 weeks, give or take one, as a safeguard against potential mistakes in our code." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", "                        first_august_week[1:]):\n", "    # Period slices include both ends, so week2-1 leaves out\n", "    # the first week of the following reference period\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the annual incidences." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to find the highest values (at the end)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram clearly shows the few very strong epidemics, which affect about 10% of the French population,\n", "but are rare: there were three of them in the course of 35 years. The typical epidemic affects only half as many people." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }