{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of influenza-like illness" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The incidence data for influenza-like illness are available from the website of the [Réseau Sentinelles](http://www.sentiweb.fr/). We retrieve them as a CSV file in which each line corresponds to one week of the requested period. The complete dataset starts in 1984 and ends with a recent week; the local copy loaded below covers week 1 of 2016 through week 45 of 2025." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_url = \"inc-3-PAY-ds2.csv\" # local copy of the downloaded data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the explanation of the columns given [on the original site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Column description |\n", "|-------------|--------------------|\n", "| week | Calendar week (ISO 8601) |\n", "| indicator | Code of the surveillance indicator |\n", "| inc | Estimated incidence of consultations, in number of cases |\n", "| inc_low | Lower bound of the 95% CI of the estimated number of cases |\n", "| inc_up | Upper bound of the 95% CI of the estimated number of cases |\n", "| inc100 | Estimated incidence rate (cases per 100,000 inhabitants) |\n", "| inc100_low | Lower bound of the 95% CI of the estimated incidence rate (cases per 100,000 inhabitants) |\n", "| inc100_up | Upper bound of the 95% CI of the estimated incidence rate (cases per 100,000 inhabitants) |\n", "| geo_insee | Code of the geographical area concerned (INSEE code) http://www.insee.fr/fr/methodes/nomenclatures/cog/ |\n", "| geo_name | Label of the geographical area (this label may change without notice) |\n", "\n", "The first line of the CSV file is a comment, which we skip by specifying `skiprows=1`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
[HTML rendering of the table omitted: 514 rows × 9 columns, identical to the text/plain output below]
" ], "text/plain": [ " week geo_insee indicator inc inc100 inc_up inc_low inc100_up \\\n", "0 201601 FR 3 42263 65 48970 35556 75 \n", "1 201602 FR 3 44617 68 51413 37821 79 \n", "2 201603 FR 3 75277 116 83924 66630 129 \n", "3 201604 FR 3 148473 228 160355 136591 246 \n", "4 201605 FR 3 178963 275 191630 166296 294 \n", "5 201606 FR 3 194921 299 200637 189205 308 \n", "6 201607 FR 3 204604 314 210495 198713 323 \n", "7 201608 FR 3 200681 308 206577 194785 317 \n", "8 201609 FR 3 213380 328 219353 207407 337 \n", "9 201610 FR 3 248576 382 254901 242251 391 \n", "10 201611 FR 3 279636 429 286366 272906 440 \n", "11 201612 FR 3 274533 421 281219 267847 432 \n", "12 201613 FR 3 173684 267 179077 168291 275 \n", "13 201614 FR 3 113384 174 117869 108899 181 \n", "14 201615 FR 3 60388 93 63680 57096 98 \n", "15 201616 FR 3 33397 51 35849 30945 55 \n", "16 201617 FR 3 26981 41 29219 24743 45 \n", "17 201618 FR 3 15274 23 16926 13622 26 \n", "18 201619 FR 3 14358 22 15959 12757 24 \n", "19 201620 FR 3 9853 15 11190 8516 17 \n", "20 201621 FR 3 9944 15 11277 8611 17 \n", "21 201622 FR 3 6648 10 7741 5555 12 \n", "22 201623 FR 3 6151 9 7202 5100 11 \n", "23 201624 FR 3 6264 10 7319 5209 11 \n", "24 201625 FR 3 5345 8 6331 4359 10 \n", "25 201626 FR 3 6027 9 7098 4956 11 \n", "26 201627 FR 3 5202 8 6226 4178 10 \n", "27 201628 FR 3 3320 5 4160 2480 6 \n", "28 201629 FR 3 4056 6 5020 3092 8 \n", "29 201630 FR 3 3540 5 4454 2626 7 \n", ".. ... ... ... ... ... ... ... ... 
\n", "484 202516 FR 3 26006 39 28568 23444 43 \n", "485 202517 FR 3 20111 30 22320 17902 33 \n", "486 202518 FR 3 16967 25 18996 14938 28 \n", "487 202519 FR 3 17754 26 19794 15714 30 \n", "488 202520 FR 3 23251 35 25546 20956 38 \n", "489 202521 FR 3 22898 34 25167 20629 38 \n", "490 202522 FR 3 17254 26 19251 15257 29 \n", "491 202523 FR 3 19876 30 22004 17748 33 \n", "492 202524 FR 3 19265 29 21368 17162 32 \n", "493 202525 FR 3 19256 29 21342 17170 32 \n", "494 202526 FR 3 18474 28 20532 16416 31 \n", "495 202527 FR 3 17703 26 19725 15681 29 \n", "496 202528 FR 3 18607 28 20753 16461 31 \n", "497 202529 FR 3 15938 24 17990 13886 27 \n", "498 202530 FR 3 18896 28 21165 16627 32 \n", "499 202531 FR 3 20376 30 22799 17953 34 \n", "500 202532 FR 3 19938 30 22479 17397 34 \n", "501 202533 FR 3 13005 19 15114 10896 23 \n", "502 202534 FR 3 19844 30 22312 17376 33 \n", "503 202535 FR 3 23900 36 26415 21385 39 \n", "504 202536 FR 3 30297 45 33247 27347 50 \n", "505 202537 FR 3 36444 54 39393 33495 59 \n", "506 202538 FR 3 54568 81 58152 50984 87 \n", "507 202539 FR 3 60332 90 64237 56427 96 \n", "508 202540 FR 3 69162 103 73126 65198 109 \n", "509 202541 FR 3 75221 112 79373 71069 118 \n", "510 202542 FR 3 68201 102 72235 64167 108 \n", "511 202543 FR 3 54783 82 58583 50983 87 \n", "512 202544 FR 3 54690 82 58439 50941 87 \n", "513 202545 FR 3 48211 72 51771 44651 77 \n", "\n", " inc100_low \n", "0 55 \n", "1 58 \n", "2 102 \n", "3 210 \n", "4 255 \n", "5 290 \n", "6 305 \n", "7 299 \n", "8 318 \n", "9 372 \n", "10 419 \n", "11 411 \n", "12 258 \n", "13 167 \n", "14 88 \n", "15 47 \n", "16 38 \n", "17 21 \n", "18 20 \n", "19 13 \n", "20 13 \n", "21 9 \n", "22 8 \n", "23 8 \n", "24 7 \n", "25 8 \n", "26 6 \n", "27 4 \n", "28 5 \n", "29 4 \n", ".. ... 
\n", "484 35 \n", "485 27 \n", "486 22 \n", "487 23 \n", "488 31 \n", "489 31 \n", "490 23 \n", "491 26 \n", "492 26 \n", "493 26 \n", "494 24 \n", "495 23 \n", "496 25 \n", "497 21 \n", "498 25 \n", "499 27 \n", "500 26 \n", "501 16 \n", "502 26 \n", "503 32 \n", "504 41 \n", "505 50 \n", "506 76 \n", "507 84 \n", "508 97 \n", "509 106 \n", "510 96 \n", "511 76 \n", "512 76 \n", "513 67 \n", "\n", "[514 rows x 9 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_url, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing points in this dataset? Not in the copy loaded here: the filter below returns an empty DataFrame. In the complete dataset starting in 1984, week 19 of 1989 has no associated values; that week lies outside the period covered by this copy." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[empty HTML table omitted; columns: week, geo_insee, indicator, inc, inc100, inc_up, inc_low, inc100_up, inc100_low]
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [week, geo_insee, indicator, inc, inc100, inc_up, inc_low, inc100_up, inc100_low]\n", "Index: []" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We drop any rows with missing values. As the empty result above shows, this changes nothing for the copy loaded here; in the complete dataset it removes a single week, which has little impact on our fairly simple analysis." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[HTML rendering of the table omitted: 514 rows × 9 columns, identical to the text/plain output below]
" ], "text/plain": [ " week geo_insee indicator inc inc100 inc_up inc_low inc100_up \\\n", "0 201601 FR 3 42263 65 48970 35556 75 \n", "1 201602 FR 3 44617 68 51413 37821 79 \n", "2 201603 FR 3 75277 116 83924 66630 129 \n", "3 201604 FR 3 148473 228 160355 136591 246 \n", "4 201605 FR 3 178963 275 191630 166296 294 \n", "5 201606 FR 3 194921 299 200637 189205 308 \n", "6 201607 FR 3 204604 314 210495 198713 323 \n", "7 201608 FR 3 200681 308 206577 194785 317 \n", "8 201609 FR 3 213380 328 219353 207407 337 \n", "9 201610 FR 3 248576 382 254901 242251 391 \n", "10 201611 FR 3 279636 429 286366 272906 440 \n", "11 201612 FR 3 274533 421 281219 267847 432 \n", "12 201613 FR 3 173684 267 179077 168291 275 \n", "13 201614 FR 3 113384 174 117869 108899 181 \n", "14 201615 FR 3 60388 93 63680 57096 98 \n", "15 201616 FR 3 33397 51 35849 30945 55 \n", "16 201617 FR 3 26981 41 29219 24743 45 \n", "17 201618 FR 3 15274 23 16926 13622 26 \n", "18 201619 FR 3 14358 22 15959 12757 24 \n", "19 201620 FR 3 9853 15 11190 8516 17 \n", "20 201621 FR 3 9944 15 11277 8611 17 \n", "21 201622 FR 3 6648 10 7741 5555 12 \n", "22 201623 FR 3 6151 9 7202 5100 11 \n", "23 201624 FR 3 6264 10 7319 5209 11 \n", "24 201625 FR 3 5345 8 6331 4359 10 \n", "25 201626 FR 3 6027 9 7098 4956 11 \n", "26 201627 FR 3 5202 8 6226 4178 10 \n", "27 201628 FR 3 3320 5 4160 2480 6 \n", "28 201629 FR 3 4056 6 5020 3092 8 \n", "29 201630 FR 3 3540 5 4454 2626 7 \n", ".. ... ... ... ... ... ... ... ... 
\n", "484 202516 FR 3 26006 39 28568 23444 43 \n", "485 202517 FR 3 20111 30 22320 17902 33 \n", "486 202518 FR 3 16967 25 18996 14938 28 \n", "487 202519 FR 3 17754 26 19794 15714 30 \n", "488 202520 FR 3 23251 35 25546 20956 38 \n", "489 202521 FR 3 22898 34 25167 20629 38 \n", "490 202522 FR 3 17254 26 19251 15257 29 \n", "491 202523 FR 3 19876 30 22004 17748 33 \n", "492 202524 FR 3 19265 29 21368 17162 32 \n", "493 202525 FR 3 19256 29 21342 17170 32 \n", "494 202526 FR 3 18474 28 20532 16416 31 \n", "495 202527 FR 3 17703 26 19725 15681 29 \n", "496 202528 FR 3 18607 28 20753 16461 31 \n", "497 202529 FR 3 15938 24 17990 13886 27 \n", "498 202530 FR 3 18896 28 21165 16627 32 \n", "499 202531 FR 3 20376 30 22799 17953 34 \n", "500 202532 FR 3 19938 30 22479 17397 34 \n", "501 202533 FR 3 13005 19 15114 10896 23 \n", "502 202534 FR 3 19844 30 22312 17376 33 \n", "503 202535 FR 3 23900 36 26415 21385 39 \n", "504 202536 FR 3 30297 45 33247 27347 50 \n", "505 202537 FR 3 36444 54 39393 33495 59 \n", "506 202538 FR 3 54568 81 58152 50984 87 \n", "507 202539 FR 3 60332 90 64237 56427 96 \n", "508 202540 FR 3 69162 103 73126 65198 109 \n", "509 202541 FR 3 75221 112 79373 71069 118 \n", "510 202542 FR 3 68201 102 72235 64167 108 \n", "511 202543 FR 3 54783 82 58583 50983 87 \n", "512 202544 FR 3 54690 82 58439 50941 87 \n", "513 202545 FR 3 48211 72 51771 44651 77 \n", "\n", " inc100_low \n", "0 55 \n", "1 58 \n", "2 102 \n", "3 210 \n", "4 255 \n", "5 290 \n", "6 305 \n", "7 299 \n", "8 318 \n", "9 372 \n", "10 419 \n", "11 411 \n", "12 258 \n", "13 167 \n", "14 88 \n", "15 47 \n", "16 38 \n", "17 21 \n", "18 20 \n", "19 13 \n", "20 13 \n", "21 9 \n", "22 8 \n", "23 8 \n", "24 7 \n", "25 8 \n", "26 6 \n", "27 4 \n", "28 5 \n", "29 4 \n", ".. ... 
\n", "484 35 \n", "485 27 \n", "486 22 \n", "487 23 \n", "488 31 \n", "489 31 \n", "490 23 \n", "491 26 \n", "492 26 \n", "493 26 \n", "494 24 \n", "495 23 \n", "496 25 \n", "497 21 \n", "498 25 \n", "499 27 \n", "500 26 \n", "501 16 \n", "502 26 \n", "503 32 \n", "504 41 \n", "505 50 \n", "506 76 \n", "507 84 \n", "508 97 \n", "509 106 \n", "510 96 \n", "511 76 \n", "512 76 \n", "513 67 \n", "\n", "[514 rows x 9 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = raw_data.dropna().copy()\n", "data" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Our data use an unusual convention: the week number is appended to the year, which makes the value look like an integer. That is how pandas interprets it.\n", "\n", "A second problem is that pandas does not understand week numbers. It needs the start and end dates of each week. We use the `isoweek` library for this.\n", "\n", "Since converting the weeks has become fairly involved, we write a small Python function for it. We then apply it to every point in our data. The results go into a new column, 'period'." ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # The 'week' column stores e.g. 201601: four digits of year, then the ISO week number\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that ISO week; the weekly Period contains it\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Two small modifications remain.\n", "\n", "First, we set the observation periods as the new index of our dataset. This turns it into a time series, which will be convenient later on.\n", "\n", "Second, we sort the points by period, in chronological order." ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of one period and the start of the next, the time difference should be zero, or at least very small. We allow a \"margin of error\" of one second.\n", "\n", "For the copy loaded here the check prints nothing: every week follows directly on the previous one. In the complete dataset, the only exception is a pair of consecutive periods with one week missing between them, which is the week without observations that `dropna()` removes." ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    # Gap between the end of one week and the start of the next\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'].plot()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in on the last few years shows the winter peaks more clearly. The troughs in incidence fall in summer." ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'][-200:].plot()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Since the peak of the epidemic occurs in winter, straddling two calendar years, we define the reference period between two minima of the incidence, from August 1 of year $N$ to August 1 of year $N+1$.\n", "\n", "Our task is slightly complicated by the fact that a year does not contain a whole number of weeks. We therefore adjust our reference periods a little: instead of August 1 of each year, we use the first day of the week that contains August 1.\n", "\n", "Since the incidence of influenza-like illness is very low in summer, this adjustment is unlikely to distort our conclusions.\n", "\n", "One more detail: the data loaded here start in January 2016, so the first complete August-to-August period begins in August 2016. (For the complete dataset, which starts in October 1984, the analysis would start in 1985.)" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Start in 2016, the first year covered by the data loaded above\n", "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", "                     for y in range(2016,\n", "                                    sorted_data.index[-1].year)]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain an August 1, we obtain our intervals of about one year as the periods between two adjacent weeks in the list. We compute the sum of the weekly incidences for each of these periods.\n", "\n", "We also check that each period contains between 51 and 53 weeks, to guard against possible errors in our code." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", "                        first_august_week[1:]):\n", "    # Weeks from the one containing August 1 up to the week before the next one\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Here are the annual incidences." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to spot the highest values (at the end)." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram shows how the annual totals are distributed. In the original analysis of the complete dataset, strong epidemics, which affect about 10% of the French population, were fairly rare: there were three over 35 years. The copy loaded here covers only eight seasons (2016-17 through 2023-24), so its histogram cannot by itself confirm that figure." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }