{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of influenza-like illness" ] },
{ "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The data on the incidence of influenza-like illness are available from the [Réseau Sentinelles](http://www.sentiweb.fr/) website. We retrieve them as a CSV file in which each line corresponds to one week of the requested period. The complete dataset published on the site begins in 1984 and ends with a recent week." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "For this exercise, and to guarantee access to the data used in this study, the data are stored locally as a .csv file in the same directory as this analysis notebook. The local copy loaded below covers week 01 of 2016 through week 13 of 2025." ] },
{ "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "data_url_local = \"syndromes_grippaux_hebdo.csv\"" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Here is the explanation of the columns given [on the original site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Column label |\n", "|-------------|--------------|\n", "| week | Calendar week (ISO 8601) |\n", "| indicator | Surveillance indicator code |\n", "| inc | Estimated incidence of consultations, in number of cases |\n", "| inc_low | Lower bound of the 95% CI of the estimated number of consultation cases |\n", "| inc_up | Upper bound of the 95% CI of the estimated number of consultation cases |\n", "| inc100 | Estimated incidence rate of consultation cases (cases per 100,000 inhabitants) |\n", "| inc100_low | Lower bound of the 95% CI of the estimated incidence rate (cases per 100,000 inhabitants) |\n", "| inc100_up | Upper bound of the 95% CI of the estimated incidence rate (cases per 100,000 inhabitants) |\n", "| geo_insee | Code of the geographic area concerned (INSEE code) http://www.insee.fr/fr/methodes/nomenclatures/cog/ |\n", "| geo_name | Label of the geographic area (this label may change without notice) |\n", "\n", "The first line of the CSV file is a comment, which we skip by passing `skiprows=1`." ] },
{ "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week geo_insee indicator inc inc100 inc_up inc_low inc100_up \\\n", "0 201601 FR 3 42263 65 48970 35556 75 \n", "1 201602 FR 3 44617 68 51413 37821 79 \n", "2 201603 FR 3 75277 116 83924 66630 129 \n", "3 201604 FR 3 148473 228 160355 136591 246 \n", "4 201605 FR 3 178963 275 191630 166296 294 \n", "5 201606 FR 3 194921 299 200637 189205 308 \n", "6 201607 FR 3 204604 314 210495 198713 323 \n", "7 201608 FR 3 200681 308 206577 194785 317 \n", "8 201609 FR 3 213380 328 219353 207407 337 \n", "9 201610 FR 3 248576 382 254901 242251 391 \n", "10 201611 FR 3 279636 429 286366 272906 440 \n", "11 201612 FR 3 274533 421 281219 267847 432 \n", "12 201613 FR 3 173684 267 179077 168291 275 \n", "13 201614 FR 3 113384 174 117869 108899 181 \n", "14 201615 FR 3 60388 93 63680 57096 98 \n", "15 201616 FR 3 33397 51 35849 30945 55 \n", "16 201617 FR 3 26981 41 29219 24743 45 \n", "17 201618 FR 3 15274 23 16926 13622 26 \n", "18 201619 FR 3 14358 22 15959 12757 24 \n", "19 201620 FR 3 9853 15 11190 8516 17 \n", "20 201621 FR 3 9944 15 11277 8611 17 \n", "21 201622 FR 3 6648 10 7741 5555 12 \n", "22 201623 FR 3 6151 9 7202 5100 11 \n", "23 201624 FR 3 6264 10 7319 5209 11 \n", "24 201625 FR 3 5345 8 6331 4359 10 \n", "25 201626 FR 3 6027 9 7098 4956 11 \n", "26 201627 FR 3 5202 8 6226 4178 10 \n", "27 201628 FR 3 3320 5 4160 2480 6 \n", "28 201629 FR 3 4056 6 5020 3092 8 \n", "29 201630 FR 3 3540 5 4454 2626 7 \n", ".. ... ... ... ... ... ... ... ... 
\n", "452 202436 FR 3 30852 46 33372 28332 50 \n", "453 202437 FR 3 46746 70 49889 43603 75 \n", "454 202438 FR 3 75653 113 79665 71641 119 \n", "455 202439 FR 3 71117 107 75044 67190 113 \n", "456 202440 FR 3 77304 116 81392 73216 122 \n", "457 202441 FR 3 88200 132 92373 84027 138 \n", "458 202442 FR 3 69988 105 73668 66308 110 \n", "459 202443 FR 3 53994 81 57044 50944 86 \n", "460 202444 FR 3 38066 57 40650 35482 61 \n", "461 202445 FR 3 54817 82 57778 51856 87 \n", "462 202446 FR 3 54811 82 57692 51930 86 \n", "463 202447 FR 3 78019 117 81436 74602 122 \n", "464 202448 FR 3 84451 127 88003 80899 132 \n", "465 202449 FR 3 120237 180 124480 115994 187 \n", "466 202450 FR 3 156576 235 161401 151751 242 \n", "467 202451 FR 3 230219 345 236155 224283 354 \n", "468 202452 FR 3 184837 277 190936 178738 286 \n", "469 202501 FR 3 231549 345 248471 214627 371 \n", "470 202502 FR 3 257248 384 271504 242992 405 \n", "471 202503 FR 3 252774 377 266629 238919 398 \n", "472 202504 FR 3 350040 522 367198 332882 548 \n", "473 202505 FR 3 334394 499 350373 318415 523 \n", "474 202506 FR 3 273516 408 288876 258156 431 \n", "475 202507 FR 3 208953 312 221917 195989 331 \n", "476 202508 FR 3 136020 203 147216 124824 220 \n", "477 202509 FR 3 84529 126 94066 74992 140 \n", "478 202510 FR 3 60336 90 67622 53050 101 \n", "479 202511 FR 3 59467 89 66782 52152 100 \n", "480 202512 FR 3 53091 79 60086 46096 90 \n", "481 202513 FR 3 43291 65 50549 36033 75 \n", "\n", " inc100_low \n", "0 55 \n", "1 58 \n", "2 102 \n", "3 210 \n", "4 255 \n", "5 290 \n", "6 305 \n", "7 299 \n", "8 318 \n", "9 372 \n", "10 419 \n", "11 411 \n", "12 258 \n", "13 167 \n", "14 88 \n", "15 47 \n", "16 38 \n", "17 21 \n", "18 20 \n", "19 13 \n", "20 13 \n", "21 9 \n", "22 8 \n", "23 8 \n", "24 7 \n", "25 8 \n", "26 6 \n", "27 4 \n", "28 5 \n", "29 4 \n", ".. ... 
\n", "452 42 \n", "453 65 \n", "454 107 \n", "455 101 \n", "456 110 \n", "457 126 \n", "458 99 \n", "459 76 \n", "460 53 \n", "461 78 \n", "462 78 \n", "463 112 \n", "464 121 \n", "465 174 \n", "466 228 \n", "467 336 \n", "468 268 \n", "469 320 \n", "470 363 \n", "471 356 \n", "472 497 \n", "473 475 \n", "474 385 \n", "475 292 \n", "476 186 \n", "477 112 \n", "478 79 \n", "479 78 \n", "480 69 \n", "481 54 \n", "\n", "[482 rows x 9 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data_local = pd.read_csv(data_url_local, skiprows=1)\n", "raw_data_local" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing points in this dataset? In this local file, no: the check below returns an empty DataFrame, so every week has values in all columns." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [week, geo_insee, indicator, inc, inc100, inc_up, inc_low, inc100_up, inc100_low]\n", "Index: []" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data_local[raw_data_local.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We still call `dropna()` as a safeguard. With no missing rows in this file it removes nothing, and dropping an isolated week would in any case have little impact on our fairly simple analysis." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week geo_insee indicator inc inc100 inc_up inc_low inc100_up \\\n", "0 201601 FR 3 42263 65 48970 35556 75 \n", "1 201602 FR 3 44617 68 51413 37821 79 \n", "2 201603 FR 3 75277 116 83924 66630 129 \n", "3 201604 FR 3 148473 228 160355 136591 246 \n", "4 201605 FR 3 178963 275 191630 166296 294 \n", "5 201606 FR 3 194921 299 200637 189205 308 \n", "6 201607 FR 3 204604 314 210495 198713 323 \n", "7 201608 FR 3 200681 308 206577 194785 317 \n", "8 201609 FR 3 213380 328 219353 207407 337 \n", "9 201610 FR 3 248576 382 254901 242251 391 \n", "10 201611 FR 3 279636 429 286366 272906 440 \n", "11 201612 FR 3 274533 421 281219 267847 432 \n", "12 201613 FR 3 173684 267 179077 168291 275 \n", "13 201614 FR 3 113384 174 117869 108899 181 \n", "14 201615 FR 3 60388 93 63680 57096 98 \n", "15 201616 FR 3 33397 51 35849 30945 55 \n", "16 201617 FR 3 26981 41 29219 24743 45 \n", "17 201618 FR 3 15274 23 16926 13622 26 \n", "18 201619 FR 3 14358 22 15959 12757 24 \n", "19 201620 FR 3 9853 15 11190 8516 17 \n", "20 201621 FR 3 9944 15 11277 8611 17 \n", "21 201622 FR 3 6648 10 7741 5555 12 \n", "22 201623 FR 3 6151 9 7202 5100 11 \n", "23 201624 FR 3 6264 10 7319 5209 11 \n", "24 201625 FR 3 5345 8 6331 4359 10 \n", "25 201626 FR 3 6027 9 7098 4956 11 \n", "26 201627 FR 3 5202 8 6226 4178 10 \n", "27 201628 FR 3 3320 5 4160 2480 6 \n", "28 201629 FR 3 4056 6 5020 3092 8 \n", "29 201630 FR 3 3540 5 4454 2626 7 \n", ".. ... ... ... ... ... ... ... ... 
\n", "452 202436 FR 3 30852 46 33372 28332 50 \n", "453 202437 FR 3 46746 70 49889 43603 75 \n", "454 202438 FR 3 75653 113 79665 71641 119 \n", "455 202439 FR 3 71117 107 75044 67190 113 \n", "456 202440 FR 3 77304 116 81392 73216 122 \n", "457 202441 FR 3 88200 132 92373 84027 138 \n", "458 202442 FR 3 69988 105 73668 66308 110 \n", "459 202443 FR 3 53994 81 57044 50944 86 \n", "460 202444 FR 3 38066 57 40650 35482 61 \n", "461 202445 FR 3 54817 82 57778 51856 87 \n", "462 202446 FR 3 54811 82 57692 51930 86 \n", "463 202447 FR 3 78019 117 81436 74602 122 \n", "464 202448 FR 3 84451 127 88003 80899 132 \n", "465 202449 FR 3 120237 180 124480 115994 187 \n", "466 202450 FR 3 156576 235 161401 151751 242 \n", "467 202451 FR 3 230219 345 236155 224283 354 \n", "468 202452 FR 3 184837 277 190936 178738 286 \n", "469 202501 FR 3 231549 345 248471 214627 371 \n", "470 202502 FR 3 257248 384 271504 242992 405 \n", "471 202503 FR 3 252774 377 266629 238919 398 \n", "472 202504 FR 3 350040 522 367198 332882 548 \n", "473 202505 FR 3 334394 499 350373 318415 523 \n", "474 202506 FR 3 273516 408 288876 258156 431 \n", "475 202507 FR 3 208953 312 221917 195989 331 \n", "476 202508 FR 3 136020 203 147216 124824 220 \n", "477 202509 FR 3 84529 126 94066 74992 140 \n", "478 202510 FR 3 60336 90 67622 53050 101 \n", "479 202511 FR 3 59467 89 66782 52152 100 \n", "480 202512 FR 3 53091 79 60086 46096 90 \n", "481 202513 FR 3 43291 65 50549 36033 75 \n", "\n", " inc100_low \n", "0 55 \n", "1 58 \n", "2 102 \n", "3 210 \n", "4 255 \n", "5 290 \n", "6 305 \n", "7 299 \n", "8 318 \n", "9 372 \n", "10 419 \n", "11 411 \n", "12 258 \n", "13 167 \n", "14 88 \n", "15 47 \n", "16 38 \n", "17 21 \n", "18 20 \n", "19 13 \n", "20 13 \n", "21 9 \n", "22 8 \n", "23 8 \n", "24 7 \n", "25 8 \n", "26 6 \n", "27 4 \n", "28 5 \n", "29 4 \n", ".. ... 
\n", "452 42 \n", "453 65 \n", "454 107 \n", "455 101 \n", "456 110 \n", "457 126 \n", "458 99 \n", "459 76 \n", "460 53 \n", "461 78 \n", "462 78 \n", "463 112 \n", "464 121 \n", "465 174 \n", "466 228 \n", "467 336 \n", "468 268 \n", "469 320 \n", "470 363 \n", "471 356 \n", "472 497 \n", "473 475 \n", "474 385 \n", "475 292 \n", "476 186 \n", "477 112 \n", "478 79 \n", "479 78 \n", "480 69 \n", "481 54 \n", "\n", "[482 rows x 9 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = raw_data_local.dropna().copy()\n", "data" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Our data use an unusual convention: the week number is appended to the year,\n", "which makes the value look like an integer. That is how Pandas interprets it.\n", "\n", "A second problem is that Pandas does not understand week numbers.\n", "It must be given the start and end dates of each week.\n", "We use the `isoweek` library for this.\n", "\n", "Since the week conversion has become fairly involved, we write\n", "a small Python function for it. We then apply it to every point\n", "in our data. The results go into a new column, 'period'." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split an integer such as 201601 into the year (2016) and the ISO week (1).\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that ISO week; build a weekly Period from it.\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Two small changes remain.\n", "\n", "First, we set the observation periods as the new index of our dataset.\n", "This turns it into a time series, which will be convenient later.\n", "\n", "Second, we sort the points by period, in chronological order." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of one period and\n", "the start of the next, the time difference should be zero, or at least\n", "very small. We allow a \"margin of error\" of one second.\n", "\n", "The loop below prints every pair of consecutive periods separated by a\n", "larger gap, that is, with at least one week missing between them.\n", "\n", "Since no rows were dropped from this file, we expect it to print nothing." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    # Gap between the end of one week and the start of the next.\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'].plot()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A zoom on the last few years shows the winter peaks more clearly. The incidence troughs fall in summer."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'][-200:].plot()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the yearly incidence" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Since the peak of the epidemic occurs in winter, straddling\n", "two calendar years, we define the reference period between\n", "two incidence minima, from August 1 of year $N$ to\n", "August 1 of year $N+1$.\n", "\n", "Our task is slightly complicated by the fact that a year does not contain\n", "a whole number of weeks. We therefore adjust our reference periods a little:\n", "instead of August 1 of each year, we use the first day of the week\n", "that contains August 1.\n", "\n", "Since the incidence of influenza-like illness is very low in summer, this\n", "change is unlikely to distort our conclusions.\n", "\n", "One more detail: the local file begins in January 2016, so the first complete\n", "August-to-August period starts in August 2016. We therefore begin the analysis in 2016." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Weekly periods containing August 1, from 2016 up to the year before the last data point.\n", "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", "                     for y in range(2016,\n", "                                    sorted_data.index[-1].year)]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain an August 1, we obtain our intervals of about one year as the periods between two adjacent weeks in the list. We compute the sum of the weekly incidences for each of these periods.\n", "\n", "We also check that each period contains between 51 and 53 weeks, to guard against possible errors in our code."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", "                        first_august_week[1:]):\n", "    # Weekly incidences from the August week of year N up to the week before that of year N+1.\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Here are the yearly incidences." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to spot the highest values (at the end)." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram shows the distribution of the yearly incidences. In the full series starting in 1984, strong epidemics, which affect about 10% of the French population, are fairly rare: three occurred over 35 years. The local file covers only the seasons from 2016 onward, so the histogram here contains just a few points." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }