{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of the chickenpox syndrome" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Incidence data for the chickenpox syndrome are available on the website of the [Réseau Sentinelles](http://www.sentiweb.fr/). We retrieve them as a CSV file in which each row corresponds to one week of the requested period. We always download the complete dataset, which starts in December 1990 (week 49) and ends with a recent week." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "import urllib.request\n", "\n", "data_url = 'http://www.sentiweb.fr/datasets/incidence-PAY-7.csv'\n", "data_file = 'syndrome-varicelle.csv'\n", "\n", "# Download only if there is no local copy yet, so that later runs reuse the same file\n", "if not os.path.exists(data_file):\n", "    urllib.request.urlretrieve(data_url, data_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The previous cell checks whether a local copy of the file exists.\n", "If it does not, the file is downloaded from the Sentinelles website."
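, "

A minimal sketch of the same caching idea as a reusable function. The name `load_sentinelles_csv` is hypothetical, and the sketch assumes, as for the Sentinelles file used here, that the first line of the CSV is a comment. Reading the local copy rather than the URL is what makes the cache useful: as long as the local file is kept, every run works on exactly the same data.

```python
import os
import urllib.request

import pandas as pd


def load_sentinelles_csv(url, path):
    # Hypothetical helper: download url to path only if no local copy exists yet
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    # Always read the local copy; skiprows=1 skips the leading comment line
    return pd.read_csv(path, skiprows=1)
```

With the names defined in the previous cell, it would be called as `raw_data = load_sentinelles_csv(data_url, data_file)`.
"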
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the description of the columns, as given [on the original website](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Description |\n", "|-------------|-------------|\n", "| week | Calendar week (ISO 8601) |\n", "| indicator | Code of the surveillance indicator |\n", "| inc | Estimated incidence of consultations, in number of cases |\n", "| inc_low | Lower bound of the 95% CI of the estimated number of consultation cases |\n", "| inc_up | Upper bound of the 95% CI of the estimated number of consultation cases |\n", "| inc100 | Estimated incidence rate of consultation cases (cases per 100,000 inhabitants) |\n", "| inc100_low | Lower bound of the 95% CI of the estimated incidence rate (cases per 100,000 inhabitants) |\n", "| inc100_up | Upper bound of the 95% CI of the estimated incidence rate (cases per 100,000 inhabitants) |\n", "| geo_insee | Code of the geographic area (INSEE code) http://www.insee.fr/fr/methodes/nomenclatures/cog/ |\n", "| geo_name | Name of the geographic area (this label may change without notice) |\n", "\n", "The first line of the CSV file is a comment, which we skip by passing `skiprows=1`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202151 7 13369 9435 17303 20 14 \n", "1 202150 7 14128 10312 17944 21 15 \n", "2 202149 7 13674 10369 16979 21 16 \n", "3 202148 7 11549 8503 14595 17 12 \n", "4 202147 7 11419 8376 14462 17 12 \n", "5 202146 7 8216 5724 10708 12 8 \n", "6 202145 7 8965 6468 11462 14 10 \n", "7 202144 7 8736 5636 11836 13 8 \n", "8 202143 7 8145 5164 11126 12 7 \n", "9 202142 7 9443 6037 12849 14 9 \n", "10 202141 7 4021 2239 5803 6 3 \n", "11 202140 7 4441 2454 6428 7 4 \n", "12 202139 7 2291 1056 3526 3 1 \n", "13 202138 7 4325 2267 6383 7 4 \n", "14 202137 7 1964 754 3174 3 1 \n", "15 202136 7 3441 1730 5152 5 2 \n", "16 202135 7 2562 1107 4017 4 2 \n", "17 202134 7 1429 378 2480 2 0 \n", "18 202133 7 3829 1830 5828 6 3 \n", "19 202132 7 4108 1895 6321 6 3 \n", "20 202131 7 4793 2301 7285 7 3 \n", "21 202130 7 7190 4191 10189 11 6 \n", "22 202129 7 6800 4109 9491 10 6 \n", "23 202128 7 9734 0 21731 15 0 \n", "24 202127 7 9026 4316 13736 14 7 \n", "25 202126 7 7284 4108 10460 11 6 \n", "26 202125 7 9351 6540 12162 14 10 \n", "27 202124 7 12034 8937 15131 18 13 \n", "28 202123 7 9116 6420 11812 14 10 \n", "29 202122 7 4817 2752 6882 7 4 \n", "... ... ... ... ... ... ... ... 
\n", "1591 199126 7 17608 11304 23912 31 20 \n", "1592 199125 7 16169 10700 21638 28 18 \n", "1593 199124 7 16171 10071 22271 28 17 \n", "1594 199123 7 11947 7671 16223 21 13 \n", "1595 199122 7 15452 9953 20951 27 17 \n", "1596 199121 7 14903 8975 20831 26 16 \n", "1597 199120 7 19053 12742 25364 34 23 \n", "1598 199119 7 16739 11246 22232 29 19 \n", "1599 199118 7 21385 13882 28888 38 25 \n", "1600 199117 7 13462 8877 18047 24 16 \n", "1601 199116 7 14857 10068 19646 26 18 \n", "1602 199115 7 13975 9781 18169 25 18 \n", "1603 199114 7 12265 7684 16846 22 14 \n", "1604 199113 7 9567 6041 13093 17 11 \n", "1605 199112 7 10864 7331 14397 19 13 \n", "1606 199111 7 15574 11184 19964 27 19 \n", "1607 199110 7 16643 11372 21914 29 20 \n", "1608 199109 7 13741 8780 18702 24 15 \n", "1609 199108 7 13289 8813 17765 23 15 \n", "1610 199107 7 12337 8077 16597 22 15 \n", "1611 199106 7 10877 7013 14741 19 12 \n", "1612 199105 7 10442 6544 14340 18 11 \n", "1613 199104 7 7913 4563 11263 14 8 \n", "1614 199103 7 15387 10484 20290 27 18 \n", "1615 199102 7 16277 11046 21508 29 20 \n", "1616 199101 7 15565 10271 20859 27 18 \n", "1617 199052 7 19375 13295 25455 34 23 \n", "1618 199051 7 19080 13807 24353 34 25 \n", "1619 199050 7 11079 6660 15498 20 12 \n", "1620 199049 7 1143 0 2610 2 0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 26 FR France \n", "1 27 FR France \n", "2 26 FR France \n", "3 22 FR France \n", "4 22 FR France \n", "5 16 FR France \n", "6 18 FR France \n", "7 18 FR France \n", "8 17 FR France \n", "9 19 FR France \n", "10 9 FR France \n", "11 10 FR France \n", "12 5 FR France \n", "13 10 FR France \n", "14 5 FR France \n", "15 8 FR France \n", "16 6 FR France \n", "17 4 FR France \n", "18 9 FR France \n", "19 9 FR France \n", "20 11 FR France \n", "21 16 FR France \n", "22 14 FR France \n", "23 33 FR France \n", "24 21 FR France \n", "25 16 FR France \n", "26 18 FR France \n", "27 23 FR France \n", "28 18 FR France \n", "29 10 FR France \n", "... ... ... ... 
\n", "1591 42 FR France \n", "1592 38 FR France \n", "1593 39 FR France \n", "1594 29 FR France \n", "1595 37 FR France \n", "1596 36 FR France \n", "1597 45 FR France \n", "1598 39 FR France \n", "1599 51 FR France \n", "1600 32 FR France \n", "1601 34 FR France \n", "1602 32 FR France \n", "1603 30 FR France \n", "1604 23 FR France \n", "1605 25 FR France \n", "1606 35 FR France \n", "1607 38 FR France \n", "1608 33 FR France \n", "1609 31 FR France \n", "1610 29 FR France \n", "1611 26 FR France \n", "1612 25 FR France \n", "1613 20 FR France \n", "1614 36 FR France \n", "1615 38 FR France \n", "1616 36 FR France \n", "1617 45 FR France \n", "1618 43 FR France \n", "1619 28 FR France \n", "1620 5 FR France \n", "\n", "[1621 rows x 10 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read the local copy rather than the URL, so that reruns use the same data\n", "raw_data = pd.read_csv(data_file, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing values in this dataset? We display every row that contains at least one missing value." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [week, indicator, inc, inc_low, inc_up, inc100, inc100_low, inc100_up, geo_insee, geo_name]\n", "Index: []" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Non, donc pas de ligne à éliminer." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "data = raw_data.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Par flemme de modifier la suite du code, on défini data comme une copie de raw_data (comme ça c'est pas la même zone de mémoire). \n", "\n", "\n", "Nos données utilisent une convention inhabituelle: le numéro de\n", "semaine est collé à l'année, donnant l'impression qu'il s'agit\n", "de nombre entier. C'est comme ça que Pandas les interprète.\n", " \n", "Un deuxième problème est que Pandas ne comprend pas les numéros de\n", "semaine. Il faut lui fournir les dates de début et de fin de\n", "semaine. Nous utilisons pour cela la bibliothèque `isoweek`.\n", "\n", "Comme la conversion des semaines est devenu assez complexe, nous\n", "écrivons une petite fonction Python pour cela. Ensuite, nous\n", "l'appliquons à tous les points de nos donnés. Les résultats vont\n", "dans une nouvelle colonne 'period'." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # The 'week' column holds integers such as 202151: year 2021, ISO week 51\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that ISO week; the 'W' period runs from Monday to Sunday\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two small changes remain.\n", "\n", "First, we use the observation periods as the new index of our dataset. This turns it into a time series, which will be convenient later on.\n", "\n", "Second, we sort the rows by period, in chronological order." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of one period and the start of the next, the time difference should be zero, or at least very small. We allow a \"margin of error\" of one second.\n", "\n", "Any pair of consecutive periods printed by the next cell is separated by at least one missing week. Since no row was removed above, we expect this check to print nothing; a printed pair would reveal a week missing from the source file itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    # Gap between the end of one week and the start of the next\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!"
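, "

The loop in the consistency check above compares each period with the next one. Here is a minimal sketch of the same check based on `Series.diff`, assuming the index is a weekly `PeriodIndex` sorted in chronological order: the starts of consecutive weeks are exactly seven days apart, so any larger step marks missing weeks. The function name `find_gaps` and the three-week example index are hypothetical.

```python
import pandas as pd


def find_gaps(periods):
    # Start timestamp of each weekly period, as a Series so that .diff() is available
    starts = pd.Series(periods.to_timestamp())
    # Time between the starts of consecutive periods (the first value is NaT)
    steps = starts.diff()
    # A step longer than one week means that at least one week is missing;
    # comparisons with NaT are False, so the first value is ignored
    positions = [i for i, step in enumerate(steps) if step > pd.Timedelta(weeks=1)]
    return [(periods[i - 1], periods[i]) for i in positions]


# Example index in which the week starting 2021-12-20 is missing
weeks = pd.PeriodIndex(['2021-12-06', '2021-12-13', '2021-12-27'], freq='W')
print(find_gaps(weeks))
```

On the real data, `find_gaps(sorted_data.index)` should return an empty list when no week is missing.
"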
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in on the last few years shows the yearly peaks more clearly. Incidence is lowest at the end of summer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The last 300 weeks, i.e. a little under six years\n", "sorted_data['inc'].iloc[-300:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the incidence is lowest at the end of summer, we do not use calendar years: we define the reference period between two minima of the incidence, from 1 September of year $N$ to 1 September of year $N+1$.\n", "\n", "Our task is slightly complicated by the fact that a year does not contain a whole number of weeks. We therefore adjust our reference periods a little: instead of 1 September of each year, we use the first day of the week that contains 1 September.\n", "\n", "Since the incidence of chickenpox is low at the end of summer, this adjustment is unlikely to bias our conclusions.\n", "\n", "One more detail: the data start in December 1990, which makes the first season incomplete. We therefore start the analysis in September 1991." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Weekly periods containing 1 September, from 1991 up to (but excluding) the year of the last observation\n", "first_september_week = [pd.Period(pd.Timestamp(y, 9, 1), 'W')\n", "                        for y in range(1991,\n", "                                       sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain a 1 September, we obtain our intervals of about one year as the periods between two adjacent weeks in the list.\n", "We compute the sum of the weekly incidences over each of these periods.\n", "\n", "We start in 1991 because the 1991/1992 season is the first complete one.\n", "\n", "We also check that each period contains between 51 and 53 weeks, to protect ourselves against possible errors in our code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_september_week[:-1],\n", "                        first_september_week[1:]):\n", "    # From the week containing 1 September up to, but excluding, the next such week\n", "    one_year = sorted_data['inc'].loc[week1:week2-1]\n", "    # Sanity check: each season should span 51 to 53 weeks\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    # Each season is labelled by the year in which it ends\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the annual incidences." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to spot the highest values (at the end)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram shows the distribution of the annual incidences, which helps to see how often strong seasons occur."
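, "

To check the season-splitting logic on its own, here is a minimal sketch that applies it to a synthetic series in which every week has an incidence of 1, so that each season sum is simply the number of weeks in the season. The helper `season_sums` and the synthetic data are hypothetical; unlike the loop above, `last_year` is included in the list of boundaries.

```python
import pandas as pd


def season_sums(weekly, first_year, last_year):
    # Weekly periods containing 1 September, for first_year..last_year inclusive
    boundaries = [pd.Period(pd.Timestamp(y, 9, 1), 'W')
                  for y in range(first_year, last_year + 1)]
    sums = {}
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        # From the week containing 1 September up to, but excluding, the next such week
        season = weekly.loc[start:end - 1]
        # Same sanity check as in the notebook: 51 to 53 weeks
        assert abs(len(season) - 52) < 2
        # Label each season by the year in which it ends
        sums[end.year] = season.sum()
    return pd.Series(sums)


# Synthetic data: an incidence of 1 in every week from January 2015 to December 2020
index = pd.period_range('2015-01-05', '2020-12-28', freq='W')
weekly = pd.Series(1, index=index)
print(season_sums(weekly, 2015, 2020))
```

With this synthetic input the sums are the season lengths: 52 weeks in most seasons and 53 in some.
"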
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }