{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Chickenpox incidence" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Chickenpox incidence data are available from the website of the [Réseau Sentinelles](http://www.sentiweb.fr/). We retrieve them as a CSV file in which each line corresponds to one week of the requested period. We always download the complete dataset, which begins in December 1990 and ends with a recent week." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To protect ourselves against a possible disappearance or modification of the Réseau Sentinelles server, we make a local copy of this dataset and keep it alongside our analysis. Downloading the data at every execution is unnecessary and even risky: if the server malfunctions, we could overwrite our data with a corrupted file. For this reason, we download the data only if the local copy does not exist." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-7.csv\"\n", "data_file = \"syndrome-varicelle.csv\"\n", "\n", "import os\n", "import urllib.request\n", "# Download the dataset only if no local copy exists yet\n", "if not os.path.exists(data_file):\n", "    urllib.request.urlretrieve(data_url, data_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the explanation of the columns given [on the original site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Column label |\n", "|-------------|--------------|\n", "| week | Calendar week (ISO 8601) |\n", "| indicator | Code of the surveillance indicator |\n", "| inc | Estimated incidence of consultations, in number of cases |\n", "| inc_low | Estimated lower bound of the 95% CI of the number of consultation cases |\n", "| inc_up | Estimated upper bound of the 95% CI of the number of consultation cases |\n", "| inc100 | Estimated incidence rate of consultation cases (cases per 100,000 inhabitants) |\n", "| inc100_low | Estimated lower bound of the 95% CI of the incidence rate of consultation cases (cases per 100,000 inhabitants) |\n", "| inc100_up | Estimated upper bound of the 95% CI of the incidence rate of consultation cases (cases per 100,000 inhabitants) |\n", "| geo_insee | Code of the geographic area concerned (INSEE code) http://www.insee.fr/fr/methodes/nomenclatures/cog/ |\n", "| geo_name | Label of the geographic area (this label may change without notice) |\n", "\n", "The first line of the CSV file is a comment, which we skip by specifying `skiprows=1`." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Read the local copy rather than the remote URL\n", "raw_data = pd.read_csv(data_file, encoding='iso-8859-1', skiprows=1)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing points in this dataset? In principle there are none, but let us check anyway." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table><thead><tr><th></th><th>week</th><th>indicator</th><th>inc</th><th>inc_low</th><th>inc_up</th><th>inc100</th><th>inc100_low</th><th>inc100_up</th><th>geo_insee</th><th>geo_name</th></tr></thead><tbody></tbody></table>
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [week, indicator, inc, inc_low, inc_up, inc100, inc100_low, inc100_up, geo_insee, geo_name]\n", "Index: []" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No data are missing, so we can continue our analysis without modifying the file." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<p>1669 rows × 10 columns</p>
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202247 7 6619 3714 9524 10 6 \n", "1 202246 7 3041 1390 4692 5 3 \n", "2 202245 7 3827 1720 5934 6 3 \n", "3 202244 7 4271 2231 6311 6 3 \n", "4 202243 7 5863 3302 8424 9 5 \n", "5 202242 7 3770 1950 5590 6 3 \n", "6 202241 7 4177 2219 6135 6 3 \n", "7 202240 7 4883 1472 8294 7 2 \n", "8 202239 7 2041 331 3751 3 0 \n", "9 202238 7 1771 419 3123 3 1 \n", "10 202237 7 1725 499 2951 3 1 \n", "11 202236 7 1069 178 1960 2 1 \n", "12 202235 7 1581 400 2762 2 0 \n", "13 202234 7 2266 788 3744 3 1 \n", "14 202233 7 7340 0 17399 11 0 \n", "15 202232 7 7801 4086 11516 12 6 \n", "16 202231 7 6896 4170 9622 10 6 \n", "17 202230 7 9039 5770 12308 14 9 \n", "18 202229 7 14851 10060 19642 22 15 \n", "19 202228 7 15471 11028 19914 23 16 \n", "20 202227 7 21191 16198 26184 32 24 \n", "21 202226 7 16854 12806 20902 25 19 \n", "22 202225 7 22246 18011 26481 34 28 \n", "23 202224 7 22458 18105 26811 34 27 \n", "24 202223 7 18772 14875 22669 28 22 \n", "25 202222 7 18916 14941 22891 29 23 \n", "26 202221 7 20310 16307 24313 31 25 \n", "27 202220 7 23585 19004 28166 36 29 \n", "28 202219 7 18593 14181 23005 28 21 \n", "29 202218 7 17851 13963 21739 27 21 \n", "... ... ... ... ... ... ... ... 
\n", "1639 199126 7 17608 11304 23912 31 20 \n", "1640 199125 7 16169 10700 21638 28 18 \n", "1641 199124 7 16171 10071 22271 28 17 \n", "1642 199123 7 11947 7671 16223 21 13 \n", "1643 199122 7 15452 9953 20951 27 17 \n", "1644 199121 7 14903 8975 20831 26 16 \n", "1645 199120 7 19053 12742 25364 34 23 \n", "1646 199119 7 16739 11246 22232 29 19 \n", "1647 199118 7 21385 13882 28888 38 25 \n", "1648 199117 7 13462 8877 18047 24 16 \n", "1649 199116 7 14857 10068 19646 26 18 \n", "1650 199115 7 13975 9781 18169 25 18 \n", "1651 199114 7 12265 7684 16846 22 14 \n", "1652 199113 7 9567 6041 13093 17 11 \n", "1653 199112 7 10864 7331 14397 19 13 \n", "1654 199111 7 15574 11184 19964 27 19 \n", "1655 199110 7 16643 11372 21914 29 20 \n", "1656 199109 7 13741 8780 18702 24 15 \n", "1657 199108 7 13289 8813 17765 23 15 \n", "1658 199107 7 12337 8077 16597 22 15 \n", "1659 199106 7 10877 7013 14741 19 12 \n", "1660 199105 7 10442 6544 14340 18 11 \n", "1661 199104 7 7913 4563 11263 14 8 \n", "1662 199103 7 15387 10484 20290 27 18 \n", "1663 199102 7 16277 11046 21508 29 20 \n", "1664 199101 7 15565 10271 20859 27 18 \n", "1665 199052 7 19375 13295 25455 34 23 \n", "1666 199051 7 19080 13807 24353 34 25 \n", "1667 199050 7 11079 6660 15498 20 12 \n", "1668 199049 7 1143 0 2610 2 0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 14 FR France \n", "1 7 FR France \n", "2 9 FR France \n", "3 9 FR France \n", "4 13 FR France \n", "5 9 FR France \n", "6 9 FR France \n", "7 12 FR France \n", "8 6 FR France \n", "9 5 FR France \n", "10 5 FR France \n", "11 3 FR France \n", "12 4 FR France \n", "13 5 FR France \n", "14 26 FR France \n", "15 18 FR France \n", "16 14 FR France \n", "17 19 FR France \n", "18 29 FR France \n", "19 30 FR France \n", "20 40 FR France \n", "21 31 FR France \n", "22 40 FR France \n", "23 41 FR France \n", "24 34 FR France \n", "25 35 FR France \n", "26 37 FR France \n", "27 43 FR France \n", "28 35 FR France \n", "29 33 FR France \n", "... ... ... ... 
\n", "1639 42 FR France \n", "1640 38 FR France \n", "1641 39 FR France \n", "1642 29 FR France \n", "1643 37 FR France \n", "1644 36 FR France \n", "1645 45 FR France \n", "1646 39 FR France \n", "1647 51 FR France \n", "1648 32 FR France \n", "1649 34 FR France \n", "1650 32 FR France \n", "1651 30 FR France \n", "1652 23 FR France \n", "1653 25 FR France \n", "1654 35 FR France \n", "1655 38 FR France \n", "1656 33 FR France \n", "1657 31 FR France \n", "1658 29 FR France \n", "1659 26 FR France \n", "1660 25 FR France \n", "1661 20 FR France \n", "1662 36 FR France \n", "1663 38 FR France \n", "1664 36 FR France \n", "1665 45 FR France \n", "1666 43 FR France \n", "1667 28 FR France \n", "1668 5 FR France \n", "\n", "[1669 rows x 10 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = raw_data.dropna().copy()\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our data use an unusual convention: the week number is appended to the year,\n", "which makes the value look like an integer. That is how pandas interprets it.\n", "\n", "A second problem is that pandas does not understand week numbers.\n", "It needs the start and end dates of each week.\n", "We use the `isoweek` library for this.\n", "\n", "Since the week conversion has become fairly complex, we write a small\n", "Python function for it. We then apply it to every point in our data.\n", "The results go into a new column, `period`." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split an integer such as 202247 into year 2022 and ISO week 47\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that ISO week\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two small modifications remain.\n", "\n", "First, we set the observation periods as the new index of our dataset.\n", "This turns it into a time series, which will be convenient later on.\n", "\n", "Second, we sort the points by period, in chronological order." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of one period and\n", "the start of the next, the time difference should be zero, or at least\n", "very small. We allow a \"margin of error\" of one second.\n", "\n", "The loop below prints every pair of consecutive periods separated by a larger gap.\n", "Here it prints nothing, which is consistent with the fact that no rows were removed above." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    # Gap between the end of one week and the start of the next\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Study of the annual incidence\n", "\n", "We define the reference period as running from September 1 of year $N$ to\n", "September 1 of year $N+1$.\n", "\n", "Our task is slightly complicated by the fact that a year does not contain\n", "a whole number of weeks. We therefore adjust our reference periods a little:\n", "instead of September 1 of each year, we use the first day of the week that\n", "contains September 1.\n", "\n", "Since the incidence of chickenpox is low around the beginning of September, this\n", "adjustment is unlikely to distort our conclusions.\n", "\n", "One more detail: the data begin in December 1990, which makes the first year\n", "incomplete. We therefore start the analysis in 1991.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }