{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of chickenpox" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Chickenpox incidence data are available from the [Réseau Sentinelles](http://www.sentiweb.fr/) website. We retrieve them as a CSV file in which each line corresponds to one week of the requested period. We always download the complete dataset, which starts in December 1990 and ends with a recent week." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "data_url = 'https://www.sentiweb.fr/datasets/incidence-PAY-7.csv'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the explanation of the columns given [on the original site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "```json\n", "{\n", "\t\"profile\": \"tabular-data-resource\",\n", "\t\"name\": \"sentiweb-incidence-{$id}\",\n", "\t\"path\": \"http://www.sentiweb.fr/datasets/{$file}\",\n", "\t\"title\": \"Sentiweb Incidence Data file\",\n", "\t\"description\": \"\",\n", "\t\"format\": \"csv\",\n", "\t\"mediatype\": \"text/csv\",\n", "\t\"encoding\": \"iso-8859-1\",\n", "\t\"schema\": {\n", "\t\t\"fields\": [\n", "\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"week\",\n", "\t\t\t\t\"type\": \"integer\",\n", "\t\t\t\t\"description\": \"ISO8601 Yearweek number as numeric (year*100 + week nubmer)\"\n", "\t\t\t},\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"geo_insee\",\n", "\t\t\t\t\"type\": \"string\",\n", "\t\t\t\t\"title\": \"Geographic area\",\n", "\t\t\t\t\"description\": \"Identifier of the geographic area, from INSEE https://www.insee.fr\"\n", "\t\t\t},\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"geo_name\",\n", "\t\t\t\t\"type\": 
\"string\",\n", "\t\t\t\t\"title\": \"Geographic area label\",\n", "\t\t\t\t\"description\": \"Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading\"\n", "\t\t\t},\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"indicator\",\n", "\t\t\t\t\"type\": \"integer\",\n", "\t\t\t\t\"title\": \"Indicator id\",\n", "\t\t\t\t\"description\": \"Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json\"\n", "\t\t\t},\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"inc\",\n", "\t\t\t\t\"type\": \"integer\",\n", "\t\t\t\t\"title\": \"Estimated incidence\",\n", "\t\t\t\t\"description\": \"Estimated incidence value for the time step, in the geographic level\"\n", "\t\t\t},\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"inc_low\",\n", "\t\t\t\t\"type\": \"integer\",\n", "\t\t\t\t\"title\": \"Lower bound of Estimated incidence 95% CI\",\n", "\t\t\t\t\"description\": \"Lower bound of the estimated incidence 95% Confidence Interval\"\n", "\t\t\t},\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"inc_up\",\n", "\t\t\t\t\"type\": \"integer\",\n", "\t\t\t\t\"title\": \"Upper bound of Estimated incidence 95% CI\",\n", "\t\t\t\t\"description\": \"Upper bound of the estimated incidence 95% Confidence Interval\"\n", "\t\t\t},\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"inc100\",\n", "\t\t\t\t\"type\": \"integer\",\n", "\t\t\t\t\"title\": \"Estimated rate incidence\",\n", "\t\t\t\t\"description\": \"Estimated rate incidence per 100,000 inhabitants\"\n", "\t\t\t},\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"inc100_low\",\n", "\t\t\t\t\"type\": \"integer\",\n", "\t\t\t\t\"title\": \"Lower bound of estimated rate incidence 95% CI\",\n", "\t\t\t\t\"description\": \"Lower bound of the estimated incidence 95% Confidence Interval\"\n", "\t\t\t},\n", "\t\t\t{\n", "\t\t\t\t\"name\": \"inc100_up\",\n", "\t\t\t\t\"type\": \"integer\",\n", "\t\t\t\t\"title\": \"Upper bound of rate incidence 95% CI\",\n", "\t\t\t\t\"description\": \"Upper bound of the 
estimated rate incidence 95% Confidence Interval\"\n", "\t\t\t}\n", "\n", "\t\t],\n", "\t\t\"primaryKey\": [\n", "\n", "\t\t\t\"week\",\n", "\t\t\t\"indicator\",\n", "\t\t\t\"geo_insee\"\n", "\n", "\t\t],\n", "\n", "\t\t\"missingValues\": [\"-\"]\n", "\t},\n", "\t\"dialect\": {\n", "\t\t\"csvddfVersion\": \"1.0\",\n", "\t\t\"delimiter\": \",\",\n", "\t\t\"doubleQuote\": false,\n", "\t\t\"lineTerminator\": \"\\r\\n\",\n", "\t\t\"quoteChar\": \"\\\"\",\n", "\t\t\"skipInitialSpace\": true,\n", "\t\t\"header\": true,\n", "\n", "\t\t\"commentChar\": \"#\"\n", "\t}\n", "}\n", "```\n", "\n", "The first line of the CSV file is a comment, which we skip by passing `skiprows=1`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Save the source data locally so that we still have a copy even if the URL becomes unavailable:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File 'varicelle_data.csv' does not exist, downloading...\n", "File 'varicelle_data.csv' downloaded on 2020-08-22.\n" ] } ], "source": [ "import os\n", "import urllib.request\n", "from datetime import date\n", "\n", "filename = 'varicelle_data.csv'\n", "\n", "# Download only when no local copy exists yet\n", "if os.path.isfile(filename):\n", "    print(\"File '{}' exists\".format(filename))\n", "else:\n", "    print(\"File '{}' does not exist, downloading...\".format(filename))\n", "    urllib.request.urlretrieve(data_url, filename)\n", "    download_time = date.today()\n", "    print(\"File '{}' downloaded on {}.\".format(filename, download_time))\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<p>1550 rows × 10 columns</p>
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202033 7 888 0 1841 1 0 \n", "1 202032 7 2559 624 4494 4 1 \n", "2 202031 7 1303 100 2506 2 0 \n", "3 202030 7 1385 75 2695 2 0 \n", "4 202029 7 841 10 1672 1 0 \n", "5 202028 7 728 0 1515 1 0 \n", "6 202027 7 986 149 1823 1 0 \n", "7 202026 7 694 0 1454 1 0 \n", "8 202025 7 228 0 597 0 0 \n", "9 202024 7 388 0 959 1 0 \n", "10 202023 7 558 1 1115 1 0 \n", "11 202022 7 277 0 633 0 0 \n", "12 202021 7 602 36 1168 1 0 \n", "13 202020 7 824 20 1628 1 0 \n", "14 202019 7 310 0 753 0 0 \n", "15 202018 7 849 98 1600 1 0 \n", "16 202017 7 272 0 658 0 0 \n", "17 202016 7 758 78 1438 1 0 \n", "18 202015 7 1918 675 3161 3 1 \n", "19 202014 7 3879 2227 5531 6 3 \n", "20 202013 7 7326 5236 9416 11 8 \n", "21 202012 7 8123 5790 10456 12 8 \n", "22 202011 7 10198 7568 12828 15 11 \n", "23 202010 7 9011 6691 11331 14 10 \n", "24 202009 7 13631 10544 16718 21 16 \n", "25 202008 7 10424 7708 13140 16 12 \n", "26 202007 7 8959 6574 11344 14 10 \n", "27 202006 7 9264 6925 11603 14 10 \n", "28 202005 7 8505 6314 10696 13 10 \n", "29 202004 7 7991 5831 10151 12 9 \n", "... ... ... ... ... ... ... ... 
\n", "1520 199126 7 17608 11304 23912 31 20 \n", "1521 199125 7 16169 10700 21638 28 18 \n", "1522 199124 7 16171 10071 22271 28 17 \n", "1523 199123 7 11947 7671 16223 21 13 \n", "1524 199122 7 15452 9953 20951 27 17 \n", "1525 199121 7 14903 8975 20831 26 16 \n", "1526 199120 7 19053 12742 25364 34 23 \n", "1527 199119 7 16739 11246 22232 29 19 \n", "1528 199118 7 21385 13882 28888 38 25 \n", "1529 199117 7 13462 8877 18047 24 16 \n", "1530 199116 7 14857 10068 19646 26 18 \n", "1531 199115 7 13975 9781 18169 25 18 \n", "1532 199114 7 12265 7684 16846 22 14 \n", "1533 199113 7 9567 6041 13093 17 11 \n", "1534 199112 7 10864 7331 14397 19 13 \n", "1535 199111 7 15574 11184 19964 27 19 \n", "1536 199110 7 16643 11372 21914 29 20 \n", "1537 199109 7 13741 8780 18702 24 15 \n", "1538 199108 7 13289 8813 17765 23 15 \n", "1539 199107 7 12337 8077 16597 22 15 \n", "1540 199106 7 10877 7013 14741 19 12 \n", "1541 199105 7 10442 6544 14340 18 11 \n", "1542 199104 7 7913 4563 11263 14 8 \n", "1543 199103 7 15387 10484 20290 27 18 \n", "1544 199102 7 16277 11046 21508 29 20 \n", "1545 199101 7 15565 10271 20859 27 18 \n", "1546 199052 7 19375 13295 25455 34 23 \n", "1547 199051 7 19080 13807 24353 34 25 \n", "1548 199050 7 11079 6660 15498 20 12 \n", "1549 199049 7 1143 0 2610 2 0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 2 FR France \n", "1 7 FR France \n", "2 4 FR France \n", "3 4 FR France \n", "4 2 FR France \n", "5 2 FR France \n", "6 2 FR France \n", "7 2 FR France \n", "8 1 FR France \n", "9 2 FR France \n", "10 2 FR France \n", "11 1 FR France \n", "12 2 FR France \n", "13 2 FR France \n", "14 1 FR France \n", "15 2 FR France \n", "16 1 FR France \n", "17 2 FR France \n", "18 5 FR France \n", "19 9 FR France \n", "20 14 FR France \n", "21 16 FR France \n", "22 19 FR France \n", "23 18 FR France \n", "24 26 FR France \n", "25 20 FR France \n", "26 18 FR France \n", "27 18 FR France \n", "28 16 FR France \n", "29 15 FR France \n", "... ... ... ... 
\n", "1520 42 FR France \n", "1521 38 FR France \n", "1522 39 FR France \n", "1523 29 FR France \n", "1524 37 FR France \n", "1525 36 FR France \n", "1526 45 FR France \n", "1527 39 FR France \n", "1528 51 FR France \n", "1529 32 FR France \n", "1530 34 FR France \n", "1531 32 FR France \n", "1532 30 FR France \n", "1533 23 FR France \n", "1534 25 FR France \n", "1535 35 FR France \n", "1536 38 FR France \n", "1537 33 FR France \n", "1538 31 FR France \n", "1539 29 FR France \n", "1540 26 FR France \n", "1541 25 FR France \n", "1542 20 FR France \n", "1543 36 FR France \n", "1544 38 FR France \n", "1545 36 FR France \n", "1546 45 FR France \n", "1547 43 FR France \n", "1548 28 FR France \n", "1549 5 FR France \n", "\n", "[1550 rows x 10 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(filename, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there any missing data points in this dataset? No." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "raw_data[raw_data.isnull().any(axis=1)]\n", "# Work on a copy so that raw_data stays untouched\n", "data = raw_data.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our data use an unusual convention: the week number is glued to\n", "the year, which makes the value look like an integer. That is\n", "how Pandas interprets it.\n", "\n", "A second problem is that Pandas does not understand week\n", "numbers. It needs the start and end dates of each week. We use\n", "the `isoweek` library for this.\n", "\n", "Since the week conversion has become rather complex, we write a\n", "small Python function for it. We then apply it to every point\n", "in our data. The results go into a new column, 'period'." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split the yyyyww integer into its year and ISO week parts\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that ISO week\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two small modifications remain.\n", "\n", "First, we set the observation periods as the new index of our\n", "dataset. This turns it into a time series, which will be\n", "convenient later on.\n", "\n", "Second, we sort the points by period, in chronological order." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of one period and\n", "the start of the next, the time difference must be zero, or at least\n", "very small. We allow a \"margin of error\" of one second.\n", "\n", "The check below prints nothing: every period is immediately followed\n", "by the next one, so no week is missing from the series." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in on the last 200 weeks (about four years) shows the position of the peaks more clearly, at the end of winter. The incidence trough falls in September." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the epidemic runs from early autumn to late spring, straddling\n", "two calendar years, we define the reference period between two\n", "incidence minima, from September 1st of year $N$ to September 1st\n", "of year $N+1$.\n", "\n", "Our task is made slightly harder by the fact that a year does not\n", "contain a whole number of weeks. We therefore adjust our reference\n", "periods a little: instead of September 1st of each year, we use the\n", "first day of the week that contains September 1st.\n", "\n", "Since the incidence of chickenpox is very low in summer, this\n", "adjustment is unlikely to distort our conclusions.\n", "\n", "One more detail: the data start in December 1990, which makes the\n", "first year incomplete. We therefore start the analysis in 1991." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "first_september_week = [pd.Period(pd.Timestamp(y, 9, 1), 'W')\n", "                        for y in range(1991,\n", "                                       sorted_data.index[-1].year)]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_september_week[:-1],\n", "                        first_september_week[1:]):\n", "    # Period slicing is inclusive, so stop one week before week2\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the annual incidences." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to spot the highest values (at the end): the annual incidence ranges from 516689 in 2002 to 842373 in 2009." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2002 516689\n", "2018 542312\n", "2017 551041\n", "1996 564901\n", "2019 584066\n", "2015 604382\n", "2000 617597\n", "2001 619041\n", "2012 624573\n", "2005 628464\n", "2006 632833\n", "2011 642368\n", "1993 643387\n", "1995 652478\n", "1994 661409\n", "1998 677775\n", "1997 683434\n", "2014 685769\n", "2013 698332\n", "2007 717352\n", "2008 749478\n", "1999 756456\n", "2003 758363\n", "2004 777388\n", "2016 782114\n", "2010 829911\n", "1992 832939\n", "2009 842373\n", "dtype: int64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }