{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of influenza-like illness" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek\n", "import os\n", "import urllib.request\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data on the incidence of influenza-like illness are available from the website of the [Réseau Sentinelles](http://www.sentiweb.fr/). We retrieve them as a CSV file in which each line corresponds to one week of the requested period. We always request the complete dataset, which starts in 1984 and ends with a recent week." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-3.csv\"\n", "data_file = \"incidence_gripale.csv\"\n", "# Download only when no local copy exists yet.\n", "if not os.path.exists(data_file):\n", "    urllib.request.urlretrieve(data_url, data_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We assign a file name to the data we want to study. \n", "If this file is not present in the current directory, we download it directly from the source site in order to keep a local copy. The file therefore remains available for study, and its origin is recorded in the document." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the explanation of the columns given [on the source site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Column label |\n", "|-------------|--------------|\n", "| week | Calendar week (ISO 8601) |\n", "| indicator | Code of the surveillance indicator |\n", "| inc | Estimated incidence of consultations, in number of cases |\n", "| inc_low | Lower bound of the 95% CI of the estimated number of consultation cases |\n", "| inc_up | Upper bound of the 95% CI of the estimated number of consultation cases |\n", "| inc100 | Estimated incidence rate of consultation cases (cases per 100,000 inhabitants) |\n", "| inc100_low | Lower bound of the 95% CI of the estimated incidence rate of consultation cases (cases per 100,000 inhabitants) |\n", "| inc100_up | Upper bound of the 95% CI of the estimated incidence rate of consultation cases (cases per 100,000 inhabitants) |\n", "| geo_insee | Code of the geographic area concerned (INSEE code) http://www.insee.fr/fr/methodes/nomenclatures/cog/ |\n", "| geo_name | Label of the geographic area (this label may change without notice) |\n", "\n", "The first line of the CSV file is a comment, which we skip by specifying `skiprows=1`." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week geo_insee indicator inc inc100 inc_up inc_low inc100_up \\\n", "0 202016 FR 25 41726 63 44257 39195 67 \n", "1 202017 FR 25 45903 70 51774 40032 79 \n", "2 202018 FR 25 34618 53 39737 29499 60 \n", "3 202019 FR 25 27347 42 31814 22880 48 \n", "4 202020 FR 25 28416 43 32957 23875 50 \n", "5 202021 FR 25 14651 22 17949 11353 27 \n", "6 202022 FR 25 30397 46 32550 28244 49 \n", "7 202023 FR 25 21692 33 23520 19864 36 \n", "8 202024 FR 25 20783 32 22565 19001 34 \n", "9 202025 FR 25 15352 23 16904 13800 26 \n", "10 202026 FR 25 10251 16 11522 8980 17 \n", "11 202027 FR 25 8978 14 10179 7777 15 \n", "12 202028 FR 25 10079 15 11375 8783 17 \n", "13 202029 FR 25 10061 15 11397 8725 17 \n", "14 202030 FR 25 13278 20 14843 11713 23 \n", "15 202031 FR 25 12018 18 13527 10509 21 \n", "16 202032 FR 25 14207 22 15941 12473 24 \n", "17 202033 FR 25 14219 22 16007 12431 24 \n", "18 202034 FR 25 14357 22 16049 12665 24 \n", "19 202035 FR 25 15694 24 17305 14083 26 \n", "20 202036 FR 25 17143 26 18779 15507 29 \n", "21 202037 FR 25 33136 50 35395 30877 54 \n", "22 202038 FR 25 50858 77 53662 48054 81 \n", "23 202039 FR 25 49566 75 52295 46837 79 \n", "24 202040 FR 25 57032 87 59978 54086 91 \n", "25 202041 FR 25 71876 109 75167 68585 114 \n", "26 202042 FR 25 81948 124 85429 78467 130 \n", "27 202043 FR 25 82420 125 86021 78819 131 \n", "28 202044 FR 25 82544 125 86179 78909 131 \n", "29 202045 FR 25 79941 121 83403 76479 127 \n", ".. ... ... ... ... ... ... ... ... 
\n", "254 202509 FR 25 124057 185 135588 112526 202 \n", "255 202510 FR 25 95944 143 105232 86656 157 \n", "256 202511 FR 25 98202 147 107776 88628 161 \n", "257 202512 FR 25 94264 141 103741 84787 155 \n", "258 202513 FR 25 73745 110 81991 65499 122 \n", "259 202514 FR 25 74759 112 83786 65732 125 \n", "260 202515 FR 25 79530 119 88641 70419 132 \n", "261 202516 FR 25 65344 97 74422 56266 111 \n", "262 202517 FR 25 48819 73 56035 41603 84 \n", "263 202518 FR 25 40396 60 46566 34226 69 \n", "264 202519 FR 25 39019 58 45001 33037 67 \n", "265 202520 FR 25 53112 79 60442 45782 90 \n", "266 202521 FR 25 58623 87 66747 50499 100 \n", "267 202522 FR 25 45875 68 52839 38911 79 \n", "268 202523 FR 25 60386 90 68401 52371 102 \n", "269 202524 FR 25 54907 82 61896 47918 92 \n", "270 202525 FR 25 51161 76 58159 44163 87 \n", "271 202526 FR 25 48364 72 54996 41732 82 \n", "272 202527 FR 25 45752 68 52041 39463 78 \n", "273 202528 FR 25 46583 69 53778 39388 80 \n", "274 202529 FR 25 41332 62 48532 34132 72 \n", "275 202530 FR 25 45210 67 52705 37715 79 \n", "276 202531 FR 25 39155 58 47757 30553 71 \n", "277 202532 FR 25 47151 70 55928 38374 83 \n", "278 202533 FR 25 36935 55 43957 29913 66 \n", "279 202534 FR 25 44588 67 52167 37009 78 \n", "280 202535 FR 25 46234 69 53668 38800 80 \n", "281 202536 FR 25 51195 76 58092 44298 87 \n", "282 202537 FR 25 88906 133 98154 79658 146 \n", "283 202538 FR 25 128661 192 141022 116300 210 \n", "\n", " inc100_low \n", "0 60 \n", "1 61 \n", "2 45 \n", "3 35 \n", "4 36 \n", "5 17 \n", "6 43 \n", "7 30 \n", "8 29 \n", "9 21 \n", "10 14 \n", "11 12 \n", "12 13 \n", "13 13 \n", "14 18 \n", "15 16 \n", "16 19 \n", "17 19 \n", "18 19 \n", "19 21 \n", "20 24 \n", "21 47 \n", "22 73 \n", "23 71 \n", "24 82 \n", "25 104 \n", "26 119 \n", "27 120 \n", "28 120 \n", "29 116 \n", ".. ... 
\n", "254 168 \n", "255 129 \n", "256 132 \n", "257 126 \n", "258 98 \n", "259 98 \n", "260 105 \n", "261 84 \n", "262 62 \n", "263 51 \n", "264 49 \n", "265 68 \n", "266 75 \n", "267 58 \n", "268 78 \n", "269 71 \n", "270 66 \n", "271 62 \n", "272 59 \n", "273 59 \n", "274 51 \n", "275 56 \n", "276 46 \n", "277 57 \n", "278 45 \n", "279 55 \n", "280 58 \n", "281 66 \n", "282 119 \n", "283 174 \n", "\n", "[284 rows x 9 columns]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read the local copy; the first line of the file is a comment.\n", "raw_data = pd.read_csv(data_file, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there any missing points in this dataset? Yes, week 19 of 1989 has no associated values." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [week, geo_insee, indicator, inc, inc100, inc_up, inc_low, inc100_up, inc100_low]\n", "Index: []" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We remove this point, which has little impact on our rather simple analysis." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week geo_insee indicator inc inc100 inc_up inc_low inc100_up \\\n", "0 202016 FR 25 41726 63 44257 39195 67 \n", "1 202017 FR 25 45903 70 51774 40032 79 \n", "2 202018 FR 25 34618 53 39737 29499 60 \n", "3 202019 FR 25 27347 42 31814 22880 48 \n", "4 202020 FR 25 28416 43 32957 23875 50 \n", "5 202021 FR 25 14651 22 17949 11353 27 \n", "6 202022 FR 25 30397 46 32550 28244 49 \n", "7 202023 FR 25 21692 33 23520 19864 36 \n", "8 202024 FR 25 20783 32 22565 19001 34 \n", "9 202025 FR 25 15352 23 16904 13800 26 \n", "10 202026 FR 25 10251 16 11522 8980 17 \n", "11 202027 FR 25 8978 14 10179 7777 15 \n", "12 202028 FR 25 10079 15 11375 8783 17 \n", "13 202029 FR 25 10061 15 11397 8725 17 \n", "14 202030 FR 25 13278 20 14843 11713 23 \n", "15 202031 FR 25 12018 18 13527 10509 21 \n", "16 202032 FR 25 14207 22 15941 12473 24 \n", "17 202033 FR 25 14219 22 16007 12431 24 \n", "18 202034 FR 25 14357 22 16049 12665 24 \n", "19 202035 FR 25 15694 24 17305 14083 26 \n", "20 202036 FR 25 17143 26 18779 15507 29 \n", "21 202037 FR 25 33136 50 35395 30877 54 \n", "22 202038 FR 25 50858 77 53662 48054 81 \n", "23 202039 FR 25 49566 75 52295 46837 79 \n", "24 202040 FR 25 57032 87 59978 54086 91 \n", "25 202041 FR 25 71876 109 75167 68585 114 \n", "26 202042 FR 25 81948 124 85429 78467 130 \n", "27 202043 FR 25 82420 125 86021 78819 131 \n", "28 202044 FR 25 82544 125 86179 78909 131 \n", "29 202045 FR 25 79941 121 83403 76479 127 \n", ".. ... ... ... ... ... ... ... ... 
\n", "254 202509 FR 25 124057 185 135588 112526 202 \n", "255 202510 FR 25 95944 143 105232 86656 157 \n", "256 202511 FR 25 98202 147 107776 88628 161 \n", "257 202512 FR 25 94264 141 103741 84787 155 \n", "258 202513 FR 25 73745 110 81991 65499 122 \n", "259 202514 FR 25 74759 112 83786 65732 125 \n", "260 202515 FR 25 79530 119 88641 70419 132 \n", "261 202516 FR 25 65344 97 74422 56266 111 \n", "262 202517 FR 25 48819 73 56035 41603 84 \n", "263 202518 FR 25 40396 60 46566 34226 69 \n", "264 202519 FR 25 39019 58 45001 33037 67 \n", "265 202520 FR 25 53112 79 60442 45782 90 \n", "266 202521 FR 25 58623 87 66747 50499 100 \n", "267 202522 FR 25 45875 68 52839 38911 79 \n", "268 202523 FR 25 60386 90 68401 52371 102 \n", "269 202524 FR 25 54907 82 61896 47918 92 \n", "270 202525 FR 25 51161 76 58159 44163 87 \n", "271 202526 FR 25 48364 72 54996 41732 82 \n", "272 202527 FR 25 45752 68 52041 39463 78 \n", "273 202528 FR 25 46583 69 53778 39388 80 \n", "274 202529 FR 25 41332 62 48532 34132 72 \n", "275 202530 FR 25 45210 67 52705 37715 79 \n", "276 202531 FR 25 39155 58 47757 30553 71 \n", "277 202532 FR 25 47151 70 55928 38374 83 \n", "278 202533 FR 25 36935 55 43957 29913 66 \n", "279 202534 FR 25 44588 67 52167 37009 78 \n", "280 202535 FR 25 46234 69 53668 38800 80 \n", "281 202536 FR 25 51195 76 58092 44298 87 \n", "282 202537 FR 25 88906 133 98154 79658 146 \n", "283 202538 FR 25 128661 192 141022 116300 210 \n", "\n", " inc100_low \n", "0 60 \n", "1 61 \n", "2 45 \n", "3 35 \n", "4 36 \n", "5 17 \n", "6 43 \n", "7 30 \n", "8 29 \n", "9 21 \n", "10 14 \n", "11 12 \n", "12 13 \n", "13 13 \n", "14 18 \n", "15 16 \n", "16 19 \n", "17 19 \n", "18 19 \n", "19 21 \n", "20 24 \n", "21 47 \n", "22 73 \n", "23 71 \n", "24 82 \n", "25 104 \n", "26 119 \n", "27 120 \n", "28 120 \n", "29 116 \n", ".. ... 
\n", "254 168 \n", "255 129 \n", "256 132 \n", "257 126 \n", "258 98 \n", "259 98 \n", "260 105 \n", "261 84 \n", "262 62 \n", "263 51 \n", "264 49 \n", "265 68 \n", "266 75 \n", "267 58 \n", "268 78 \n", "269 71 \n", "270 66 \n", "271 62 \n", "272 59 \n", "273 59 \n", "274 51 \n", "275 56 \n", "276 46 \n", "277 57 \n", "278 45 \n", "279 55 \n", "280 58 \n", "281 66 \n", "282 119 \n", "283 174 \n", "\n", "[284 rows x 9 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = raw_data.dropna().copy()\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our data use an unusual convention: the week number is\n", "appended to the year, which makes the value look like an\n", "integer. That is how Pandas interprets it.\n", " \n", "A second problem is that Pandas does not understand week\n", "numbers. It needs the start and end dates of each week.\n", "We use the `isoweek` library for this.\n", "\n", "Since the week conversion has become fairly complex, we\n", "write a small Python function for it. We then apply it to\n", "every point in our data. The results go into a new column,\n", "'period'." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split an integer such as 202016 into year 2020 and ISO week 16.\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two small changes remain.\n", "\n", "First, we set the observation periods\n", "as the new index of our dataset. 
This makes it\n", "a time series, which will be convenient later on.\n", "\n", "Second, we sort the points by period, in\n", "chronological order." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of one period and\n", "the start of the next, the time difference should be\n", "zero, or at least very small. We allow a \"margin of error\"\n", "of one second.\n", "\n", "This holds everywhere except for two consecutive periods\n", "between which one week is missing.\n", "\n", "We recognize these dates: this is the week without observations\n", "that we removed!" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in on the last few years shows the winter peaks more clearly. The incidence is lowest in summer." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the epidemic peak occurs in winter, straddling\n", "two calendar years, we define the reference period as running\n", "between two incidence minima, from August 1 of year $N$ to\n", "August 1 of year $N+1$.\n", "\n", "Our task is slightly complicated by the fact that a year does not contain\n", "a whole number of weeks. We therefore adjust our reference periods\n", "a little: instead of August 1 of each year, we use the\n", "first day of the week that contains August 1.\n", "\n", "Since the incidence of influenza-like illness is very low in summer, this\n", "adjustment is unlikely to distort our conclusions.\n", "\n", "One more detail: the data start in October 1984, which\n", "makes the first year incomplete. We therefore start the analysis in 1985." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", "                     for y in range(1985,\n", "                                    sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain an August 1, we obtain our intervals of roughly one year as the periods between two adjacent weeks in the list. We compute the sum of the weekly incidences for each of these periods.\n", "\n", "We also check that each period contains between 51 and 53 weeks, as a safeguard against possible errors in our code." 
] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "ename": "AssertionError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m first_august_week[1:]):\n\u001b[1;32m 5\u001b[0m \u001b[0mone_year\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msorted_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'inc'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mweek1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mweek2\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mabs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mone_year\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m52\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0myearly_incidence\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mone_year\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0myear\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mweek2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0myear\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAssertionError\u001b[0m: " ] } ], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", " first_august_week[1:]):\n", " one_year = sorted_data['inc'][week1:week2-1]\n", " assert abs(len(one_year)-52) < 2\n", " 
yearly_incidence.append(one_year.sum())\n", " year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the annual incidences." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to spot the highest values (at the end)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a histogram shows clearly that strong epidemics, which affect about 10% of the French\n", " population, are quite rare: there have been three over the last 35 years." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yearly_incidence.hist(xrot=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }