{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of chickenpox" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-7.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first line of the CSV file is a comment, which we skip by passing `skiprows=1`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>week</th><th>indicator</th><th>inc</th><th>inc_low</th><th>inc_up</th><th>inc100</th><th>inc100_low</th><th>inc100_up</th><th>geo_insee</th><th>geo_name</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>202019</td><td>7</td><td>159</td><td>0</td><td>490</td><td>0</td><td>0</td><td>1</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1</th><td>202018</td><td>7</td><td>824</td><td>85</td><td>1563</td><td>1</td><td>0</td><td>2</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>2</th><td>202017</td><td>7</td><td>272</td><td>0</td><td>658</td><td>0</td><td>0</td><td>1</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>3</th><td>202016</td><td>7</td><td>758</td><td>78</td><td>1438</td><td>1</td><td>0</td><td>2</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>4</th><td>202015</td><td>7</td><td>1918</td><td>675</td><td>3161</td><td>3</td><td>1</td><td>5</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>5</th><td>202014</td><td>7</td><td>3879</td><td>2227</td><td>5531</td><td>6</td><td>3</td><td>9</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>6</th><td>202013</td><td>7</td><td>7326</td><td>5236</td><td>9416</td><td>11</td><td>8</td><td>14</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>7</th><td>202012</td><td>7</td><td>8123</td><td>5790</td><td>10456</td><td>12</td><td>8</td><td>16</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>8</th><td>202011</td><td>7</td><td>10198</td><td>7568</td><td>12828</td><td>15</td><td>11</td><td>19</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>9</th><td>202010</td><td>7</td><td>9011</td><td>6691</td><td>11331</td><td>14</td><td>10</td><td>18</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>10</th><td>202009</td><td>7</td><td>13631</td><td>10544</td><td>16718</td><td>21</td><td>16</td><td>26</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>11</th><td>202008</td><td>7</td><td>10424</td><td>7708</td><td>13140</td><td>16</td><td>12</td><td>20</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>12</th><td>202007</td><td>7</td><td>8959</td><td>6574</td><td>11344</td><td>14</td><td>10</td><td>18</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>13</th><td>202006</td><td>7</td><td>9264</td><td>6925</td><td>11603</td><td>14</td><td>10</td><td>18</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>14</th><td>202005</td><td>7</td><td>8505</td><td>6314</td><td>10696</td><td>13</td><td>10</td><td>16</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>15</th><td>202004</td><td>7</td><td>7991</td><td>5831</td><td>10151</td><td>12</td><td>9</td><td>15</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>16</th><td>202003</td><td>7</td><td>5968</td><td>4100</td><td>7836</td><td>9</td><td>6</td><td>12</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>17</th><td>202002</td><td>7</td><td>6534</td><td>4530</td><td>8538</td><td>10</td><td>7</td><td>13</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>18</th><td>202001</td><td>7</td><td>9835</td><td>7019</td><td>12651</td><td>15</td><td>11</td><td>19</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>19</th><td>201952</td><td>7</td><td>7941</td><td>5246</td><td>10636</td><td>12</td><td>8</td><td>16</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>20</th><td>201951</td><td>7</td><td>5823</td><td>3675</td><td>7971</td><td>9</td><td>6</td><td>12</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>21</th><td>201950</td><td>7</td><td>6424</td><td>4276</td><td>8572</td><td>10</td><td>7</td><td>13</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>22</th><td>201949</td><td>7</td><td>6621</td><td>4540</td><td>8702</td><td>10</td><td>7</td><td>13</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>23</th><td>201948</td><td>7</td><td>5542</td><td>3383</td><td>7701</td><td>8</td><td>5</td><td>11</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>24</th><td>201947</td><td>7</td><td>7536</td><td>5058</td><td>10014</td><td>11</td><td>7</td><td>15</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>25</th><td>201946</td><td>7</td><td>2638</td><td>1316</td><td>3960</td><td>4</td><td>2</td><td>6</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>26</th><td>201945</td><td>7</td><td>4492</td><td>2615</td><td>6369</td><td>7</td><td>4</td><td>10</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>27</th><td>201944</td><td>7</td><td>5728</td><td>3627</td><td>7829</td><td>9</td><td>6</td><td>12</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>28</th><td>201943</td><td>7</td><td>4834</td><td>2751</td><td>6917</td><td>7</td><td>4</td><td>10</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>29</th><td>201942</td><td>7</td><td>6279</td><td>3989</td><td>8569</td><td>10</td><td>7</td><td>13</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
" <tr><th>1506</th><td>199126</td><td>7</td><td>17608</td><td>11304</td><td>23912</td><td>31</td><td>20</td><td>42</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1507</th><td>199125</td><td>7</td><td>16169</td><td>10700</td><td>21638</td><td>28</td><td>18</td><td>38</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1508</th><td>199124</td><td>7</td><td>16171</td><td>10071</td><td>22271</td><td>28</td><td>17</td><td>39</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1509</th><td>199123</td><td>7</td><td>11947</td><td>7671</td><td>16223</td><td>21</td><td>13</td><td>29</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1510</th><td>199122</td><td>7</td><td>15452</td><td>9953</td><td>20951</td><td>27</td><td>17</td><td>37</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1511</th><td>199121</td><td>7</td><td>14903</td><td>8975</td><td>20831</td><td>26</td><td>16</td><td>36</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1512</th><td>199120</td><td>7</td><td>19053</td><td>12742</td><td>25364</td><td>34</td><td>23</td><td>45</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1513</th><td>199119</td><td>7</td><td>16739</td><td>11246</td><td>22232</td><td>29</td><td>19</td><td>39</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1514</th><td>199118</td><td>7</td><td>21385</td><td>13882</td><td>28888</td><td>38</td><td>25</td><td>51</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1515</th><td>199117</td><td>7</td><td>13462</td><td>8877</td><td>18047</td><td>24</td><td>16</td><td>32</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1516</th><td>199116</td><td>7</td><td>14857</td><td>10068</td><td>19646</td><td>26</td><td>18</td><td>34</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1517</th><td>199115</td><td>7</td><td>13975</td><td>9781</td><td>18169</td><td>25</td><td>18</td><td>32</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1518</th><td>199114</td><td>7</td><td>12265</td><td>7684</td><td>16846</td><td>22</td><td>14</td><td>30</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1519</th><td>199113</td><td>7</td><td>9567</td><td>6041</td><td>13093</td><td>17</td><td>11</td><td>23</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1520</th><td>199112</td><td>7</td><td>10864</td><td>7331</td><td>14397</td><td>19</td><td>13</td><td>25</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1521</th><td>199111</td><td>7</td><td>15574</td><td>11184</td><td>19964</td><td>27</td><td>19</td><td>35</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1522</th><td>199110</td><td>7</td><td>16643</td><td>11372</td><td>21914</td><td>29</td><td>20</td><td>38</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1523</th><td>199109</td><td>7</td><td>13741</td><td>8780</td><td>18702</td><td>24</td><td>15</td><td>33</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1524</th><td>199108</td><td>7</td><td>13289</td><td>8813</td><td>17765</td><td>23</td><td>15</td><td>31</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1525</th><td>199107</td><td>7</td><td>12337</td><td>8077</td><td>16597</td><td>22</td><td>15</td><td>29</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1526</th><td>199106</td><td>7</td><td>10877</td><td>7013</td><td>14741</td><td>19</td><td>12</td><td>26</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1527</th><td>199105</td><td>7</td><td>10442</td><td>6544</td><td>14340</td><td>18</td><td>11</td><td>25</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1528</th><td>199104</td><td>7</td><td>7913</td><td>4563</td><td>11263</td><td>14</td><td>8</td><td>20</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1529</th><td>199103</td><td>7</td><td>15387</td><td>10484</td><td>20290</td><td>27</td><td>18</td><td>36</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1530</th><td>199102</td><td>7</td><td>16277</td><td>11046</td><td>21508</td><td>29</td><td>20</td><td>38</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1531</th><td>199101</td><td>7</td><td>15565</td><td>10271</td><td>20859</td><td>27</td><td>18</td><td>36</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1532</th><td>199052</td><td>7</td><td>19375</td><td>13295</td><td>25455</td><td>34</td><td>23</td><td>45</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1533</th><td>199051</td><td>7</td><td>19080</td><td>13807</td><td>24353</td><td>34</td><td>25</td><td>43</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1534</th><td>199050</td><td>7</td><td>11079</td><td>6660</td><td>15498</td><td>20</td><td>12</td><td>28</td><td>FR</td><td>France</td></tr>\n",
" <tr><th>1535</th><td>199049</td><td>7</td><td>1143</td><td>0</td><td>2610</td><td>2</td><td>0</td><td>5</td><td>FR</td><td>France</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1536 rows × 10 columns</p>\n",
"</div>
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202019 7 159 0 490 0 0 \n", "1 202018 7 824 85 1563 1 0 \n", "2 202017 7 272 0 658 0 0 \n", "3 202016 7 758 78 1438 1 0 \n", "4 202015 7 1918 675 3161 3 1 \n", "5 202014 7 3879 2227 5531 6 3 \n", "6 202013 7 7326 5236 9416 11 8 \n", "7 202012 7 8123 5790 10456 12 8 \n", "8 202011 7 10198 7568 12828 15 11 \n", "9 202010 7 9011 6691 11331 14 10 \n", "10 202009 7 13631 10544 16718 21 16 \n", "11 202008 7 10424 7708 13140 16 12 \n", "12 202007 7 8959 6574 11344 14 10 \n", "13 202006 7 9264 6925 11603 14 10 \n", "14 202005 7 8505 6314 10696 13 10 \n", "15 202004 7 7991 5831 10151 12 9 \n", "16 202003 7 5968 4100 7836 9 6 \n", "17 202002 7 6534 4530 8538 10 7 \n", "18 202001 7 9835 7019 12651 15 11 \n", "19 201952 7 7941 5246 10636 12 8 \n", "20 201951 7 5823 3675 7971 9 6 \n", "21 201950 7 6424 4276 8572 10 7 \n", "22 201949 7 6621 4540 8702 10 7 \n", "23 201948 7 5542 3383 7701 8 5 \n", "24 201947 7 7536 5058 10014 11 7 \n", "25 201946 7 2638 1316 3960 4 2 \n", "26 201945 7 4492 2615 6369 7 4 \n", "27 201944 7 5728 3627 7829 9 6 \n", "28 201943 7 4834 2751 6917 7 4 \n", "29 201942 7 6279 3989 8569 10 7 \n", "... ... ... ... ... ... ... ... 
\n", "1506 199126 7 17608 11304 23912 31 20 \n", "1507 199125 7 16169 10700 21638 28 18 \n", "1508 199124 7 16171 10071 22271 28 17 \n", "1509 199123 7 11947 7671 16223 21 13 \n", "1510 199122 7 15452 9953 20951 27 17 \n", "1511 199121 7 14903 8975 20831 26 16 \n", "1512 199120 7 19053 12742 25364 34 23 \n", "1513 199119 7 16739 11246 22232 29 19 \n", "1514 199118 7 21385 13882 28888 38 25 \n", "1515 199117 7 13462 8877 18047 24 16 \n", "1516 199116 7 14857 10068 19646 26 18 \n", "1517 199115 7 13975 9781 18169 25 18 \n", "1518 199114 7 12265 7684 16846 22 14 \n", "1519 199113 7 9567 6041 13093 17 11 \n", "1520 199112 7 10864 7331 14397 19 13 \n", "1521 199111 7 15574 11184 19964 27 19 \n", "1522 199110 7 16643 11372 21914 29 20 \n", "1523 199109 7 13741 8780 18702 24 15 \n", "1524 199108 7 13289 8813 17765 23 15 \n", "1525 199107 7 12337 8077 16597 22 15 \n", "1526 199106 7 10877 7013 14741 19 12 \n", "1527 199105 7 10442 6544 14340 18 11 \n", "1528 199104 7 7913 4563 11263 14 8 \n", "1529 199103 7 15387 10484 20290 27 18 \n", "1530 199102 7 16277 11046 21508 29 20 \n", "1531 199101 7 15565 10271 20859 27 18 \n", "1532 199052 7 19375 13295 25455 34 23 \n", "1533 199051 7 19080 13807 24353 34 25 \n", "1534 199050 7 11079 6660 15498 20 12 \n", "1535 199049 7 1143 0 2610 2 0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 1 FR France \n", "1 2 FR France \n", "2 1 FR France \n", "3 2 FR France \n", "4 5 FR France \n", "5 9 FR France \n", "6 14 FR France \n", "7 16 FR France \n", "8 19 FR France \n", "9 18 FR France \n", "10 26 FR France \n", "11 20 FR France \n", "12 18 FR France \n", "13 18 FR France \n", "14 16 FR France \n", "15 15 FR France \n", "16 12 FR France \n", "17 13 FR France \n", "18 19 FR France \n", "19 16 FR France \n", "20 12 FR France \n", "21 13 FR France \n", "22 13 FR France \n", "23 11 FR France \n", "24 15 FR France \n", "25 6 FR France \n", "26 10 FR France \n", "27 12 FR France \n", "28 10 FR France \n", "29 13 FR France \n", "... ... ... 
... \n", "1506 42 FR France \n", "1507 38 FR France \n", "1508 39 FR France \n", "1509 29 FR France \n", "1510 37 FR France \n", "1511 36 FR France \n", "1512 45 FR France \n", "1513 39 FR France \n", "1514 51 FR France \n", "1515 32 FR France \n", "1516 34 FR France \n", "1517 32 FR France \n", "1518 30 FR France \n", "1519 23 FR France \n", "1520 25 FR France \n", "1521 35 FR France \n", "1522 38 FR France \n", "1523 33 FR France \n", "1524 31 FR France \n", "1525 29 FR France \n", "1526 26 FR France \n", "1527 25 FR France \n", "1528 20 FR France \n", "1529 36 FR France \n", "1530 38 FR France \n", "1531 36 FR France \n", "1532 45 FR France \n", "1533 43 FR France \n", "1534 28 FR France \n", "1535 5 FR France \n", "\n", "[1536 rows x 10 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_url, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there any missing data points in this dataset?" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Select the rows that contain at least one missing value\n", "raw_data[raw_data.isnull().any(axis=1)]\n", "\n", "# Work on a copy so that later changes leave raw_data untouched\n", "data = raw_data.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are no missing points, so we can continue the analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our data use an unusual convention: the week number is appended directly to the year, which makes the value look like\n", "an integer. That is how pandas interprets it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A second problem is that pandas does not understand week numbers. It needs the start and end dates of each week. We use the `isoweek` library for this.\n", "\n", "Since converting the weeks has become fairly involved, we write a small Python function for it. 
We then\n", "apply it to every point in our data. The results go into a new column, 'period'." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    \"\"\"Convert an integer such as 202019 (year + ISO week) into a weekly pandas Period.\"\"\"\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that ISO week\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two small changes remain.\n", "\n", "First, we make the observation periods the new index of our dataset. This turns it into a time series, which will be convenient later on.\n", "\n", "Second, we sort the points by period, in chronological order." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }