{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-7.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the description of the columns given [on the original site](http://www.sentiweb.fr/france/fr/?page=json&file=csv-schema-v1&type=csv):\n", "\n", "\n", "|Name |Type |Description\n", "|--- | --- | ---\n", "|week PK |integer |ISO8601 Yearweek number as numeric (year*100 + week number)\n", "|geo_insee PK |string |Identifier of the geographic area, from INSEE https://www.insee.fr\n", "|geo_name |string |Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading\n", "|indicator PK |integer |Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json\n", "|inc |integer |Estimated incidence value for the time step, in the geographic level\n", "|inc_low |integer |Lower bound of the estimated incidence 95% Confidence Interval\n", "|inc_up |integer |Upper bound of the estimated incidence 95% Confidence Interval\n", "|inc100 |integer |Estimated rate incidence per 100,000 inhabitants\n", "|inc100_low |integer |Lower bound of the estimated incidence 95% Confidence Interval\n", "|inc100_up |integer |Upper bound of the estimated rate incidence 95% Confidence Interval\n", "\n", "Missing value: -" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below is run **only once**, to create a *local* copy of the data file."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "raw_data = pd.read_csv(data_url, skiprows=1)\n", "raw_data\n", "raw_data.to_csv('incidence-PAY-7.csv', header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the local file exists, the lines above are meant to be commented out and replaced by the code below, which reads the data from the local file when it is available and otherwise downloads it from the *Réseau Sentinelles* site." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "existe\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Unnamed: 0 week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 0 202050 7 8018 5001 11035 12 7 \n", "1 1 202049 7 5100 3191 7009 8 5 \n", "2 2 202048 7 6683 4312 9054 10 6 \n", "3 3 202047 7 4999 2963 7035 8 5 \n", "4 4 202046 7 3752 1963 5541 6 3 \n", "5 5 202045 7 3696 2016 5376 6 3 \n", "6 6 202044 7 4391 2375 6407 7 4 \n", "7 7 202043 7 4376 2505 6247 7 4 \n", "8 8 202042 7 4000 1979 6021 6 3 \n", "9 9 202041 7 3961 2099 5823 6 3 \n", "\n", " inc100_up geo_insee geo_name \n", "0 17 FR France \n", "1 11 FR France \n", "2 14 FR France \n", "3 11 FR France \n", "4 9 FR France \n", "5 9 FR France \n", "6 10 FR France \n", "7 10 FR France \n", "8 9 FR France \n", "9 9 FR France " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "# Prefer the local copy; fall back to downloading from the source URL\n", "if os.path.isfile('incidence-PAY-7.csv'):\n", "    raw_data = pd.read_csv('incidence-PAY-7.csv')\n", "    print('existe')\n", "else:\n", "    print(\"n'existe pas\")\n", "    raw_data = pd.read_csv(data_url, skiprows=1)\n", "len(raw_data)\n", "raw_data[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look for missing data points in the dataset." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Unnamed: 0, week, indicator, inc, inc_low, inc_up, inc100, inc100_low, inc100_up, geo_insee, geo_name]\n", "Index: []" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There appear to be no missing points in the data. I keep the same convention as in the influenza analysis and copy the data into the variable _data_." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Unnamed: 0 week indicator inc inc_low inc_up inc100 \\\n", "0 0 202050 7 8018 5001 11035 12 \n", "1 1 202049 7 5100 3191 7009 8 \n", "2 2 202048 7 6683 4312 9054 10 \n", "3 3 202047 7 4999 2963 7035 8 \n", "4 4 202046 7 3752 1963 5541 6 \n", "5 5 202045 7 3696 2016 5376 6 \n", "6 6 202044 7 4391 2375 6407 7 \n", "7 7 202043 7 4376 2505 6247 7 \n", "8 8 202042 7 4000 1979 6021 6 \n", "9 9 202041 7 3961 2099 5823 6 \n", "10 10 202040 7 2078 675 3481 3 \n", "11 11 202039 7 1049 237 1861 2 \n", "12 12 202038 7 2253 782 3724 3 \n", "13 13 202037 7 1584 405 2763 2 \n", "14 14 202036 7 919 100 1738 1 \n", "15 15 202035 7 828 0 1694 1 \n", "16 16 202034 7 2272 371 4173 3 \n", "17 17 202033 7 1284 177 2391 2 \n", "18 18 202032 7 2650 689 4611 4 \n", "19 19 202031 7 1303 100 2506 2 \n", "20 20 202030 7 1385 75 2695 2 \n", "21 21 202029 7 841 10 1672 1 \n", "22 22 202028 7 728 0 1515 1 \n", "23 23 202027 7 986 149 1823 1 \n", "24 24 202026 7 694 0 1454 1 \n", "25 25 202025 7 228 0 597 0 \n", "26 26 202024 7 388 0 959 1 \n", "27 27 202023 7 558 1 1115 1 \n", "28 28 202022 7 277 0 633 0 \n", "29 29 202021 7 602 36 1168 1 \n", "... ... ... ... ... ... ... ... 
\n", "1537 1537 199126 7 17608 11304 23912 31 \n", "1538 1538 199125 7 16169 10700 21638 28 \n", "1539 1539 199124 7 16171 10071 22271 28 \n", "1540 1540 199123 7 11947 7671 16223 21 \n", "1541 1541 199122 7 15452 9953 20951 27 \n", "1542 1542 199121 7 14903 8975 20831 26 \n", "1543 1543 199120 7 19053 12742 25364 34 \n", "1544 1544 199119 7 16739 11246 22232 29 \n", "1545 1545 199118 7 21385 13882 28888 38 \n", "1546 1546 199117 7 13462 8877 18047 24 \n", "1547 1547 199116 7 14857 10068 19646 26 \n", "1548 1548 199115 7 13975 9781 18169 25 \n", "1549 1549 199114 7 12265 7684 16846 22 \n", "1550 1550 199113 7 9567 6041 13093 17 \n", "1551 1551 199112 7 10864 7331 14397 19 \n", "1552 1552 199111 7 15574 11184 19964 27 \n", "1553 1553 199110 7 16643 11372 21914 29 \n", "1554 1554 199109 7 13741 8780 18702 24 \n", "1555 1555 199108 7 13289 8813 17765 23 \n", "1556 1556 199107 7 12337 8077 16597 22 \n", "1557 1557 199106 7 10877 7013 14741 19 \n", "1558 1558 199105 7 10442 6544 14340 18 \n", "1559 1559 199104 7 7913 4563 11263 14 \n", "1560 1560 199103 7 15387 10484 20290 27 \n", "1561 1561 199102 7 16277 11046 21508 29 \n", "1562 1562 199101 7 15565 10271 20859 27 \n", "1563 1563 199052 7 19375 13295 25455 34 \n", "1564 1564 199051 7 19080 13807 24353 34 \n", "1565 1565 199050 7 11079 6660 15498 20 \n", "1566 1566 199049 7 1143 0 2610 2 \n", "\n", " inc100_low inc100_up geo_insee geo_name \n", "0 7 17 FR France \n", "1 5 11 FR France \n", "2 6 14 FR France \n", "3 5 11 FR France \n", "4 3 9 FR France \n", "5 3 9 FR France \n", "6 4 10 FR France \n", "7 4 10 FR France \n", "8 3 9 FR France \n", "9 3 9 FR France \n", "10 1 5 FR France \n", "11 1 3 FR France \n", "12 1 5 FR France \n", "13 0 4 FR France \n", "14 0 2 FR France \n", "15 0 2 FR France \n", "16 0 6 FR France \n", "17 0 4 FR France \n", "18 1 7 FR France \n", "19 0 4 FR France \n", "20 0 4 FR France \n", "21 0 2 FR France \n", "22 0 2 FR France \n", "23 0 2 FR France \n", "24 0 2 FR France \n", "25 0 1 FR 
France \n", "26 0 2 FR France \n", "27 0 2 FR France \n", "28 0 1 FR France \n", "29 0 2 FR France \n", "... ... ... ... ... \n", "1537 20 42 FR France \n", "1538 18 38 FR France \n", "1539 17 39 FR France \n", "1540 13 29 FR France \n", "1541 17 37 FR France \n", "1542 16 36 FR France \n", "1543 23 45 FR France \n", "1544 19 39 FR France \n", "1545 25 51 FR France \n", "1546 16 32 FR France \n", "1547 18 34 FR France \n", "1548 18 32 FR France \n", "1549 14 30 FR France \n", "1550 11 23 FR France \n", "1551 13 25 FR France \n", "1552 19 35 FR France \n", "1553 20 38 FR France \n", "1554 15 33 FR France \n", "1555 15 31 FR France \n", "1556 15 29 FR France \n", "1557 12 26 FR France \n", "1558 11 25 FR France \n", "1559 8 20 FR France \n", "1560 18 36 FR France \n", "1561 20 38 FR France \n", "1562 18 36 FR France \n", "1563 23 45 FR France \n", "1564 25 43 FR France \n", "1565 12 28 FR France \n", "1566 0 5 FR France \n", "\n", "[1567 rows x 11 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = raw_data.copy()\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Function converting the *yyyyww* date notation into a separate year and week, returned as a weekly pandas `Period`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split e.g. 202050 into year 2020 and ISO week 50\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the observation periods as the new index of the data, and sort by period, chronologically, to make the analysis easier."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that there are no gaps in the data: each period must start where the previous one ends, with a tolerance of one second." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The check above prints nothing, so the data is complete.\n", "\n", "Plot of the data:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zoom in on the last few years (the last 200 weeks) to see the periodicity better." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The minimum appears to be around September (towards the end of the third quarter). We therefore choose **September 1st** as the start of each annual period when aggregating the data." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "first_september_week = [pd.Period(pd.Timestamp(y, 9, 1), 'W')\n", "                        for y in range(1990,\n", "                                       sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `assert` is meant to guard against an unexpected number of weeks in an annual period (each one should hold 52 or 53 weeks). The loop then computes the yearly incidence as the sum of the weekly incidences." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_september_week[:-1],\n", "                        first_september_week[1:]):\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    # assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the **assert** enabled, this cell raised an error. The most likely cause is the first period: the data only starts in week 49 of 1990 (`199049`), so the period from September 1990 to September 1991 contains far fewer than 52 weeks.\n", "The assertion is therefore commented out. Keep in mind that the first yearly total (labelled 1991) only covers part of a year." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot of the yearly incidence:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "hideOutput": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "29\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "print(len(yearly_incidence))\n", "yearly_incidence.plot(style='*')" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "516689 842373\n" ] } ], "source": [ "print(min(yearly_incidence), max(yearly_incidence))" ] }, { "cell_type": "markdown", "metadata": { "hideOutput": true }, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }