{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of chickenpox in France" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data on the incidence of chickenpox are available from the website of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to one week of the observation period. Only the complete dataset, starting in week 49 of 1990 and ending with a recent week, is available for download." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Offline data available.\n" ] }, { "data": { "text/html": [ "
<div>
<table border='1' class='dataframe'>
<thead>
<tr><th></th><th>week</th><th>indicator</th><th>inc</th><th>inc_low</th><th>inc_up</th><th>inc100</th><th>inc100_low</th><th>inc100_up</th><th>geo_insee</th><th>geo_name</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>202238</td><td>7</td><td>1759</td><td>0</td><td>3539</td><td>3</td><td>0</td><td>6</td><td>FR</td><td>France</td></tr>
<tr><th>1</th><td>202237</td><td>7</td><td>1735</td><td>494</td><td>2976</td><td>3</td><td>1</td><td>5</td><td>FR</td><td>France</td></tr>
<tr><th>2</th><td>202236</td><td>7</td><td>1069</td><td>178</td><td>1960</td><td>2</td><td>1</td><td>3</td><td>FR</td><td>France</td></tr>
<tr><th>3</th><td>202235</td><td>7</td><td>1581</td><td>400</td><td>2762</td><td>2</td><td>0</td><td>4</td><td>FR</td><td>France</td></tr>
<tr><th>4</th><td>202234</td><td>7</td><td>2266</td><td>788</td><td>3744</td><td>3</td><td>1</td><td>5</td><td>FR</td><td>France</td></tr>
<tr><th>5</th><td>202233</td><td>7</td><td>7340</td><td>0</td><td>17399</td><td>11</td><td>0</td><td>26</td><td>FR</td><td>France</td></tr>
<tr><th>6</th><td>202232</td><td>7</td><td>7801</td><td>4086</td><td>11516</td><td>12</td><td>6</td><td>18</td><td>FR</td><td>France</td></tr>
<tr><th>7</th><td>202231</td><td>7</td><td>6896</td><td>4170</td><td>9622</td><td>10</td><td>6</td><td>14</td><td>FR</td><td>France</td></tr>
<tr><th>8</th><td>202230</td><td>7</td><td>9039</td><td>5770</td><td>12308</td><td>14</td><td>9</td><td>19</td><td>FR</td><td>France</td></tr>
<tr><th>9</th><td>202229</td><td>7</td><td>14851</td><td>10060</td><td>19642</td><td>22</td><td>15</td><td>29</td><td>FR</td><td>France</td></tr>
<tr><th>10</th><td>202228</td><td>7</td><td>15471</td><td>11028</td><td>19914</td><td>23</td><td>16</td><td>30</td><td>FR</td><td>France</td></tr>
<tr><th>11</th><td>202227</td><td>7</td><td>21191</td><td>16198</td><td>26184</td><td>32</td><td>24</td><td>40</td><td>FR</td><td>France</td></tr>
<tr><th>12</th><td>202226</td><td>7</td><td>16854</td><td>12806</td><td>20902</td><td>25</td><td>19</td><td>31</td><td>FR</td><td>France</td></tr>
<tr><th>13</th><td>202225</td><td>7</td><td>22246</td><td>18011</td><td>26481</td><td>34</td><td>28</td><td>40</td><td>FR</td><td>France</td></tr>
<tr><th>14</th><td>202224</td><td>7</td><td>22458</td><td>18105</td><td>26811</td><td>34</td><td>27</td><td>41</td><td>FR</td><td>France</td></tr>
<tr><th>15</th><td>202223</td><td>7</td><td>18772</td><td>14875</td><td>22669</td><td>28</td><td>22</td><td>34</td><td>FR</td><td>France</td></tr>
<tr><th>16</th><td>202222</td><td>7</td><td>18916</td><td>14941</td><td>22891</td><td>29</td><td>23</td><td>35</td><td>FR</td><td>France</td></tr>
<tr><th>17</th><td>202221</td><td>7</td><td>20310</td><td>16307</td><td>24313</td><td>31</td><td>25</td><td>37</td><td>FR</td><td>France</td></tr>
<tr><th>18</th><td>202220</td><td>7</td><td>23585</td><td>19004</td><td>28166</td><td>36</td><td>29</td><td>43</td><td>FR</td><td>France</td></tr>
<tr><th>19</th><td>202219</td><td>7</td><td>18593</td><td>14181</td><td>23005</td><td>28</td><td>21</td><td>35</td><td>FR</td><td>France</td></tr>
<tr><th>20</th><td>202218</td><td>7</td><td>17851</td><td>13963</td><td>21739</td><td>27</td><td>21</td><td>33</td><td>FR</td><td>France</td></tr>
<tr><th>21</th><td>202217</td><td>7</td><td>20314</td><td>16001</td><td>24627</td><td>31</td><td>24</td><td>38</td><td>FR</td><td>France</td></tr>
<tr><th>22</th><td>202216</td><td>7</td><td>19660</td><td>14860</td><td>24460</td><td>30</td><td>23</td><td>37</td><td>FR</td><td>France</td></tr>
<tr><th>23</th><td>202215</td><td>7</td><td>17799</td><td>13715</td><td>21883</td><td>27</td><td>21</td><td>33</td><td>FR</td><td>France</td></tr>
<tr><th>24</th><td>202214</td><td>7</td><td>17005</td><td>13162</td><td>20848</td><td>26</td><td>20</td><td>32</td><td>FR</td><td>France</td></tr>
<tr><th>25</th><td>202213</td><td>7</td><td>15448</td><td>11659</td><td>19237</td><td>23</td><td>17</td><td>29</td><td>FR</td><td>France</td></tr>
<tr><th>26</th><td>202212</td><td>7</td><td>14702</td><td>10794</td><td>18610</td><td>22</td><td>16</td><td>28</td><td>FR</td><td>France</td></tr>
<tr><th>27</th><td>202211</td><td>7</td><td>11729</td><td>8347</td><td>15111</td><td>18</td><td>13</td><td>23</td><td>FR</td><td>France</td></tr>
<tr><th>28</th><td>202210</td><td>7</td><td>13314</td><td>10036</td><td>16592</td><td>20</td><td>15</td><td>25</td><td>FR</td><td>France</td></tr>
<tr><th>29</th><td>202209</td><td>7</td><td>10485</td><td>7600</td><td>13370</td><td>16</td><td>12</td><td>20</td><td>FR</td><td>France</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>1630</th><td>199126</td><td>7</td><td>17608</td><td>11304</td><td>23912</td><td>31</td><td>20</td><td>42</td><td>FR</td><td>France</td></tr>
<tr><th>1631</th><td>199125</td><td>7</td><td>16169</td><td>10700</td><td>21638</td><td>28</td><td>18</td><td>38</td><td>FR</td><td>France</td></tr>
<tr><th>1632</th><td>199124</td><td>7</td><td>16171</td><td>10071</td><td>22271</td><td>28</td><td>17</td><td>39</td><td>FR</td><td>France</td></tr>
<tr><th>1633</th><td>199123</td><td>7</td><td>11947</td><td>7671</td><td>16223</td><td>21</td><td>13</td><td>29</td><td>FR</td><td>France</td></tr>
<tr><th>1634</th><td>199122</td><td>7</td><td>15452</td><td>9953</td><td>20951</td><td>27</td><td>17</td><td>37</td><td>FR</td><td>France</td></tr>
<tr><th>1635</th><td>199121</td><td>7</td><td>14903</td><td>8975</td><td>20831</td><td>26</td><td>16</td><td>36</td><td>FR</td><td>France</td></tr>
<tr><th>1636</th><td>199120</td><td>7</td><td>19053</td><td>12742</td><td>25364</td><td>34</td><td>23</td><td>45</td><td>FR</td><td>France</td></tr>
<tr><th>1637</th><td>199119</td><td>7</td><td>16739</td><td>11246</td><td>22232</td><td>29</td><td>19</td><td>39</td><td>FR</td><td>France</td></tr>
<tr><th>1638</th><td>199118</td><td>7</td><td>21385</td><td>13882</td><td>28888</td><td>38</td><td>25</td><td>51</td><td>FR</td><td>France</td></tr>
<tr><th>1639</th><td>199117</td><td>7</td><td>13462</td><td>8877</td><td>18047</td><td>24</td><td>16</td><td>32</td><td>FR</td><td>France</td></tr>
<tr><th>1640</th><td>199116</td><td>7</td><td>14857</td><td>10068</td><td>19646</td><td>26</td><td>18</td><td>34</td><td>FR</td><td>France</td></tr>
<tr><th>1641</th><td>199115</td><td>7</td><td>13975</td><td>9781</td><td>18169</td><td>25</td><td>18</td><td>32</td><td>FR</td><td>France</td></tr>
<tr><th>1642</th><td>199114</td><td>7</td><td>12265</td><td>7684</td><td>16846</td><td>22</td><td>14</td><td>30</td><td>FR</td><td>France</td></tr>
<tr><th>1643</th><td>199113</td><td>7</td><td>9567</td><td>6041</td><td>13093</td><td>17</td><td>11</td><td>23</td><td>FR</td><td>France</td></tr>
<tr><th>1644</th><td>199112</td><td>7</td><td>10864</td><td>7331</td><td>14397</td><td>19</td><td>13</td><td>25</td><td>FR</td><td>France</td></tr>
<tr><th>1645</th><td>199111</td><td>7</td><td>15574</td><td>11184</td><td>19964</td><td>27</td><td>19</td><td>35</td><td>FR</td><td>France</td></tr>
<tr><th>1646</th><td>199110</td><td>7</td><td>16643</td><td>11372</td><td>21914</td><td>29</td><td>20</td><td>38</td><td>FR</td><td>France</td></tr>
<tr><th>1647</th><td>199109</td><td>7</td><td>13741</td><td>8780</td><td>18702</td><td>24</td><td>15</td><td>33</td><td>FR</td><td>France</td></tr>
<tr><th>1648</th><td>199108</td><td>7</td><td>13289</td><td>8813</td><td>17765</td><td>23</td><td>15</td><td>31</td><td>FR</td><td>France</td></tr>
<tr><th>1649</th><td>199107</td><td>7</td><td>12337</td><td>8077</td><td>16597</td><td>22</td><td>15</td><td>29</td><td>FR</td><td>France</td></tr>
<tr><th>1650</th><td>199106</td><td>7</td><td>10877</td><td>7013</td><td>14741</td><td>19</td><td>12</td><td>26</td><td>FR</td><td>France</td></tr>
<tr><th>1651</th><td>199105</td><td>7</td><td>10442</td><td>6544</td><td>14340</td><td>18</td><td>11</td><td>25</td><td>FR</td><td>France</td></tr>
<tr><th>1652</th><td>199104</td><td>7</td><td>7913</td><td>4563</td><td>11263</td><td>14</td><td>8</td><td>20</td><td>FR</td><td>France</td></tr>
<tr><th>1653</th><td>199103</td><td>7</td><td>15387</td><td>10484</td><td>20290</td><td>27</td><td>18</td><td>36</td><td>FR</td><td>France</td></tr>
<tr><th>1654</th><td>199102</td><td>7</td><td>16277</td><td>11046</td><td>21508</td><td>29</td><td>20</td><td>38</td><td>FR</td><td>France</td></tr>
<tr><th>1655</th><td>199101</td><td>7</td><td>15565</td><td>10271</td><td>20859</td><td>27</td><td>18</td><td>36</td><td>FR</td><td>France</td></tr>
<tr><th>1656</th><td>199052</td><td>7</td><td>19375</td><td>13295</td><td>25455</td><td>34</td><td>23</td><td>45</td><td>FR</td><td>France</td></tr>
<tr><th>1657</th><td>199051</td><td>7</td><td>19080</td><td>13807</td><td>24353</td><td>34</td><td>25</td><td>43</td><td>FR</td><td>France</td></tr>
<tr><th>1658</th><td>199050</td><td>7</td><td>11079</td><td>6660</td><td>15498</td><td>20</td><td>12</td><td>28</td><td>FR</td><td>France</td></tr>
<tr><th>1659</th><td>199049</td><td>7</td><td>1143</td><td>0</td><td>2610</td><td>2</td><td>0</td><td>5</td><td>FR</td><td>France</td></tr>
</tbody>
</table>
<p>1660 rows × 10 columns</p>
</div>
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202238 7 1759 0 3539 3 0 \n", "1 202237 7 1735 494 2976 3 1 \n", "2 202236 7 1069 178 1960 2 1 \n", "3 202235 7 1581 400 2762 2 0 \n", "4 202234 7 2266 788 3744 3 1 \n", "5 202233 7 7340 0 17399 11 0 \n", "6 202232 7 7801 4086 11516 12 6 \n", "7 202231 7 6896 4170 9622 10 6 \n", "8 202230 7 9039 5770 12308 14 9 \n", "9 202229 7 14851 10060 19642 22 15 \n", "10 202228 7 15471 11028 19914 23 16 \n", "11 202227 7 21191 16198 26184 32 24 \n", "12 202226 7 16854 12806 20902 25 19 \n", "13 202225 7 22246 18011 26481 34 28 \n", "14 202224 7 22458 18105 26811 34 27 \n", "15 202223 7 18772 14875 22669 28 22 \n", "16 202222 7 18916 14941 22891 29 23 \n", "17 202221 7 20310 16307 24313 31 25 \n", "18 202220 7 23585 19004 28166 36 29 \n", "19 202219 7 18593 14181 23005 28 21 \n", "20 202218 7 17851 13963 21739 27 21 \n", "21 202217 7 20314 16001 24627 31 24 \n", "22 202216 7 19660 14860 24460 30 23 \n", "23 202215 7 17799 13715 21883 27 21 \n", "24 202214 7 17005 13162 20848 26 20 \n", "25 202213 7 15448 11659 19237 23 17 \n", "26 202212 7 14702 10794 18610 22 16 \n", "27 202211 7 11729 8347 15111 18 13 \n", "28 202210 7 13314 10036 16592 20 15 \n", "29 202209 7 10485 7600 13370 16 12 \n", "... ... ... ... ... ... ... ... 
\n", "1630 199126 7 17608 11304 23912 31 20 \n", "1631 199125 7 16169 10700 21638 28 18 \n", "1632 199124 7 16171 10071 22271 28 17 \n", "1633 199123 7 11947 7671 16223 21 13 \n", "1634 199122 7 15452 9953 20951 27 17 \n", "1635 199121 7 14903 8975 20831 26 16 \n", "1636 199120 7 19053 12742 25364 34 23 \n", "1637 199119 7 16739 11246 22232 29 19 \n", "1638 199118 7 21385 13882 28888 38 25 \n", "1639 199117 7 13462 8877 18047 24 16 \n", "1640 199116 7 14857 10068 19646 26 18 \n", "1641 199115 7 13975 9781 18169 25 18 \n", "1642 199114 7 12265 7684 16846 22 14 \n", "1643 199113 7 9567 6041 13093 17 11 \n", "1644 199112 7 10864 7331 14397 19 13 \n", "1645 199111 7 15574 11184 19964 27 19 \n", "1646 199110 7 16643 11372 21914 29 20 \n", "1647 199109 7 13741 8780 18702 24 15 \n", "1648 199108 7 13289 8813 17765 23 15 \n", "1649 199107 7 12337 8077 16597 22 15 \n", "1650 199106 7 10877 7013 14741 19 12 \n", "1651 199105 7 10442 6544 14340 18 11 \n", "1652 199104 7 7913 4563 11263 14 8 \n", "1653 199103 7 15387 10484 20290 27 18 \n", "1654 199102 7 16277 11046 21508 29 20 \n", "1655 199101 7 15565 10271 20859 27 18 \n", "1656 199052 7 19375 13295 25455 34 23 \n", "1657 199051 7 19080 13807 24353 34 25 \n", "1658 199050 7 11079 6660 15498 20 12 \n", "1659 199049 7 1143 0 2610 2 0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 6 FR France \n", "1 5 FR France \n", "2 3 FR France \n", "3 4 FR France \n", "4 5 FR France \n", "5 26 FR France \n", "6 18 FR France \n", "7 14 FR France \n", "8 19 FR France \n", "9 29 FR France \n", "10 30 FR France \n", "11 40 FR France \n", "12 31 FR France \n", "13 40 FR France \n", "14 41 FR France \n", "15 34 FR France \n", "16 35 FR France \n", "17 37 FR France \n", "18 43 FR France \n", "19 35 FR France \n", "20 33 FR France \n", "21 38 FR France \n", "22 37 FR France \n", "23 33 FR France \n", "24 32 FR France \n", "25 29 FR France \n", "26 28 FR France \n", "27 23 FR France \n", "28 25 FR France \n", "29 20 FR France \n", "... ... ... 
... \n", "1630 42 FR France \n", "1631 38 FR France \n", "1632 39 FR France \n", "1633 29 FR France \n", "1634 37 FR France \n", "1635 36 FR France \n", "1636 45 FR France \n", "1637 39 FR France \n", "1638 51 FR France \n", "1639 32 FR France \n", "1640 34 FR France \n", "1641 32 FR France \n", "1642 30 FR France \n", "1643 23 FR France \n", "1644 25 FR France \n", "1645 35 FR France \n", "1646 38 FR France \n", "1647 33 FR France \n", "1648 31 FR France \n", "1649 29 FR France \n", "1650 26 FR France \n", "1651 25 FR France \n", "1652 20 FR France \n", "1653 36 FR France \n", "1654 38 FR France \n", "1655 36 FR France \n", "1656 45 FR France \n", "1657 43 FR France \n", "1658 28 FR France \n", "1659 5 FR France \n", "\n", "[1660 rows x 10 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_url = 'https://www.sentiweb.fr/datasets/incidence-PAY-7.csv'\n", "data_file = \"./incidence-PAY-7.csv\"\n", "\n", "import os\n", "from urllib import request\n", "# Download the dataset only if no local copy exists yet.\n", "if not os.path.exists(data_file):\n", "    print('Offline data not available: attempt to retrieve database online')\n", "    request.urlretrieve(data_url, data_file)\n", "else:\n", "    print('Offline data available.')\n", "\n", "# Read the local copy, so that the analysis does not depend on the server.\n", "raw_data = pd.read_csv(data_file, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing data points? No, the dataset is complete." 
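, "\n\nAs a rough illustration of what such a check looks for, here is a minimal sketch that uses only the standard library. It assumes that a missing value would show up as an empty field, which is an assumption and not something verified against the Sentinelles file. The three-row sample is a hypothetical excerpt in which week 199104 has been blanked on purpose.

```python
import csv
import io

# Hypothetical excerpt: the values for week 199104 are deliberately left empty.
sample_csv = '''week,indicator,inc,inc_low,inc_up
199103,7,15387,10484,20290
199104,7,,,
199105,7,10442,6544,14340
'''

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Keep every row in which at least one field is empty (assumed to mean missing).
incomplete = [row for row in rows if any(value == '' for value in row.values())]

print([row['week'] for row in incomplete])
```

On the real dataset the corresponding list is empty, which is what the statement above relies on."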
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "raw_data[raw_data.isnull().any(axis=1)]\n", "\n", "data = raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dataset uses an uncommon encoding; the week number is attached\n", "to the year number, leaving the impression of a six-digit integer.\n", "That is how Pandas interprets it.\n", "\n", "A second problem is that Pandas does not know about week numbers.\n", "It needs to be given the dates of the beginning and end of the week.\n", "We use the library `isoweek` for that.\n", "\n", "Since the conversion is a bit lengthy, we write a small Python \n", "function for doing it. Then we apply it to all points in our dataset. \n", "The results go into a new column 'period'." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", " year_and_week_str = str(year_and_week_int)\n", " year = int(year_and_week_str[:4])\n", " week = int(year_and_week_str[4:])\n", " w = isoweek.Week(year, week)\n", " return pd.Period(w.day(0), 'W')\n", "\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two more small changes to make.\n", "\n", "First, we define the observation periods as the new index of\n", "our dataset. That turns it into a time series, which will be\n", "convenient later on.\n", "\n", "Second, we sort the points chronologically." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. Between the end of a period and\n", "the beginning of the next one, the difference should be zero, or very small.\n", "We tolerate an error of one second." 
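, "\n\nThe week conversion and the gap check can also be sketched with the standard library alone. This is only an illustration under stated assumptions: it needs Python 3.8 or later for `date.fromisocalendar`, the helper names `week_monday` and `find_gaps` are hypothetical, and the sample list of week codes is made up, with week 199104 left out on purpose.

```python
from datetime import date, timedelta

def week_monday(year_and_week_int):
    # Split e.g. 199101 into year 1991 and ISO week 1, then return that week's Monday.
    text = str(year_and_week_int)
    return date.fromisocalendar(int(text[:4]), int(text[4:]), 1)

def find_gaps(week_codes):
    # Return the pairs of consecutive weeks that are not exactly 7 days apart.
    mondays = sorted(week_monday(code) for code in week_codes)
    return [(a, b) for a, b in zip(mondays[:-1], mondays[1:])
            if b - a != timedelta(days=7)]

# Hypothetical sample: week 199104 is deliberately missing.
sample = [199052, 199101, 199102, 199103, 199105]
gaps = find_gaps(sample)
print(gaps)
```

An empty result means that every week follows directly after the previous one, which is the property we want for our time series."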
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", " delta = p2.to_timestamp() - p1.end_time\n", " if delta > pd.Timedelta('1s'):\n", " print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And a zoom on the last few years." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the peaks of the epidemic happen in the first half of the year, we define the reference period for the annual incidence from September 1st of year $N$ to September 1st of year $N+1$.\n", "\n", "Our task is a bit complicated by the fact that a year does not have an integer number of weeks. Therefore we modify our reference period a bit: instead of August 1st, we use the first day of the week containing September 1st.\n", "\n", "A final detail: the dataset starts in week 49 of 1990 and it ends in week 38 of 2022, the first and last peaks are thus incomplete." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "first_sept_week = [pd.Period(pd.Timestamp(y, 9, 1), 'W')\n", " for y in range(1991,\n", " sorted_data.index[-1].year)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting from this list of weeks that contain September 1st, we obtain intervals of approximately one year as the periods between two adjacent weeks in this list. We compute the sums of weekly incidences for all these periods.\n", "\n", "We also check that our periods contain between 51 and 52 weeks, as a safeguard against potential mistakes in our code." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_sept_week[:-1],\n", " first_sept_week[1:]):\n", " one_year = sorted_data['inc'][week1:week2-1]\n", " assert abs(len(one_year)-52) < 2\n", " yearly_incidence.append(one_year.sum())\n", " year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the annual incidences." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to find the highest values (at the end)." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2020 221186\n", "2021 376290\n", "2002 516689\n", "2018 542312\n", "2017 551041\n", "1996 564901\n", "2019 584066\n", "2015 604382\n", "2000 617597\n", "2001 619041\n", "2012 624573\n", "2005 628464\n", "2006 632833\n", "2011 642368\n", "1993 643387\n", "1995 652478\n", "1994 661409\n", "1998 677775\n", "1997 683434\n", "2014 685769\n", "2013 698332\n", "2007 717352\n", "2008 749478\n", "1999 756456\n", "2003 758363\n", "2004 777388\n", "2016 782114\n", "2010 829911\n", "1992 832939\n", "2009 842373\n", "dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }