{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Incidence of chickenpox-like illness in France" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data on the incidence of chickenpox-like illness are available from the Web site of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a file in CSV format, in which each line corresponds to a week in the observation period. Only the complete dataset, starting in week 49 of 1990 and ending with a recent week, is available for download." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_url = \"http://www.sentiweb.fr/datasets/incidence-PAY-7.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the documentation of the data from [the download site](https://ns.sentiweb.fr/incidence/csv-schema-v1.json):\n", "\n", "| Column name | Description |\n", "|--------------|---------------------------------------------------------------------------------------------------------------------------|\n", "| `week` | ISO8601 Yearweek number as numeric (year times 100 + week number) |\n", "| `indicator` | Unique identifier of the indicator, see metadata document https://www.sentiweb.fr/meta.json |\n", "| `inc` | Estimated incidence value for the time step, in the geographic level |\n", "| `inc_low` | Lower bound of the estimated incidence 95% Confidence Interval |\n", "| `inc_up` | Upper bound of the estimated incidence 95% Confidence Interval |\n", "| `inc100` | Estimated rate incidence per 100,000 inhabitants |\n", "| `inc100_low` | Lower bound of the estimated rate incidence 95% Confidence Interval |\n", "| `inc100_up` | Upper bound of the estimated rate incidence 95% Confidence Interval |\n", "| `geo_insee` | Identifier of the 
geographic area, from INSEE https://www.insee.fr |\n", "| `geo_name` | Geographic label of the area, corresponding to INSEE code. This label is not an id and is only provided for human reading |\n", "\n", "The first line of the CSV file is a comment, which we ignore with `skiprows=1`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We download the data only if no local copy exists, so that re-running the notebook works from the same file." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import os\n", "import urllib.request\n", "\n", "data_file = \"incidence-PAY-7.csv\"\n", "\n", "if not os.path.exists(data_file):\n", "    urllib.request.urlretrieve(data_url, data_file)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " week indicator inc inc_low inc_up inc100 inc100_low \\\n", "0 202328 7 8580 4331 12829 13 7 \n", "1 202327 7 7393 4672 10114 11 7 \n", "2 202326 7 9192 6223 12161 14 10 \n", "3 202325 7 11498 8257 14739 17 12 \n", "4 202324 7 11115 7968 14262 17 12 \n", "5 202323 7 12563 6134 18992 19 9 \n", "6 202322 7 12184 8125 16243 18 12 \n", "7 202321 7 11349 7598 15100 17 11 \n", "8 202320 7 9000 4615 13385 14 7 \n", "9 202319 7 9344 6091 12597 14 9 \n", "10 202318 7 10671 7291 14051 16 11 \n", "11 202317 7 9184 6162 12206 14 9 \n", "12 202316 7 11387 8014 14760 17 12 \n", "13 202315 7 14040 7613 20467 21 11 \n", "14 202314 7 15247 11032 19462 23 17 \n", "15 202313 7 13322 9700 16944 20 15 \n", "16 202312 7 10374 7218 13530 16 11 \n", "17 202311 7 4919 2880 6958 7 4 \n", "18 202310 7 4854 2731 6977 7 4 \n", "19 202309 7 7004 4548 9460 11 7 \n", "20 202308 7 8175 5316 11034 12 8 \n", "21 202307 7 6595 3782 9408 10 6 \n", "22 202306 7 9595 6017 13173 14 9 \n", "23 202305 7 6237 3907 8567 9 5 \n", "24 202304 7 6299 3973 8625 9 6 \n", "25 202303 7 6063 3798 8328 9 6 \n", "26 202302 7 6576 3060 10092 10 5 \n", "27 202301 7 8153 5470 10836 12 8 \n", "28 202252 7 5171 2717 7625 8 4 \n", "29 202251 7 6226 3822 8630 9 5 \n", "... ... ... ... ... ... ... ... 
\n", "1672 199126 7 17608 11304 23912 31 20 \n", "1673 199125 7 16169 10700 21638 28 18 \n", "1674 199124 7 16171 10071 22271 28 17 \n", "1675 199123 7 11947 7671 16223 21 13 \n", "1676 199122 7 15452 9953 20951 27 17 \n", "1677 199121 7 14903 8975 20831 26 16 \n", "1678 199120 7 19053 12742 25364 34 23 \n", "1679 199119 7 16739 11246 22232 29 19 \n", "1680 199118 7 21385 13882 28888 38 25 \n", "1681 199117 7 13462 8877 18047 24 16 \n", "1682 199116 7 14857 10068 19646 26 18 \n", "1683 199115 7 13975 9781 18169 25 18 \n", "1684 199114 7 12265 7684 16846 22 14 \n", "1685 199113 7 9567 6041 13093 17 11 \n", "1686 199112 7 10864 7331 14397 19 13 \n", "1687 199111 7 15574 11184 19964 27 19 \n", "1688 199110 7 16643 11372 21914 29 20 \n", "1689 199109 7 13741 8780 18702 24 15 \n", "1690 199108 7 13289 8813 17765 23 15 \n", "1691 199107 7 12337 8077 16597 22 15 \n", "1692 199106 7 10877 7013 14741 19 12 \n", "1693 199105 7 10442 6544 14340 18 11 \n", "1694 199104 7 7913 4563 11263 14 8 \n", "1695 199103 7 15387 10484 20290 27 18 \n", "1696 199102 7 16277 11046 21508 29 20 \n", "1697 199101 7 15565 10271 20859 27 18 \n", "1698 199052 7 19375 13295 25455 34 23 \n", "1699 199051 7 19080 13807 24353 34 25 \n", "1700 199050 7 11079 6660 15498 20 12 \n", "1701 199049 7 1143 0 2610 2 0 \n", "\n", " inc100_up geo_insee geo_name \n", "0 19 FR France \n", "1 15 FR France \n", "2 18 FR France \n", "3 22 FR France \n", "4 22 FR France \n", "5 29 FR France \n", "6 24 FR France \n", "7 23 FR France \n", "8 21 FR France \n", "9 19 FR France \n", "10 21 FR France \n", "11 19 FR France \n", "12 22 FR France \n", "13 31 FR France \n", "14 29 FR France \n", "15 25 FR France \n", "16 21 FR France \n", "17 10 FR France \n", "18 10 FR France \n", "19 15 FR France \n", "20 16 FR France \n", "21 14 FR France \n", "22 19 FR France \n", "23 13 FR France \n", "24 12 FR France \n", "25 12 FR France \n", "26 15 FR France \n", "27 16 FR France \n", "28 12 FR France \n", "29 13 FR France \n", "... ... 
... ... \n", "1672 42 FR France \n", "1673 38 FR France \n", "1674 39 FR France \n", "1675 29 FR France \n", "1676 37 FR France \n", "1677 36 FR France \n", "1678 45 FR France \n", "1679 39 FR France \n", "1680 51 FR France \n", "1681 32 FR France \n", "1682 34 FR France \n", "1683 32 FR France \n", "1684 30 FR France \n", "1685 23 FR France \n", "1686 25 FR France \n", "1687 35 FR France \n", "1688 38 FR France \n", "1689 33 FR France \n", "1690 31 FR France \n", "1691 29 FR France \n", "1692 26 FR France \n", "1693 25 FR France \n", "1694 20 FR France \n", "1695 36 FR France \n", "1696 38 FR France \n", "1697 36 FR France \n", "1698 45 FR France \n", "1699 43 FR France \n", "1700 28 FR France \n", "1701 5 FR France \n", "\n", "[1702 rows x 10 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_file, skiprows=1)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Are there missing data points?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [week, indicator, inc, inc_low, inc_up, inc100, inc100_low, inc100_up, geo_insee, geo_name]\n", "Index: []" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No: the selection above is empty, so no row contains a missing value." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset uses an uncommon encoding: the week number is appended\n", "to the year number, which gives the impression of a six-digit integer.\n", "That is how Pandas interprets it.\n", "\n", "A second problem is that Pandas does not know about week numbers.\n", "It needs to be given the dates of the beginning and end of the week.\n", "We use the library `isoweek` for that.\n", "\n", "Since the conversion is a bit lengthy, we write a small Python \n", "function for doing it. Then we apply it to all points in our dataset. \n", "The results go into a new column `period`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def convert_week(year_and_week_int):\n", "    # Split the six-digit YYYYWW integer into year and ISO week number\n", "    year_and_week_str = str(year_and_week_int)\n", "    year = int(year_and_week_str[:4])\n", "    week = int(year_and_week_str[4:])\n", "    w = isoweek.Week(year, week)\n", "    # w.day(0) is the Monday of that week; the weekly Period contains it\n", "    return pd.Period(w.day(0), 'W')\n", "\n", "# Work on a copy so that raw_data stays unchanged\n", "data = raw_data.copy()\n", "data['period'] = [convert_week(yw) for yw in data['week']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two more small changes to make.\n", "\n", "First, we define the observation periods as the new index of\n", "our dataset. That turns it into a time series, which will be\n", "convenient later on.\n", "\n", "Second, we sort the points chronologically." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "sorted_data = data.set_index('period').sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the consistency of the data. 
Between the end of a period and the beginning of the next one, the difference should be zero, or very small. We tolerate an error of one second." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "periods = sorted_data.index\n", "for p1, p2 in zip(periods[:-1], periods[1:]):\n", "    delta = p2.to_timestamp() - p1.end_time\n", "    if delta > pd.Timedelta('1s'):\n", "        print(p1, p2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The loop prints nothing, so there are no gaps between consecutive weeks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A first look at the data!" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'].plot()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'][-200:].plot()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sorted_data['inc'][-350:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Study of the annual incidence" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# Week containing 1 August of each year; consecutive entries delimit one annual period\n", "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", "                     for y in range(1991,\n", "                                    sorted_data.index[-1].year)]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", "                        first_august_week[1:]):\n", "    # Weekly incidence from one 1 August week up to the week before the next one\n", "    one_year = sorted_data['inc'][week1:week2-1]\n", "    # Sanity check: an annual period should contain 51 to 53 weeks\n", "    assert abs(len(one_year)-52) < 2\n", "    yearly_incidence.append(one_year.sum())\n", "    # Label each period with the year in which it ends\n", "    year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the annual incidences." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sorted list makes it easier to find the highest values (at the end)." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2020 229363\n", "2021 363278\n", "2002 502271\n", "2018 543281\n", "1996 553859\n", "2017 557449\n", "2019 584926\n", "2000 605096\n", "2015 613286\n", "2012 620315\n", "2022 638443\n", "2011 645042\n", "1995 648598\n", "2001 650660\n", "1993 653058\n", "2005 654308\n", "2006 657482\n", "1998 660316\n", "2014 673458\n", "1997 679308\n", "1994 682920\n", "2007 701566\n", "2013 708874\n", "2004 736266\n", "2008 745701\n", "2003 770211\n", "2016 780645\n", "1999 784963\n", "1992 821558\n", "2009 822819\n", "2010 848236\n", "dtype: int64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yearly_incidence.sort_values()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", " for y in range(1991,\n", " sorted_data.index[-1].year)]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", " first_august_week[1:]):\n", " one_year = sorted_data['inc'][week1:week2-1]\n", " assert abs(len(one_year)-52) < 2\n", " yearly_incidence.append(one_year.sum())\n", " year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "yearly_incidence.plot(style='*')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2020 229363\n", "2021 363278\n", "2002 502271\n", "2018 543281\n", "1996 553859\n", "2017 557449\n", "2019 584926\n", "2000 605096\n", "2015 613286\n", "2012 620315\n", "2022 638443\n", "2011 645042\n", "1995 648598\n", "2001 650660\n", "1993 653058\n", "2005 654308\n", "2006 657482\n", "1998 660316\n", "2014 673458\n", "1997 679308\n", "1994 682920\n", "2007 701566\n", "2013 708874\n", "2004 736266\n", "2008 745701\n", "2003 770211\n", "2016 780645\n", "1999 784963\n", "1992 821558\n", "2009 822819\n", "2010 848236\n", "dtype: int64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ " yearly_incidence.sort_values()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", " for y in range(1990,\n", " sorted_data.index[-1].year)]" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "ename": "AssertionError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m first_august_week[1:]):\n\u001b[1;32m 5\u001b[0m \u001b[0mone_year\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msorted_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'inc'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mweek1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mweek2\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0;32massert\u001b[0m 
\u001b[0mabs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mone_year\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m52\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0myearly_incidence\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mone_year\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0myear\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mweek2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0myear\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAssertionError\u001b[0m: " ] } ], "source": [ "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", " first_august_week[1:]):\n", " one_year = sorted_data['inc'][week1:week2-1]\n", " assert abs(len(one_year)-52) < 2\n", " yearly_incidence.append(one_year.sum())\n", " year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2020 229363\n", "2021 363278\n", "2002 502271\n", "2018 543281\n", "1996 553859\n", "2017 557449\n", "2019 584926\n", "2000 605096\n", "2015 613286\n", "2012 620315\n", "2022 638443\n", "2011 645042\n", "1995 648598\n", "2001 650660\n", "1993 653058\n", "2005 654308\n", "2006 657482\n", "1998 660316\n", "2014 673458\n", "1997 679308\n", "1994 682920\n", "2007 701566\n", "2013 708874\n", "2004 736266\n", "2008 745701\n", "2003 770211\n", "2016 780645\n", "1999 784963\n", "2009 822819\n", "2010 848236\n", "dtype: int64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')\n", " for y in range(1992,\n", " sorted_data.index[-1].year)]\n", "\n", "year = []\n", "yearly_incidence = []\n", "for week1, week2 in zip(first_august_week[:-1],\n", " first_august_week[1:]):\n", " one_year = sorted_data['inc'][week1:week2-1]\n", " assert abs(len(one_year)-52) < 2\n", " yearly_incidence.append(one_year.sum())\n", " year.append(week2.year)\n", "yearly_incidence = pd.Series(data=yearly_incidence, index=year)\n", "yearly_incidence.plot(style='*')\n", "yearly_incidence.sort_values()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }