## The incidence of chickenpox in France (2016-2024)

The data on the incidence of chickenpox-like illness are available from the website of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a CSV file in which each line corresponds to one week of the observation period and to one region (column `geo_insee`). The dataset used here starts in 2016 and ends in 2024.

In [38]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import os
from isoweek import Week

In [39]:
data_url = "https://www.sentiweb.fr/datasets/all/inc-7-RDD-ds2.csv"
filename = "incidence-chickenpox.csv"

1. Download the data if no local copy exists yet

In [40]:
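if not os.path.exists(filename):
    # skiprows=1 skips the first line of the remote file
    raw_data = pd.read_csv(data_url, skiprows=1)
    # Keep a local copy so that later runs reuse it instead of downloading again
    raw_data.to_csv(filename, index=False)
else:
    raw_data = pd.read_csv(filename)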

2. Remove rows with missing values

In [46]:
raw_data = raw_data.dropna()
raw_data

| Unnamed: 0 | week | geo_insee | indicator | inc | inc100 | inc_up | inc_low | inc100_up | inc100_low | period |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201601 | 44 | 7 | 574 | 10 | 861 | 287 | 15 | 5 | 2016-01-04/2016-01-10 |
| 1 | 201601 | 75 | 7 | 1513 | 25 | 2099 | 927 | 35 | 15 | 2016-01-04/2016-01-10 |
| 2 | 201601 | 84 | 7 | 2363 | 30 | 2958 | 1768 | 37 | 22 | 2016-01-04/2016-01-10 |
| 3 | 201601 | 27 | 7 | 686 | 24 | 1058 | 314 | 36 | 11 | 2016-01-04/2016-01-10 |
| 4 | 201601 | 53 | 7 | 532 | 16 | 874 | 190 | 26 | 6 | 2016-01-04/2016-01-10 |
| 5 | 201601 | 24 | 7 | 394 | 15 | 625 | 163 | 24 | 6 | 2016-01-04/2016-01-10 |
| 6 | 201601 | 94 | 7 | 38 | 12 | 82 | 0 | 25 | 0 | 2016-01-04/2016-01-10 |
| 7 | 201601 | 11 | 7 | 3030 | 25 | 3788 | 2272 | 31 | 19 | 2016-01-04/2016-01-10 |
| 8 | 201601 | 76 | 7 | 842 | 14 | 1307 | 377 | 22 | 6 | 2016-01-04/2016-01-10 |
| 9 | 201601 | 32 | 7 | 2100 | 34 | 2711 | 1489 | 44 | 24 | 2016-01-04/2016-01-10 |
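Before discarding incomplete rows, it can help to look at them first. The sketch below uses a small made-up frame (the values and the name `toy` are illustrative, not taken from the Sentinelles file) to show how the rows that `dropna()` would remove can be listed.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; the second row has a missing incidence
toy = pd.DataFrame({"week": [201601, 201602, 201603],
                    "inc": [574.0, np.nan, 686.0]})

# Rows with at least one missing value, worth inspecting before dropping them
missing = toy[toy.isnull().any(axis=1)]
print(missing)

# Same operation as in step 2, applied to the toy frame
cleaned = toy.dropna()
print(len(toy), "rows before,", len(cleaned), "rows after")
```

If the list of missing rows is long or concentrated in one period, simply dropping them may bias the seasonal totals computed later.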


3. Convert 'week' to period 

In [47]:
def convert_week(yearweek):
    y, w = int(str(yearweek)[:4]), int(str(yearweek)[4:])
    return pd.Period(Week(y, w).day(0), 'W')

raw_data['period'] = [convert_week(x) for x in raw_data['week']]
raw_data

| Unnamed: 0 | week | geo_insee | indicator | inc | inc100 | inc_up | inc_low | inc100_up | inc100_low | period |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201601 | 44 | 7 | 574 | 10 | 861 | 287 | 15 | 5 | 2016-01-04/2016-01-10 |
| 1 | 201601 | 75 | 7 | 1513 | 25 | 2099 | 927 | 35 | 15 | 2016-01-04/2016-01-10 |
| 2 | 201601 | 84 | 7 | 2363 | 30 | 2958 | 1768 | 37 | 22 | 2016-01-04/2016-01-10 |
| 3 | 201601 | 27 | 7 | 686 | 24 | 1058 | 314 | 36 | 11 | 2016-01-04/2016-01-10 |
| 4 | 201601 | 53 | 7 | 532 | 16 | 874 | 190 | 26 | 6 | 2016-01-04/2016-01-10 |
| 5 | 201601 | 24 | 7 | 394 | 15 | 625 | 163 | 24 | 6 | 2016-01-04/2016-01-10 |
| 6 | 201601 | 94 | 7 | 38 | 12 | 82 | 0 | 25 | 0 | 2016-01-04/2016-01-10 |
| 7 | 201601 | 11 | 7 | 3030 | 25 | 3788 | 2272 | 31 | 19 | 2016-01-04/2016-01-10 |
| 8 | 201601 | 76 | 7 | 842 | 14 | 1307 | 377 | 22 | 6 | 2016-01-04/2016-01-10 |
| 9 | 201601 | 32 | 7 | 2100 | 34 | 2711 | 1489 | 44 | 24 | 2016-01-04/2016-01-10 |
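The conversion relies on the third-party `isoweek` package. As a possible alternative, here is a minimal sketch that uses only the standard library and pandas, assuming Python 3.8 or later (for `date.fromisocalendar`); the function name `convert_week_stdlib` is hypothetical.

```python
import datetime
import pandas as pd

def convert_week_stdlib(yearweek):
    """Map an integer such as 201601 (ISO year + ISO week) to a weekly Period."""
    text = str(yearweek)
    year, week = int(text[:4]), int(text[4:])
    # Monday of the requested ISO week (1 = Monday in fromisocalendar)
    monday = datetime.date.fromisocalendar(year, week, 1)
    return pd.Period(pd.Timestamp(monday), "W")

p = convert_week_stdlib(201601)
print(p)  # Monday-to-Sunday week that contains 2016-01-04
```

For week 201601 this yields the same Monday-to-Sunday interval as the `period` column shown above.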


4. Set 'period' as index and sort the dataset

In [43]:
data = raw_data.set_index('period').sort_index()
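The rows displayed earlier list several `geo_insee` codes for week 201601, so each week seems to appear once per region and the `period` index is not unique. If a single national weekly series is wanted, one option is to sum `inc` over regions before going further. This is a sketch on made-up numbers, under the assumption that national incidence is the sum of the regional estimates; `toy` and `national` are illustrative names.

```python
import pandas as pd

# Toy data: two weeks, two regions each (values are made up for illustration)
toy = pd.DataFrame({
    "period": pd.PeriodIndex(
        ["2016-01-04", "2016-01-04", "2016-01-11", "2016-01-11"], freq="W"),
    "geo_insee": [44, 75, 44, 75],
    "inc": [10, 20, 30, 40],
}).set_index("period").sort_index()

# One value per week: the sum of the regional incidences
national = toy.groupby(level=0)["inc"].sum()
print(national)
```

With one row per week, a check such as "a season has about 52 entries" becomes meaningful again.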


5. Choose September 1st as the beginning of each annual period

In [44]:
sept_start = [pd.Period(pd.Timestamp(y, 9, 1), 'W')
               for y in range(2016, data.index[-1].year)]
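Since the data are weekly, September 1st itself is usually not the first day of a period: `pd.Period(..., 'W')` returns the Monday-to-Sunday week that contains that date. A short sketch to make this visible (the helper name `season_start` is hypothetical):

```python
import pandas as pd

def season_start(year):
    """Weekly period (Monday to Sunday) that contains September 1st of `year`."""
    return pd.Period(pd.Timestamp(year, 9, 1), "W")

for year in (2016, 2017):
    first_sept = pd.Timestamp(year, 9, 1)
    print(first_sept.date(), first_sept.day_name(), "->", season_start(year))
```

Each annual period therefore starts on the Monday on or before September 1st.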


6. Compute the total incidence for each annual period

In [45]:
years = []
incidence = []
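for start, end in zip(sept_start[:-1], sept_start[1:]):
    # Period slices include both endpoints, so stop one week before `end`
    # to avoid counting the boundary week in two consecutive seasons.
    season = data['inc'][start:end - 1]
    assert abs(len(season) - 52) < 3
    incidence.append(season.sum())
    years.append(end.year)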

annual = pd.Series(data=incidence, index=years)


TypeError: Cannot compare type 'Period' with type 'int'
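Independently of the error above, it is worth checking how many weeks a season slice contains. Label-based slices on a `PeriodIndex` include both endpoints. The sketch below illustrates this on a made-up weekly series of ones (the names `ones`, `both_ends` and `without_last` are illustrative), comparing a slice that ends at the next season's first week with one that stops a week earlier.

```python
import pandas as pd

# Toy weekly series covering a bit more than two years, one unit per week
weeks = pd.period_range("2016-08-29", periods=110, freq="W")
ones = pd.Series(1, index=weeks)

start = pd.Period(pd.Timestamp(2016, 9, 1), "W")
end = pd.Period(pd.Timestamp(2017, 9, 1), "W")

both_ends = ones.loc[start:end]         # includes the week of `end`
without_last = ones.loc[start:end - 1]  # stops one week earlier

print(len(both_ends), len(without_last))
```

With the inclusive slice, the week containing September 1st would be counted in two consecutive seasons.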