## The incidence of chickenpox in France

The data on the incidence of chickenpox are available from the website of the [Réseau Sentinelles](http://www.sentiweb.fr/). We download them as a CSV file in which each line corresponds to one week of the observation period.

In [8]:
!pip install isoweek



In [9]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import os
from isoweek import Week

In [10]:
data_url = "http://www.sentiweb.fr/datasets/incidence-PAY-3.csv"
filename = "inc-7-PAY-ds3.csv"

1. Download the data, unless a local copy already exists

In [11]:
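if not os.path.exists(filename):
    # First run: read from the server (skipping the line above the header) and keep a local copy
    raw_data = pd.read_csv(data_url, skiprows=1)
    raw_data.to_csv(filename, index=False)
else:
    # Later runs: reuse the local copy so the analysis always sees the same data
    raw_data = pd.read_csv(filename)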

2. Remove rows with missing values

In [13]:
raw_data[raw_data.isnull().any(axis=1)]
data = raw_data.dropna().copy()
data

Unnamed: 0,week,indicator,inc,inc_low,inc_up,inc100,inc100_low,inc100_up,geo_insee,geo_name
0,202524,3,22816,17621.0,28011.0,34,26.0,42.0,FR,France
1,202523,3,24564,19382.0,29746.0,37,29.0,45.0,FR,France
2,202522,3,18755,14333.0,23177.0,28,21.0,35.0,FR,France
3,202521,3,23760,18671.0,28849.0,35,27.0,43.0,FR,France
4,202520,3,20265,15814.0,24716.0,30,23.0,37.0,FR,France
5,202519,3,16264,12394.0,20134.0,24,18.0,30.0,FR,France
6,202518,3,18115,13975.0,22255.0,27,21.0,33.0,FR,France
7,202517,3,22150,17291.0,27009.0,33,26.0,40.0,FR,France
8,202516,3,28564,22550.0,34578.0,43,34.0,52.0,FR,France
9,202515,3,35721,29592.0,41850.0,53,44.0,62.0,FR,France
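
The `week` column in the table above packs the ISO year and the ISO week number into one integer, for example `202524`. As a minimal sketch of how such a value can be decoded, the block below uses only the standard library (`datetime.date.fromisocalendar`, Python 3.8+) rather than the `isoweek` package used in this notebook. The helper name `iso_week_monday` is an assumption made for illustration.

```python
from datetime import date

def iso_week_monday(year_and_week_int):
    """Return the Monday of the ISO week encoded as an integer such as 202524."""
    text = str(year_and_week_int)
    year, week = int(text[:4]), int(text[4:])
    # In fromisocalendar, weekday 1 is Monday
    return date.fromisocalendar(year, week, 1)

monday = iso_week_monday(202524)
# Round trip: the date maps back to the same ISO year and week
iso_year, iso_week = tuple(monday.isocalendar())[:2]
print(monday, iso_year, iso_week)
```

The round trip through `isocalendar()` is a cheap sanity check that the string slicing split the year and the week at the right position.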


3. Convert 'week' to period 

In [22]:
def convert_week(year_and_week_int):
    year_and_week_str = str(year_and_week_int)
    year = int(year_and_week_str[:4])
    week = int(year_and_week_str[4:])
    # Week was imported directly (from isoweek import Week), so no module prefix is needed
    w = Week(year, week)
    # day(0) is the Monday of that ISO week
    return pd.Period(w.day(0), 'W')

data['period'] = [convert_week(yw) for yw in data['week']]

4. Set 'period' as index and sort the dataset

In [15]:
sorted_data = data.set_index('period').sort_index()

5. Choose September 1st as the beginning of each annual period

In [None]:
start_weeks = [pd.Period(pd.Timestamp(y, 9, 1), 'W') for y in range(2016, 2025)]
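
To make the choice of start weeks concrete, the following sketch shows which weekly period pandas returns for a given 1 September. It assumes the default `'W'` frequency, which in pandas means weeks ending on Sunday. The helper name `season_start` is hypothetical and not part of the notebook.

```python
import pandas as pd

def season_start(year):
    """Weekly period that contains 1 September of `year` (hypothetical helper)."""
    return pd.Period(pd.Timestamp(year, 9, 1), 'W')

p = season_start(2024)
# start_time is the Monday that opens the period, end_time falls on the closing Sunday
print(p.start_time.date(), "to", p.end_time.date())
```

Because the period is the whole week containing 1 September, a season can begin up to six days before that date.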

6. Collect the incidence per year information

In [None]:
years = []
totals = []
for w1, w2 in zip(start_weeks[:-1], start_weeks[1:]):
    season_data = sorted_data['inc'][w1:w2 - 1]
    if abs(len(season_data) - 52) < 3:
        totals.append(season_data.sum())
        years.append(w2.year)

yearly_incidence = pd.Series(data=totals, index=years)
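
The slice `[w1:w2 - 1]` in the loop relies on label-based slicing, which includes both end points, so `w2 - 1` keeps the first week of the next season out of the current one. The sketch below illustrates this on a toy series. The counts are made up for the example and are not Sentinelles data.

```python
import pandas as pd

# Six consecutive weekly periods with made-up counts (illustration only)
weeks = pd.period_range('2024-01-01', periods=6, freq='W')
toy = pd.Series([10, 20, 30, 40, 50, 60], index=weeks)

w1, w2 = weeks[1], weeks[4]
# Label slicing is inclusive at both ends: w1 .. w2 - 1 covers weeks 1, 2 and 3
season = toy.loc[w1:w2 - 1]
print(len(season), season.sum())
```

With real data the same slice can come out shorter than expected when weeks are missing, which is what the `abs(len(season_data) - 52) < 3` test guards against.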

In [None]:
yearly_incidence.plot(style='o-', title='Annual Chickenpox Incidence')
plt.ylabel("Total incidence")
plt.xlabel("Year")
plt.grid(True)
plt.show()


In [None]:
yearly_incidence.hist()
plt.title("Distribution of Yearly Incidence")
plt.xlabel("Incidence")
plt.ylabel("Count")
plt.show()

In [None]:
strongest = yearly_incidence.idxmax()
weakest = yearly_incidence.idxmin()
print(f"Strongest epidemic year: {strongest}")
print(f"Weakest epidemic year: {weakest}")