Commit 960c9f5d authored by rloic

Influenza-like illness analysis

parent b69c1dd6
# Incidence of influenza-like illness
Importing the libraries
```python
%matplotlib inline
import datetime
import matplotlib.pyplot as plt
import pandas as pd
import urllib.request
import os.path
```
```python
remote_file = 'http://www.sentiweb.fr/datasets/incidence-PAY-3.csv'
local_file = '/tmp/incidence-PAY-3.csv'
```
We download the data file only if no local copy exists.
```python
if not os.path.isfile(local_file):
    urllib.request.urlretrieve(remote_file, local_file)
```
Loading the data from the local copy of the file
```python
raw_data = pd.read_csv(local_file, skiprows=1)
raw_data
```
Displaying the rows that may contain missing data
```python
raw_data[raw_data.isnull().any(axis=1)]
```
We drop this row, which contains no valid data.
```python
data = raw_data.dropna().copy()
```
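To make the two steps above concrete, here is a minimal sketch on a toy table. The `toy` DataFrame and its values are made up for illustration; only the column names mimic the dataset.

```python
import pandas as pd

# Hypothetical toy table: the values are invented, not taken from the dataset
toy = pd.DataFrame({'week': [201601, 201602, 201603],
                    'inc': [100.0, None, 300.0]})

# Rows with at least one missing value (same test as on raw_data)
incomplete = toy[toy.isnull().any(axis=1)]

# Same table without those rows; .copy() gives an independent DataFrame
clean = toy.dropna().copy()

print(incomplete)
print(clean)
```

The `.copy()` matters later in the notebook: adding a column to a filtered DataFrame without copying it first can trigger pandas' `SettingWithCopyWarning`.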
Function converting "Sentinelles" dates to ISO format (using datetime, since isoweek could not be installed).
```python
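def convert_week(year_and_week_int):
    # The input is an integer of the form YYYYWW (ISO year, then ISO week)
    year_and_week_str = str(year_and_week_int)
    year = int(year_and_week_str[:4])
    week = int(year_and_week_str[4:])
    # %G = ISO year, %V = ISO week, %u = ISO weekday (1 = Monday)
    date = datetime.datetime.strptime("{}-W{}-1".format(year, week), "%G-W%V-%u")
    return pd.Period(date, 'W')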
```
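As a cross-check of the conversion, the Monday of an ISO week can also be computed with the standard library alone. This is only a sketch, assuming Python 3.8 or later (for `date.fromisocalendar`); the helper name `monday_of_iso_week` is hypothetical and not part of the notebook.

```python
import datetime


def monday_of_iso_week(year_and_week_int):
    """Return the Monday (datetime.date) of a week encoded as the integer YYYYWW."""
    # Split YYYYWW into the ISO year and the ISO week number
    year, week = divmod(year_and_week_int, 100)
    # Day 1 of an ISO week is Monday
    return datetime.date.fromisocalendar(year, week, 1)


print(monday_of_iso_week(201601))
```

The date returned for a given week should match the start of the `pd.Period` produced by `convert_week` for the same input.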
We add the period as a pandas `Period`, then filter the periods to match the explanatory video.
```python
data['period'] = [convert_week(yw) for yw in data['week']]
data = data.loc[
    (data['period'] < pd.Period(pd.Timestamp(2016, 8, 1), 'W'))
    & (data['period'] > pd.Period(pd.Timestamp(1985, 8, 1), 'W'))
].copy()
data
```
```python
sorted_data = data.set_index('period').sort_index()
sorted_data
```
We check that every week is present, from the first record to the last.
```python
periods = sorted_data.index
for p1, p2 in zip(periods[:-1], periods[1:]):
    # Consecutive weeks are separated by far less than one second
    delta = p2.to_timestamp() - p1.end_time
    if delta > pd.Timedelta('1s'):
        print(p1, p2)
```
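To see what this check reports, here is a minimal sketch on a hand-made index with one week deliberately left out. The `toy_periods` index and the `gaps` list are hypothetical names introduced for illustration; the dates are not taken from the dataset.

```python
import pandas as pd

# Hypothetical weekly index: the week starting 2016-01-18 is deliberately absent
toy_periods = pd.PeriodIndex(['2016-01-04', '2016-01-11', '2016-01-25'], freq='W')

gaps = []
for p1, p2 in zip(toy_periods[:-1], toy_periods[1:]):
    # Start of the next period minus the end of the current one
    delta = p2.to_timestamp() - p1.end_time
    if delta > pd.Timedelta('1s'):
        gaps.append((p1, p2))

# Each entry holds the two periods on either side of a gap
print(gaps)
```

For adjacent weeks the difference is a tiny fraction of a second, so the one-second threshold only flags pairs with at least one whole week missing between them.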
The missing week is the one for which no data were recorded.
```python
sorted_data['inc'].plot()
```
```python
sorted_data['inc'][-200:].plot()
```
```python
# Week containing 1 August of each year covered by the data
first_august_week = [pd.Period(pd.Timestamp(y, 8, 1), 'W')
                     for y in range(sorted_data.index[0].year,
                                    sorted_data.index[-1].year)]
yearly_incidence = []
year = []
for week1, week2 in zip(first_august_week[:-1], first_august_week[1:]):
    # One "year" runs from one August week up to the week before the next one
    one_year = sorted_data['inc'][week1:week2-1]
    assert abs(len(one_year)-52) < 3
    yearly_incidence.append(one_year.sum())
    year.append(week2.year)
yearly_incidence = pd.Series(index=year, data=yearly_incidence)
```
```python
yearly_incidence.plot(style='*')
```
```python
yearly_incidence.sort_values()
```
```python
yearly_incidence.hist(xrot=20)
```