#+TITLE: Module 3 - exercise 2
#+AUTHOR: Your name
#+DATE: Today's date
#+LANGUAGE: en
# #+PROPERTY: header-args :eval never-export

* Data import, verification, preparation
** Import and quick check
The data, in CSV format, are downloaded from: http://www.sentiweb.fr/datasets/incidence-PAY-7.csv

Source code adapted from:
#+NAME: code-source-mooc
https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/blob/master/module3/ressources/analyse-syndrome-grippal-orgmode.org

#+begin_src python :results output :session :exports both
from urllib.request import urlretrieve
import os

# Download the file only if no local copy exists yet.
if os.path.isfile("incidence-PAY-7.csv"):
    print("Loading the local file")
else:
    urlretrieve("http://www.sentiweb.fr/datasets/incidence-PAY-7.csv", "incidence-PAY-7.csv")

with open("incidence-PAY-7.csv", encoding="iso-8859-1") as f:
    data = f.read()
lines = data.strip().split('\n')
# Skip the first line, which precedes the column header.
data_lines = lines[1:]
table = [line.split(',') for line in data_lines]
#+end_src

#+RESULTS:
: Loading the local file

Visual check of the first rows.

#+begin_src python :results value :session :exports both
table[:5]
#+end_src

#+RESULTS:
| week   | indicator |  inc | inc_low | inc_up | inc100 | inc100_low | inc100_up | geo_insee | geo_name |
| 202016 |         7 |  803 |      83 |   1523 |      1 |          0 |         2 | FR        | France   |
| 202015 |         7 | 1918 |     675 |   3161 |      3 |          1 |         5 | FR        | France   |
| 202014 |         7 | 3879 |    2227 |   5531 |      6 |          3 |         9 | FR        | France   |
| 202013 |         7 | 7326 |    5236 |   9416 |     11 |          8 |        14 | FR        | France   |

** Extraction of the columns used
#+begin_src python :results output :session :exports both
# Column 0 holds the week, column 2 the incidence; drop the header cell of each.
week = [row[0] for row in table]
assert week[0] == 'week'
del week[0]
inc = [row[2] for row in table]
assert inc[0] == 'inc'
del inc[0]
data = list(zip(week, inc))
#+end_src

#+RESULTS:

Visual check of the first and last rows.
#+begin_src python :results value :session :exports both
[('week', 'inc'), None] + data[:5] + [None] + data[-5:]
#+end_src

#+RESULTS:
| week   | inc   |
|--------+-------|
| 202016 | 803   |
| 202015 | 1918  |
| 202014 | 3879  |
| 202013 | 7326  |
| 202012 | 8123  |
|--------+-------|
| 199101 | 15565 |
| 199052 | 19375 |
| 199051 | 19080 |
| 199050 | 11079 |
| 199049 | 1143  |

** Date conversion
Source code from the MOOC.

#+begin_src python :results output :session :exports both
import datetime

# Parse each 'YYYYWW' string as an ISO year and week; ":1" selects the Monday of that week.
converted_data = [(datetime.datetime.strptime(year_and_week + ":1", '%G%V:%u').date(), int(inc))
                  for year_and_week, inc in data]
# Sort chronologically (the file lists the most recent week first).
converted_data.sort(key=lambda record: record[0])
#+end_src

#+RESULTS:

Display of the first and last rows. Source code again taken from [[code-source-mooc]].

#+begin_src python :results value :session :exports both
str_data = [(str(date), str(inc)) for date, inc in converted_data]
[('date', 'inc'), None] + str_data[:5] + [None] + str_data[-5:]
#+end_src

#+RESULTS:
| date       | inc   |
|------------+-------|
| 1990-12-03 | 1143  |
| 1990-12-10 | 11079 |
| 1990-12-17 | 19080 |
| 1990-12-24 | 19375 |
| 1990-12-31 | 15565 |
|------------+-------|
| 2020-03-16 | 8123  |
| 2020-03-23 | 7326  |
| 2020-03-30 | 3879  |
| 2020-04-06 | 1918  |
| 2020-04-13 | 803   |

Date check: let us see whether any data are missing, i.e. whether more than 7 days separate two consecutive dates.

#+begin_src python :results output :session :exports both
dates = [date for date, _ in converted_data]
# Report every pair of consecutive dates that is not exactly one week apart.
for date1, date2 in zip(dates[:-1], dates[1:]):
    if date2 - date1 != datetime.timedelta(weeks=1):
        print(f"There are {date2-date1} between {date1} and {date2}")
#+end_src

#+RESULTS:

The check prints nothing, so no row appears to be missing.
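
Because the check above prints nothing on the real data, it can be useful to confirm that it would actually catch a missing week. The following is a minimal sketch, not part of the MOOC source: the helper names =iso_week_to_date= and =find_gaps= and the four-week sample are hypothetical, and the =%G= and =%V= directives are assumed to be available (Python 3.6 or later). It uses the same conversion and comparison as above, on a sample from which week 202014 has been removed on purpose.

#+begin_src python :results output :exports both
import datetime

def iso_week_to_date(year_and_week):
    """Return the Monday of an ISO week given as a 'YYYYWW' string."""
    # %G = ISO year, %V = ISO week number, %u = ISO weekday (1 = Monday)
    return datetime.datetime.strptime(year_and_week + ":1", "%G%V:%u").date()

def find_gaps(dates):
    """Return the pairs of consecutive dates that are not exactly one week apart."""
    one_week = datetime.timedelta(weeks=1)
    return [(d1, d2) for d1, d2 in zip(dates[:-1], dates[1:]) if d2 - d1 != one_week]

# Hypothetical sample: week 202014 is deliberately left out.
sample_weeks = ["202016", "202015", "202013", "202012"]
sample_dates = sorted(iso_week_to_date(w) for w in sample_weeks)

gaps = find_gaps(sample_dates)
for d1, d2 in gaps:
    print(f"There are {d2 - d1} between {d1} and {d2}")
#+end_src

On this sample the only pair reported is 2020-03-23 and 2020-04-06, the two Mondays on either side of the removed week. Returning the gaps as a list, rather than only printing them, also makes it possible to assert that the list is empty on the full dataset.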