From 78b7488edb025532788f18b713a409f3129e5a1b Mon Sep 17 00:00:00 2001
From: Dorinel Bastide
Date: Wed, 15 Jul 2020 22:21:01 +0200
Subject: [PATCH] Commit of exo1 module 3 to start correcting

---
 module3/exo1/influenza-like-illness-analysis.org | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/module3/exo1/influenza-like-illness-analysis.org b/module3/exo1/influenza-like-illness-analysis.org
index 6c8b47a..0102ba8 100644
--- a/module3/exo1/influenza-like-illness-analysis.org
+++ b/module3/exo1/influenza-like-illness-analysis.org
@@ -45,6 +45,9 @@ The data on the incidence of influenza-like illness are available from the Web s
 #+NAME: data-url
 http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
 
+#+NAME: data-csv
+~/org/incidence-PAY-3.csv
+
 This is the documentation of the data from [[https://ns.sentiweb.fr/incidence/csv-schema-v1.json][the download site]]:
 
 | Column name | Description |
@@ -65,10 +68,12 @@ The [[https://en.wikipedia.org/wiki/ISO_8601][ISO-8601]] format is popular in Eu
 ** Download
 After downloading the raw data, we extract the part we are interested in. We first split the file into lines, of which we discard the first one that contains a comment. We then split the remaining lines into columns.
 
-#+BEGIN_SRC python :results silent :var data_url=data-url
+#+BEGIN_SRC python :results silent :var data_csv=data-csv
 from urllib.request import urlopen
-
-data = urlopen(data_url).read()
+import csv
+#data = urlopen(data_url).read()
+with open(data_csv) as csv_file:
+    data = csv.DictReader(csv_file)
 lines = data.decode('latin-1').strip().split('\n')
 data_lines = lines[1:]
 table = [line.split(',') for line in data_lines]
@@ -79,6 +84,8 @@ Let's have a look at what we have so far:
 table[:5]
 #+END_SRC
 
+#+RESULTS:
+
 ** Checking for missing data
 Unfortunately there are many ways to indicate the absence of a data value in a dataset. Here we check for a common one: empty fields.
 For completeness, we should also look for non-numerical data in numerical columns. We don't do this here, but checks in later processing steps would catch such anomalies.
-- 
2.18.1
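
Note on the Download hunk: as committed, `csv.DictReader` returns a reader object that has no `decode` method, and the file is already closed once the `with` block ends, so the unchanged `lines = data.decode('latin-1')...` line cannot run. The following is only a minimal sketch of one way the local-file variant could reproduce the steps described in the prose (split into lines, drop the leading comment line, split into columns). `load_table` is a hypothetical helper that is not part of this patch, and the latin-1 encoding is an assumption carried over from the original `decode('latin-1')` call.

```python
import os


def load_table(path, encoding='latin-1'):
    """Read a local CSV copy and return its rows as lists of strings.

    Hypothetical helper, not part of the patch. The encoding default is
    an assumption taken from the original decode('latin-1') call.
    """
    # Expand '~' so a value such as ~/org/incidence-PAY-3.csv resolves.
    with open(os.path.expanduser(path), encoding=encoding) as csv_file:
        # Read everything while the file is still open.
        text = csv_file.read()
    lines = text.strip().split('\n')
    # The first line of the file is a comment, so it is discarded.
    data_lines = lines[1:]
    return [line.split(',') for line in data_lines]
```

Inside the Org source block this would be used as `table = load_table(data_csv)`, which leaves the later `table[:5]` inspection unchanged.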