Commit of exo1 module 3 to start correcting

Commit 78b7488e authored Jul 15, 2020 by Dorinel Bastide

Showing 1 changed file with 10 additions and 3 deletions

module3/exo1/influenza-like-illness-analysis.org +10 -3

--- a/module3/exo1/influenza-like-illness-analysis.org
+++ b/module3/exo1/influenza-like-illness-analysis.org
@@ -45,6 +45,9 @@ The data on the incidence of influenza-like illness are available from the Web s
 #+NAME: data-url
 http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
+#+NAME: data-csv
+~/org/incidence-PAY-3.csv
 This is the documentation of the data from [[https://ns.sentiweb.fr/incidence/csv-schema-v1.json][the download site]]:
 | Column name  | Description                                                                                                               |
@@ -65,10 +68,12 @@ The [[https://en.wikipedia.org/wiki/ISO_8601][ISO-8601]] format is popular in Eu
 ** Download
 After downloading the raw data, we extract the part we are interested in. We first split the file into lines, of which we discard the first one that contains a comment. We then split the remaining lines into columns.
-#+BEGIN_SRC python :results silent :var data_url=data-url
+#+BEGIN_SRC python :results silent :var data_csv=data-csv
 from urllib.request import urlopen
+import csv
-data = urlopen(data_url).read()
+#data = urlopen(data_url).read()
+with open(data_csv) as csv_file:
+    data = csv.DictReader(csv_file) 
 lines = data.decode('latin-1').strip().split('\n')
 data_lines = lines[1:]
 table = [line.split(',') for line in data_lines]
@@ -79,6 +84,8 @@ Let's have a look at what we have so far:
 table[:5]
 #+END_SRC
+#+RESULTS:
 ** Checking for missing data
 Unfortunately there are many ways to indicate the absence of a data value in a dataset. Here we check for a common one: empty fields. For completeness, we should also look for non-numerical data in numerical columns. We don't do this here, but checks in later processing steps would catch such anomalies.
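Note on the Download hunk: as committed, the block binds `data` to a `csv.DictReader` and then calls `data.decode('latin-1')`, which a `DictReader` does not provide. The file is also closed as soon as the `with` block ends, and `open()` does not expand the `~` in `~/org/incidence-PAY-3.csv`. A minimal sketch of one way to read the local copy while keeping the original line and column splitting, followed by the empty-field check described above, might look like the following. The names `read_table` and `rows_with_empty_fields` are hypothetical, and the sketch assumes the local file has the same layout as the download (one leading comment line, comma-separated fields, Latin-1 encoding).

```python
import os


def read_table(path, encoding="latin-1"):
    """Read a local copy of the CSV and return its rows as lists of strings.

    Hypothetical helper: mirrors the original block's splitting logic,
    but reads from disk instead of calling urlopen().
    """
    # open() does not expand "~", so expand it explicitly.
    with open(os.path.expanduser(path), encoding=encoding) as csv_file:
        # Read everything while the file is still open.
        text = csv_file.read()
    lines = text.strip().split("\n")
    # The first line of the file is a comment, so discard it.
    data_lines = lines[1:]
    # The first remaining row is the header with the column names.
    return [line.split(",") for line in data_lines]


def rows_with_empty_fields(table):
    """Return the rows that contain at least one empty field."""
    return [row for row in table if any(field == "" for field in row)]
```

In the Org block, this would be called as `table = read_table(data_csv)`, after which `table[:5]` behaves as in the original version. Plain string splitting is kept here only to stay close to the original code; it assumes no field contains a quoted comma.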