diff --git a/module3/exo1/influenza-like-illness-analysis.org b/module3/exo1/influenza-like-illness-analysis.org
index 0102ba8e57bddbea17eb487623dec3510952ff44..a37cd03e66e1c6a96fdb40d144424cde10e3a9b9 100644
--- a/module3/exo1/influenza-like-illness-analysis.org
+++ b/module3/exo1/influenza-like-illness-analysis.org
@@ -66,14 +66,23 @@ This is the documentation of the data from [[https://ns.sentiweb.fr/incidence/cs
 The [[https://en.wikipedia.org/wiki/ISO_8601][ISO-8601]] format is popular in Europe, but less so in North America. This may explain why few software packages handle this format. The Python language does it since version 3.6. We therefore use Python for the pre-processing phase, which has the advantage of not requiring any additional library. (Note: we will explain in module 4 why it is desirable for reproducibility to use as few external libraries as possible.)
 
 ** Download
-After downloading the raw data, we extract the part we are interested in. We first split the file into lines, of which we discard the first one that contains a comment. We then split the remaining lines into columns.
-
-#+BEGIN_SRC python :results silent :var data_csv=data-csv
-from urllib.request import urlopen
-import csv
-#data = urlopen(data_url).read()
-with open(data_csv) as csv_file:
-    data = csv.DictReader(csv_file)
+After downloading the raw data, we extract the part we are interested
+in. We first split the file into lines, of which we discard the first
+one that contains a comment. We then split the remaining lines into
+columns.
+
+* Loading from Local
+Instead of relying on url, the data file is loaded from local path
+#+BEGIN_SRC python :results silent :var data_url=data-url
+data_file = "syndrom-grippal.csv"
+import os
+import urllib.request
+if not os.path.exists(data_file):
+    urllib.request.urlretrieve(data_url, data_file)
+#+END_SRC
+
+#+BEGIN_SRC python :results silent :var data_url=data-url
+data = urlopen(data_url).read()
 lines = data.decode('latin-1').strip().split('\n')
 data_lines = lines[1:]
 table = [line.split(',') for line in data_lines]
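
Note on the resulting code: the patch removes `from urllib.request import urlopen`, yet the second block still calls `urlopen(data_url)`, so unless that name is provided elsewhere in the file the block would fail, and the data would in any case still be fetched from the URL rather than from the cached file. A minimal sketch of one way the two blocks could be combined so that the parsing step reads the local copy, assuming the same `data_url` and file name as in the patch; `load_table` is a hypothetical helper name introduced here for illustration, not part of the commit:

```python
import os
import urllib.request


def load_table(data_url, data_file="syndrom-grippal.csv"):
    """Return the CSV rows as lists of strings, downloading only if needed.

    Hypothetical helper: the name and signature are assumptions,
    not part of the original patch.
    """
    # Download once; later runs reuse the local copy.
    if not os.path.exists(data_file):
        urllib.request.urlretrieve(data_url, data_file)
    # Read the local copy as bytes so the decoding step matches the patch.
    with open(data_file, "rb") as f:
        data = f.read()
    lines = data.decode("latin-1").strip().split("\n")
    data_lines = lines[1:]  # drop the first line, which contains a comment
    return [line.split(",") for line in data_lines]
```

Called as `table = load_table(data_url)`, this touches the network only when the local file is missing, which appears to be the intent of the "Loading from Local" section.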