Commit ddcc58d9 authored by Dorinel Bastide

Second commit of the exercise

parent 78b7488e
......@@ -66,14 +66,23 @@ This is the documentation of the data from [[https://ns.sentiweb.fr/incidence/cs
The [[https://en.wikipedia.org/wiki/ISO_8601][ISO-8601]] format is popular in Europe, but less so in North America. This may explain why few software packages handle this format. Python has supported it since version 3.6. We therefore use Python for the pre-processing phase, which has the advantage of not requiring any additional library. (Note: we will explain in module 4 why it is desirable for reproducibility to use as few external libraries as possible.)
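As an illustration of the ISO-8601 support mentioned above, here is a minimal sketch that converts a week identifier into the date of its Monday using only the standard library. It assumes the week is given as a six-character =YYYYWW= string; that input format and the function name =iso_week_start= are assumptions made for this example, not something taken from the document.

```python
import datetime

def iso_week_start(year_and_week):
    """Return the Monday of an ISO week given as a 'YYYYWW' string.

    The 'YYYYWW' input format is an assumption made for this sketch.
    """
    year = year_and_week[:4]
    week = year_and_week[4:]
    # %G = ISO year, %V = ISO week number, %u = ISO weekday (1 = Monday).
    # These directives are available in strptime since Python 3.6.
    parsed = datetime.datetime.strptime(f"{year}-W{week}-1", "%G-W%V-%u")
    return parsed.date()

print(iso_week_start("202011"))  # prints 2020-03-09
```

The three directives have to be used together: =%G= and =%V= alone do not identify a single day, so the weekday is fixed to 1 (Monday) here.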
** Download
After downloading the raw data, we extract the part we are interested in. We first split the file into lines, of which we discard the first one that contains a comment. We then split the remaining lines into columns.
#+BEGIN_SRC python :results silent :var data_csv=data-csv
from urllib.request import urlopen
import csv
#data = urlopen(data_url).read()
with open(data_csv) as csv_file:
    data = csv.DictReader(csv_file)
After downloading the raw data, we extract the part we are interested
in. We first split the file into lines, of which we discard the first
one that contains a comment. We then split the remaining lines into
columns.
* Loading from a local file
Instead of fetching the URL on every run, the data file is loaded from a local path. It is downloaded only if no local copy exists yet.
#+BEGIN_SRC python :results silent :var data_url=data-url
data_file = "syndrom-grippal.csv"
import os
import urllib.request
if not os.path.exists(data_file):
    urllib.request.urlretrieve(data_url, data_file)
#+END_SRC
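Once the local copy exists, the later steps could read it instead of querying the server again. The following is a minimal sketch of that idea, not the method used in this commit: the helper name =load_lines= is hypothetical, and the =latin-1= encoding is assumed to match the one used when decoding the downloaded data.

```python
import os
import urllib.request

def load_lines(data_url, data_file, encoding="latin-1"):
    """Return the lines of data_file, downloading it first if it is missing.

    load_lines is a hypothetical helper; the latin-1 default is an assumption.
    """
    # Download only when no local copy is present.
    if not os.path.exists(data_file):
        urllib.request.urlretrieve(data_url, data_file)
    # Read the cached copy and split it into lines.
    with open(data_file, encoding=encoding) as f:
        return f.read().strip().split("\n")

# Possible usage, with data_url and data_file defined as in the block above:
# lines = load_lines(data_url, data_file)
# data_lines = lines[1:]  # skip the first line, which contains a comment
```

Reading from the cached file keeps the analysis working even when the server is unavailable or the remote data changes.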
#+BEGIN_SRC python :results silent :var data_url=data-url
data = urlopen(data_url).read()
lines = data.decode('latin-1').strip().split('\n')
data_lines = lines[1:]
table = [line.split(',') for line in data_lines]
......