Commit 78b7488e authored by Dorinel Bastide

Commit of exo1 module 3 to start correcting

parent 496f580a
......@@ -45,6 +45,9 @@ The data on the incidence of influenza-like illness are available from the Web s
#+NAME: data-url
http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
#+NAME: data-csv
~/org/incidence-PAY-3.csv
This is the documentation of the data from [[https://ns.sentiweb.fr/incidence/csv-schema-v1.json][the download site]]:
| Column name | Description |
......@@ -65,10 +68,12 @@ The [[https://en.wikipedia.org/wiki/ISO_8601][ISO-8601]] format is popular in Eu
** Download
After downloading the raw data, we extract the part we are interested in. We first split the file into lines, of which we discard the first one that contains a comment. We then split the remaining lines into columns.
#+BEGIN_SRC python :results silent :var data_url=data-url
#+BEGIN_SRC python :results silent :var data_csv=data-csv
from urllib.request import urlopen
data = urlopen(data_url).read()
import csv
#data = urlopen(data_url).read()
with open(data_csv) as csv_file:
    data = csv.DictReader(csv_file)
lines = data.decode('latin-1').strip().split('\n')
data_lines = lines[1:]
table = [line.split(',') for line in data_lines]
......@@ -79,6 +84,8 @@ Let's have a look at what we have so far:
table[:5]
#+END_SRC
#+RESULTS:
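In the revised download block, `csv.DictReader` returns a reader object with no `.decode()` method, and `open()` does not expand a leading `~` in the path. The following is a minimal sketch of one way to read a local copy while keeping the line-splitting steps described above. It assumes the local file has the same layout as the download (one comment line followed by the header and data rows); the names `parse_table` and `read_local_table` and the sample text are hypothetical and not part of the commit.

```python
from pathlib import Path


def parse_table(raw_text):
    """Split raw CSV text into rows of fields, dropping the leading comment line."""
    lines = raw_text.strip().split('\n')
    data_lines = lines[1:]  # the first line is a comment, not data
    return [line.split(',') for line in data_lines]


def read_local_table(path, encoding='latin-1'):
    """Read a local copy of the dataset and parse it like the downloaded text.

    expanduser() resolves a leading '~', which open() alone would not do.
    """
    raw_text = Path(path).expanduser().read_text(encoding=encoding)
    return parse_table(raw_text)


# Hypothetical sample text standing in for the real file contents.
sample = "# comment line\nweek,indicator,inc\n202001,3,12345\n202002,3,\n"
table = parse_table(sample)
```

With the real file, the call would be `read_local_table(data_csv)`, where `data_csv` is the variable passed in through the `:var` header argument.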
** Checking for missing data
Unfortunately there are many ways to indicate the absence of a data value in a dataset. Here we check for a common one: empty fields. For completeness, we should also look for non-numerical data in numerical columns. We don't do this here, but checks in later processing steps would catch such anomalies.
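The empty-field check described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the document: the function names and the sample rows are hypothetical, and the numeric test is shown only to indicate what the additional check mentioned in the text might look like.

```python
def split_on_empty_fields(table):
    """Return (complete_rows, rows_with_gaps) for a list of rows of strings."""
    complete_rows, rows_with_gaps = [], []
    for row in table:
        if any(field == '' for field in row):
            rows_with_gaps.append(row)
        else:
            complete_rows.append(row)
    return complete_rows, rows_with_gaps


def is_number(field):
    """Report whether a string field can be parsed as a number."""
    try:
        float(field)
        return True
    except ValueError:
        return False


# Hypothetical rows: a header, one valid row, one empty field, one non-numeric value.
sample_table = [
    ['week', 'inc'],
    ['202001', '12345'],
    ['202002', ''],
    ['202003', 'n/a'],
]
complete_rows, rows_with_gaps = split_on_empty_fields(sample_table)

# Skip the header row, then flag values in the numeric column that do not parse.
non_numeric = [row for row in complete_rows[1:] if not is_number(row[1])]
```

Keeping the rejected rows in a separate list, rather than dropping them silently, makes it possible to inspect how much data was lost and why.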
......