Commit 78b7488e authored by Dorinel Bastide

Commit of exo1 module 3 to start correcting

parent 496f580a
@@ -45,6 +45,9 @@ The data on the incidence of influenza-like illness are available from the Web s
 #+NAME: data-url
 http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
+#+NAME: data-csv
+~/org/incidence-PAY-3.csv
 This is the documentation of the data from [[https://ns.sentiweb.fr/incidence/csv-schema-v1.json][the download site]]:
 | Column name | Description |
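This hunk adds a `data-csv` name that points at a local copy, `~/org/incidence-PAY-3.csv`, alongside the original URL. The sketch below shows one way to keep that copy available. It assumes the intent is to download the file only when it is not already on disk. The helper `ensure_local_copy` and the two constants are hypothetical and not part of the commit.

```python
import os
from urllib.request import urlopen

# Hypothetical constants mirroring the #+NAME blocks in the diff; in Org mode
# these values would arrive through :var header arguments instead.
DATA_URL = "http://www.sentiweb.fr/datasets/incidence-PAY-3.csv"
DATA_CSV = "~/org/incidence-PAY-3.csv"


def ensure_local_copy(url, path):
    """Download url to path only if path does not exist yet.

    Returns the expanded local path, so the caller can open() it directly.
    """
    path = os.path.expanduser(path)  # open() would not expand "~" itself
    if not os.path.exists(path):
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        # Store the raw bytes unchanged; decoding happens when the file is read.
        with urlopen(url) as response, open(path, "wb") as out:
            out.write(response.read())
    return path
```

With this in place, `ensure_local_copy(DATA_URL, DATA_CSV)` contacts the Sentiweb server only on the first run. Later runs reuse the saved file, so the analysis keeps working on the same input even if the online dataset changes.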
@@ -65,10 +68,12 @@ The [[https://en.wikipedia.org/wiki/ISO_8601][ISO-8601]] format is popular in Eu
 ** Download
 After downloading the raw data, we extract the part we are interested in. We first split the file into lines, of which we discard the first one that contains a comment. We then split the remaining lines into columns.
-#+BEGIN_SRC python :results silent :var data_url=data-url
+#+BEGIN_SRC python :results silent :var data_csv=data-csv
 from urllib.request import urlopen
+import csv
-data = urlopen(data_url).read()
+#data = urlopen(data_url).read()
+with open(data_csv) as csv_file:
+    data = csv.DictReader(csv_file)
 lines = data.decode('latin-1').strip().split('\n')
 data_lines = lines[1:]
 table = [line.split(',') for line in data_lines]
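As committed, this block cannot run. `csv.DictReader` returns a reader object with no `decode` method. Even if the reader were iterated, it is used after the `with` block has already closed the file. In addition, `open()` does not expand the `~` in `~/org/incidence-PAY-3.csv`.

The sketch below reads the local copy into the same `table` structure (a list of rows, each a list of strings). `read_table` is a hypothetical name, and `csv.reader` is swapped in for `line.split(',')`.

```python
import csv
import os


def read_table(data_csv):
    """Return the rows of a local Sentiweb CSV file as lists of strings.

    Like the original block, this drops the first line (a comment) and keeps
    the column-name line as table[0].
    """
    # open() does not expand "~" on its own, so expand it first.
    path = os.path.expanduser(data_csv)
    # newline="" is what the csv module documentation recommends; latin-1
    # matches the .decode('latin-1') of the download-based version.
    with open(path, encoding="latin-1", newline="") as csv_file:
        next(csv_file)  # skip the comment line
        # Build the list while the file is still open, skipping blank lines.
        return [row for row in csv.reader(csv_file) if row]
```

In the Org block, `table = read_table(data_csv)` would replace the reading and line-splitting steps. The `table[:5]` check in the next block would then work unchanged.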
@@ -79,6 +84,8 @@ Let's have a look at what we have so far:
 table[:5]
 #+END_SRC
+#+RESULTS:
 ** Checking for missing data
 Unfortunately there are many ways to indicate the absence of a data value in a dataset. Here we check for a common one: empty fields. For completeness, we should also look for non-numerical data in numerical columns. We don't do this here, but checks in later processing steps would catch such anomalies.
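The empty-field check described in this paragraph can be sketched as follows. The sketch assumes `table` is the list of rows built by the download block, with the header row included. `split_on_missing` is a hypothetical helper; it only looks for empty strings and does not attempt the non-numerical check that the paragraph defers.

```python
def split_on_missing(table):
    """Separate complete rows from rows that contain at least one empty field."""
    complete, incomplete = [], []
    for row in table:
        if "" in row:
            incomplete.append(row)
        else:
            complete.append(row)
    return complete, incomplete
```

Returning the incomplete rows, rather than silently dropping them, lets the document print them for inspection before `complete` is used for further processing.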