Reading data from input file

949ff85f · Miguel Felipe Silva Vasconcelos · 845a7d14 · 949ff85f
Commit 949ff85f authored Mar 05, 2021 by Miguel Felipe Silva Vasconcelos

Showing with 45 additions and 3 deletions

module3/exo1/influenza-like-illness-analysis.org +45 -3

--- a/module3/exo1/influenza-like-illness-analysis.org
+++ b/module3/exo1/influenza-like-illness-analysis.org
@@ -45,6 +45,9 @@ The data on the incidence of influenza-like illness are available from the Web s
 #+NAME: data-url
 http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
+#+NAME: file-name
+influenzaincidence.csv
 This is the documentation of the data from [[https://ns.sentiweb.fr/incidence/csv-schema-v1.json][the download site]]:
 | Column name  | Description                                                                                                               |
@@ -65,20 +68,56 @@ The [[https://en.wikipedia.org/wiki/ISO_8601][ISO-8601]] format is popular in Eu
 ** Download
 After downloading the raw data, we extract the part we are interested in. We first split the file into lines, of which we discard the first one that contains a comment. We then split the remaining lines into columns.
-#+BEGIN_SRC python :results silent :var data_url=data-url
+#+BEGIN_SRC python :results output :var data_url=data-url :var file_name=file-name
 from urllib.request import urlopen
+import shutil
+import os.path
+def downloadFile():
+    # Request the file and stream it to disk under file_name.
+    # shutil.copyfileobj copies in chunks, so large files are not held in
+    # memory. See https://docs.python.org/dev/library/shutil.html#shutil.copyfileobj
+    # The with block closes both the response and the file when done.
+    with urlopen(data_url) as response, open(file_name, 'wb') as out_file:
+        shutil.copyfileobj(response, out_file)
+    print("File downloaded!")
+def loadData():
+    # Reuse the local copy if present; otherwise download it first.
+    if os.path.isfile(file_name):
+        print("File Exists!")
+    else:
+        print("File not available locally... Trying to download it:")
+        downloadFile()
+    with open(file_name, "rb") as file:  # read the raw bytes back from disk
+        data = file.read()
+    return data
+data = loadData()
-data = urlopen(data_url).read()
 lines = data.decode('latin-1').strip().split('\n')
 data_lines = lines[1:]
 table = [line.split(',') for line in data_lines]
 #+END_SRC
+#+RESULTS:
+: File not available locally... Trying to download it:
+: File downloaded!
 Let's have a look at what we have so far:
 #+BEGIN_SRC python :results value
-table[:5]
+table[:2]
 #+END_SRC
+#+RESULTS:
+|   week | indicator |   inc | inc_low | inc_up | inc100 | inc100_low | inc100_up | geo_insee | geo_name |
+| 202108 |         3 | 27492 |   22140 |  32844 |     42 |         34 |        50 | FR        | France   |
 ** Checking for missing data
 Unfortunately there are many ways to indicate the absence of a data value in a dataset. Here we check for a common one: empty fields. For completeness, we should also look for non-numerical data in numerical columns. We don't do this here, but checks in later processing steps would catch such anomalies.
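The check for non-numerical data mentioned above is not part of this commit. A minimal sketch of what such a check could look like, assuming rows shaped like the table shown earlier, with the header already removed. The helper name `find_non_numeric`, the choice of columns 0 through 7 as numerical, and the second sample row are assumptions made for illustration and are not taken from the notebook:

```python
def find_non_numeric(rows, numeric_columns=range(8)):
    """Return (row_index, column_index, value) for each non-empty field
    in a numerical column that cannot be parsed as a number.

    Empty fields are skipped, since the empty-field check covers them.
    """
    problems = []
    for i, row in enumerate(rows):
        for j in numeric_columns:
            value = row[j].strip()
            if value == '':
                continue
            try:
                float(value)
            except ValueError:
                problems.append((i, j, value))
    return problems


# Sample rows: the first is the row shown in the results above; the second
# is hypothetical, with a made-up non-numerical "inc" field.
sample_rows = [
    ['202108', '3', '27492', '22140', '32844', '42', '34', '50', 'FR', 'France'],
    ['202107', '3', 'n/a', '', '', '0', '', '', 'FR', 'France'],
]
problems = find_non_numeric(sample_rows)
```

Returning the offending positions rather than raising an error leaves it to the caller to decide whether to drop the rows or stop the analysis.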
@@ -93,6 +132,9 @@ for row in table:
        valid_table.append(row)
 #+END_SRC
+#+RESULTS:
+: ['198919', '3', '0', '', '', '0', '', '', 'FR', 'France']
 ** Extraction of the required columns
 There are only two columns that we will need for our analysis: the first (~"week"~) and the third (~"inc"~). We check the names in the header to be sure we pick the right data. We make a new table containing just the two columns required, without the header.
 #+BEGIN_SRC python :results silent