Doing the chickenpox exercise.

1f57d1d1 · Tommy Rushton · 1d52aa63 · 1f57d1d1 · 1f57d1d1 · 1f57d1d1
Commit 1f57d1d1 authored Apr 24, 2024 by Tommy Rushton
Showing with 165 additions and 65 deletions

exercice_python_en.org module3/exo2/exercice_python_en.org +165 -65

incidence-zoom.png module3/exo2/incidence-zoom.png +0 -0

incidence.png module3/exo2/incidence.png +0 -0

No files found.
--- a/module3/exo2/exercice_python_en.org
+++ b/module3/exo2/exercice_python_en.org
-#+TITLE:  Your title
+#+TITLE:  Analysis of the incidence of chickenpox
-#+AUTHOR: Your name
+#+AUTHOR: Thomas Rushton
-#+DATE:   Today's date
+#+DATE:   2024-04-24
 #+LANGUAGE: en
-# #+PROPERTY: header-args :eval never-export
 #+HTML_HEAD: <link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/htmlize.css"/>
 #+HTML_HEAD: <link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/readtheorg.css"/>
@@ -11,84 +10,185 @@
 #+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/lib/js/jquery.stickytableheaders.js"></script>
 #+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/readtheorg/js/readtheorg.js"></script>
-* Some explanations
+#+PROPERTY: header-args  :session
-This is an org-mode document with code examples in R.  Once opened in
+#+OPTIONS: ^:{}
-Emacs, this document can easily be exported to HTML, PDF, and Office
-formats. For more information on org-mode, see
-https://orgmode.org/guide/.
-When you type the shortcut =C-c C-e h o=, this document will be
+* Data acquisition
-exported as HTML. All the code in it will be re-executed, and the
-results will be retrieved and included into the exported document. If
-you do not want to re-execute all code each time, you can delete the #
-and the space before ~#+PROPERTY:~ in the header of this document.
-Like we showed in the video, Python code is included as follows (and
+Let's start by getting the data on chickenpox incidence from [[https://www.sentiweb.fr/][Rèseau Sentinelles]].
-is exxecuted by typing ~C-c C-c~):
+#+NAME: data-url
+https://www.sentiweb.fr/datasets/incidence-PAY-7.csv
+#+begin_src python :results output :var data_url=data-url
+data_file = "chickenpox.csv"
+import os
+import urllib.request
+if not os.path.exists(data_file):
+    urllib.request.urlretrieve(data_url, data_file)
+#+end_src
+#+RESULTS:
+** Format the data
+Discard the first line (which is a comment) and split the remaining
+lines into columns.
+#+begin_src python :results silent :exports both
+data = open(data_file, 'rb').read()
+lines = data.decode('latin-1').strip().split('\n')
+data_lines = lines[1:]
+table = [line.split(',') for line in data_lines]
+# Could use the csv library instead...
+#+end_src
+Let's take a gander at the data:
+#+begin_src python :results value :exports both
+table[:5]
+#+end_src
+#+RESULTS:
+|   week | indicator |   inc | inc_low | inc_up | inc100 | inc100_low | inc100_up | geo_insee | geo_name |
+| 202416 |         7 | 19330 |  13879 | 24781 |     29 |        21 |       37 | FR       | France  |
+| 202415 |         7 | 24807 |  17183 | 32431 |     37 |        26 |       48 | FR       | France  |
+| 202414 |         7 | 16181 |  12544 | 19818 |     24 |        19 |       29 | FR       | France  |
+| 202413 |         7 | 18322 |  14206 | 22438 |     27 |        21 |       33 | FR       | France  |
+Alright, looks good 👍
+** Sanitise the data
+But there may be problems with the data. Let's check for the obvious
+case of missing/empty data:
 #+begin_src python :results output :exports both
-print("Hello world!")
+valid_table = []
+for row in table:
+    missing = any([column == '' for column in row])
+    if missing:
+        print(row)
+    else:
+        valid_table.append(row)           
+#+end_src
+#+RESULTS:
+This is kind of grim, but it does the job.
+** Extract required columns
+For a temporal analysis, we just need the =week= and =inc= columns:
+#+begin_src python :results silent :exports both
+week = [row[0] for row in valid_table]
+assert week[0] == 'week'
+del week[0]
+inc = [row[2] for row in valid_table]
+assert inc[0] == 'inc'
+del inc[0]
+data = list(zip(week, inc))
+#+end_src
+Let's check what we have, using ~None~ to tell Org where to put
+separators in the resulting table (which contains the first five and
+last five weeks' data):
+#+begin_src python :results value :exports both
+[('week', 'inc'), None] + data[:5] + [None] + data[-5:]
 #+end_src
 #+RESULTS:
-: Hello world!
+|   week |   inc |
+|--------+-------|
+| 202416 | 19330 |
+| 202415 | 24807 |
+| 202414 | 16181 |
+| 202413 | 18322 |
+| 202412 | 12818 |
+|--------+-------|
+| 199101 | 15565 |
+| 199052 | 19375 |
+| 199051 | 19080 |
+| 199050 | 11079 |
+| 199049 |  1143 |
+** Convert dates
+Dates are represented in ISO 8601 format (YYYYWW) so let's parse
+those. It should already be sorted chronologically, but let's make
+sure of that too.
+#+begin_src python :results silent :exports both
+import datetime
+converted_data = [(datetime.datetime.strptime(year_and_week + ":1", '%G%V:%u').date(),
+                   int(inc))
+                  for year_and_week, inc in data]
+converted_data.sort(key = lambda record: record[0])
+#+end_src
-And now the same but in an Python session. With a session, Python's
+Let's check again:
-state, i.e. the values of all the variables, remains persistent from
-one code block to the next. The code is still executed using ~C-c
-C-c~.
-#+begin_src python :results output :session :exports both
+#+begin_src python :results value :exports both
-import numpy
+data_as_str = [(str(date), str(inc)) for date, inc in converted_data]
-x=numpy.linspace(-15,15)
+[('date', 'inc'), None] + data_as_str[:5] + [None] + data_as_str[-5:]
-print(x)
 #+end_src
 #+RESULTS:
-#+begin_example
+|       date |   inc |
-[-15.         -14.3877551  -13.7755102  -13.16326531 -12.55102041
+|------------+-------|
- -11.93877551 -11.32653061 -10.71428571 -10.10204082  -9.48979592
+| 1990-12-03 |  1143 |
-  -8.87755102  -8.26530612  -7.65306122  -7.04081633  -6.42857143
+| 1990-12-10 | 11079 |
-  -5.81632653  -5.20408163  -4.59183673  -3.97959184  -3.36734694
+| 1990-12-17 | 19080 |
-  -2.75510204  -2.14285714  -1.53061224  -0.91836735  -0.30612245
+| 1990-12-24 | 19375 |
-   0.30612245   0.91836735   1.53061224   2.14285714   2.75510204
+| 1990-12-31 | 15565 |
-   3.36734694   3.97959184   4.59183673   5.20408163   5.81632653
+|------------+-------|
-   6.42857143   7.04081633   7.65306122   8.26530612   8.87755102
+| 2024-03-18 | 12818 |
-   9.48979592  10.10204082  10.71428571  11.32653061  11.93877551
+| 2024-03-25 | 18322 |
-  12.55102041  13.16326531  13.7755102   14.3877551   15.        ]
+| 2024-04-01 | 16181 |
-#+end_example
+| 2024-04-08 | 24807 |
+| 2024-04-15 | 19330 |
-Finally, an example for graphical output:
-#+begin_src python :results output file :session :var matplot_lib_filename="./cosxsx.png" :exports results
+** Visual inspection
+So, now we can take a look at incidence over time. (The 'flu notebook
+switches to R here, but we're going to stick with python.)
+#+begin_src python :results output file :var filename="./incidence.png" :exports both
 import matplotlib.pyplot as plt
-plt.figure(figsize=(10,5))
+plt.clf()
-plt.plot(x,numpy.cos(x)/x)
+date,incidence = zip(*converted_data)
+plt.plot(date,incidence)
 plt.tight_layout()
+plt.savefig(filename)
+print(filename)
+#+end_src
+#+RESULTS:
+[[file:./incidence.png]]
+And we can zoom in on a period of, say, five years:
-plt.savefig(matplot_lib_filename)
+#+begin_src python :results output file :var filename="./incidence-zoom.png" :exports both
-print(matplot_lib_filename)
+plt.clf()
+start = 10
+years = 5
+date,incidence = zip(*converted_data[52*start:52*(start+years)])
+plt.plot(date,incidence)
+plt.tight_layout()
+plt.savefig(filename)
+print(filename)
 #+end_src
 #+RESULTS:
-[[file:./cosxsx.png]]
+[[file:./incidence-zoom.png]]
-Note the parameter ~:exports results~, which indicates that the code
+It looks like incidence peaks in the spring, with lowest numbers
-will not appear in the exported document. We recommend that in the
+around September.
-context of this MOOC, you always leave this parameter setting as
-~:exports both~, because we want your analyses to be perfectly
-transparent and reproducible.
-Watch out: the figure generated by the code block is /not/ stored in
-the org document. It's a plain file, here named ~cosxsx.png~. You have
-to commit it explicitly if you want your analysis to be legible and
-understandable on GitLab.
-Finally, don't forget that we provide in the resource section of this
-MOOC a configuration with a few keyboard shortcuts that allow you to
-quickly create code blocks in Python by typing ~<p~, ~<P~ or ~<PP~
-followed by ~Tab~.
-Now it's your turn! You can delete all this information and replace it
-by your computational document.
--- a/module3/exo2/incidence-zoom.png
+++ b/module3/exo2/incidence-zoom.png
--- a/module3/exo2/incidence.png
+++ b/module3/exo2/incidence.png