Commit 1f57d1d1 authored Apr 24, 2024 by Tommy Rushton
Doing the chickenpox exercise.
parent 1d52aa63
Showing 3 changed files with 165 additions and 65 deletions
module3/exo2/exercice_python_en.org (+165 -65)
module3/exo2/incidence-zoom.png (+0 -0)
module3/exo2/incidence.png (+0 -0)
module3/exo2/exercice_python_en.org
#+TITLE: Your title
#+AUTHOR: Your name
#+DATE: Today's date
#+TITLE: Analysis of the incidence of chickenpox
#+AUTHOR: Thomas Rushton
#+DATE: 2024-04-24
#+LANGUAGE: en
# #+PROPERTY: header-args :eval never-export
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/htmlize.css"/>
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/readtheorg.css"/>
...
...
@@ -11,84 +10,185 @@
#+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/lib/js/jquery.stickytableheaders.js"></script>
#+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/readtheorg/js/readtheorg.js"></script>
* Some explanations
#+PROPERTY: header-args :session
This is an org-mode document with code examples in R. Once opened in
Emacs, this document can easily be exported to HTML, PDF, and Office
formats. For more information on org-mode, see
https://orgmode.org/guide/.
#+OPTIONS: ^:{}
When you type the shortcut =C-c C-e h o=, this document will be
exported as HTML. All the code in it will be re-executed, and the
results will be retrieved and included into the exported document. If
you do not want to re-execute all code each time, you can delete the #
and the space before ~#+PROPERTY:~ in the header of this document.
* Data acquisition
As we showed in the video, Python code is included as follows (and
is executed by typing ~C-c C-c~):
Let's start by getting the data on chickenpox incidence from [[https://www.sentiweb.fr/][Réseau Sentinelles]].
#+NAME: data-url
https://www.sentiweb.fr/datasets/incidence-PAY-7.csv
#+begin_src python :results output :var data_url=data-url
data_file = "chickenpox.csv"
import os
import urllib.request
if not os.path.exists(data_file):
    urllib.request.urlretrieve(data_url, data_file)
#+end_src
#+RESULTS:
** Format the data
Discard the first line (which is a comment) and split the remaining
lines into columns.
#+begin_src python :results silent :exports both
data = open(data_file, 'rb').read()
lines = data.decode('latin-1').strip().split('\n')
data_lines = lines[1:]
table = [line.split(',') for line in data_lines]
# Could use the csv library instead...
#+end_src
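The comment in the block above mentions the =csv= library. A minimal sketch of that alternative, assuming the file keeps the layout shown in this document (one comment line, then a header row, then comma-separated records). The name ~parse_rows~ and the in-memory ~sample~ are hypothetical, used only so the example runs without the downloaded file; the comment line in ~sample~ is a placeholder, not the real one.

```python
import csv

# Hypothetical in-memory sample with the same layout as the downloaded file:
# a comment line (placeholder text), a header row, then data rows.
sample = (
    "# placeholder comment line\n"
    "week,indicator,inc,inc_low,inc_up,inc100,inc100_low,inc100_up,geo_insee,geo_name\n"
    "202416,7,19330,13879,24781,29,21,37,FR,France\n"
    "202415,7,24807,17183,32431,37,26,48,FR,France\n"
)

def parse_rows(text):
    """Drop the first (comment) line and parse the remaining lines as CSV."""
    lines = text.strip().split("\n")[1:]
    # csv.reader accepts any iterable of strings, one per record.
    return list(csv.reader(lines))

sample_table = parse_rows(sample)
print(sample_table[0][:3])
print(sample_table[1][:3])
```

Unlike a plain ~split(',')~, =csv.reader= also copes with quoted fields that contain commas, which may or may not matter for this dataset.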
Let's take a gander at the data:
#+begin_src python :results value :exports both
table[:5]
#+end_src
#+RESULTS:
| week | indicator | inc | inc_low | inc_up | inc100 | inc100_low | inc100_up | geo_insee | geo_name |
| 202416 | 7 | 19330 | 13879 | 24781 | 29 | 21 | 37 | FR | France |
| 202415 | 7 | 24807 | 17183 | 32431 | 37 | 26 | 48 | FR | France |
| 202414 | 7 | 16181 | 12544 | 19818 | 24 | 19 | 29 | FR | France |
| 202413 | 7 | 18322 | 14206 | 22438 | 27 | 21 | 33 | FR | France |
Alright, looks good 👍
** Sanitise the data
But there may be problems with the data. Let's check for the obvious
case of missing/empty data:
#+begin_src python :results output :exports both
print("Hello world!")
valid_table = []
for row in table:
    missing = any([column == '' for column in row])
    if missing:
        print(row)
    else:
        valid_table.append(row)
#+end_src
#+RESULTS:
This is kind of grim, but it does the job.
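The same filtering can be wrapped in a small function that returns the rejected rows instead of printing them, which makes it easier to inspect or count them afterwards. This is only a sketch: ~split_rows~ is a hypothetical helper, and the row with empty fields in ~rows~ is made up for illustration (it is not taken from the dataset).

```python
def split_rows(rows):
    """Separate rows into (valid, rejected), rejecting any row with an empty field."""
    valid, rejected = [], []
    for row in rows:
        if any(column == '' for column in row):
            rejected.append(row)
        else:
            valid.append(row)
    return valid, rejected

rows = [
    ['week', 'indicator', 'inc'],
    ['202416', '7', '19330'],
    ['000000', '7', ''],        # made-up row with a missing value
    ['202415', '7', '24807'],
]

valid_rows, rejected_rows = split_rows(rows)
print(len(valid_rows), "valid,", len(rejected_rows), "rejected")
```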
** Extract required columns
For a temporal analysis, we just need the =week= and =inc= columns:
#+begin_src python :results silent :exports both
week = [row[0] for row in valid_table]
assert week[0] == 'week'
del week[0]
inc = [row[2] for row in valid_table]
assert inc[0] == 'inc'
del inc[0]
data = list(zip(week, inc))
#+end_src
Let's check what we have, using ~None~ to tell Org where to put
separators in the resulting table (which contains the first five and
last five weeks' data):
#+begin_src python :results value :exports both
[('week', 'inc'), None] + data[:5] + [None] + data[-5:]
#+end_src
#+RESULTS:
: Hello world!
| week | inc |
|--------+-------|
| 202416 | 19330 |
| 202415 | 24807 |
| 202414 | 16181 |
| 202413 | 18322 |
| 202412 | 12818 |
|--------+-------|
| 199101 | 15565 |
| 199052 | 19375 |
| 199051 | 19080 |
| 199050 | 11079 |
| 199049 | 1143 |
** Convert dates
Dates are represented in ISO 8601 format (YYYYWW) so let's parse
those. It should already be sorted chronologically, but let's make
sure of that too.
#+begin_src python :results silent :exports both
import datetime
converted_data = [(datetime.datetime.strptime(year_and_week + ":1", '%G%V:%u').date(),
                   int(inc))
                  for year_and_week, inc in data]
converted_data.sort(key = lambda record: record[0])
#+end_src
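To make the format string less opaque: ~%G~ is the ISO year, ~%V~ the ISO week number and ~%u~ the ISO weekday (1 is Monday), so appending ~":1"~ selects the Monday of each week. The sketch below isolates that step and compares it with ~datetime.date.fromisocalendar~, which assumes Python 3.8 or later. The function names are hypothetical.

```python
import datetime

def week_to_monday(year_and_week):
    """Return the Monday of an ISO week given as a 'YYYYWW' string."""
    return datetime.datetime.strptime(year_and_week + ":1", "%G%V:%u").date()

def week_to_monday_alt(year_and_week):
    """Same result via date.fromisocalendar (assumes Python >= 3.8)."""
    year, week = int(year_and_week[:4]), int(year_and_week[4:])
    return datetime.date.fromisocalendar(year, week, 1)

print(week_to_monday("202416"))   # 2024-04-15, as in the table of converted dates
print(week_to_monday("199049"))   # 1990-12-03
```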
And now the same but in a Python session. With a session, Python's
state, i.e. the values of all the variables, remains persistent from
one code block to the next. The code is still executed using ~C-c
C-c~.
Let's check again:
#+begin_src python :results output :session :exports both
import numpy
x=numpy.linspace(-15,15)
print(x)
#+begin_src python :results value :exports both
data_as_str = [(str(date), str(inc)) for date, inc in converted_data]
[('date', 'inc'), None] + data_as_str[:5] + [None] + data_as_str[-5:]
#+end_src
#+RESULTS:
#+begin_example
[-15. -14.3877551 -13.7755102 -13.16326531 -12.55102041
-11.93877551 -11.32653061 -10.71428571 -10.10204082 -9.48979592
-8.87755102 -8.26530612 -7.65306122 -7.04081633 -6.42857143
-5.81632653 -5.20408163 -4.59183673 -3.97959184 -3.36734694
-2.75510204 -2.14285714 -1.53061224 -0.91836735 -0.30612245
0.30612245 0.91836735 1.53061224 2.14285714 2.75510204
3.36734694 3.97959184 4.59183673 5.20408163 5.81632653
6.42857143 7.04081633 7.65306122 8.26530612 8.87755102
9.48979592 10.10204082 10.71428571 11.32653061 11.93877551
12.55102041 13.16326531 13.7755102 14.3877551 15. ]
#+end_example
Finally, an example for graphical output:
#+begin_src python :results output file :session :var matplot_lib_filename="./cosxsx.png" :exports results
| date | inc |
|------------+-------|
| 1990-12-03 | 1143 |
| 1990-12-10 | 11079 |
| 1990-12-17 | 19080 |
| 1990-12-24 | 19375 |
| 1990-12-31 | 15565 |
|------------+-------|
| 2024-03-18 | 12818 |
| 2024-03-25 | 18322 |
| 2024-04-01 | 16181 |
| 2024-04-08 | 24807 |
| 2024-04-15 | 19330 |
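Sorting guarantees chronological order, but not that every week is present. A possible extra check, sketched here under the assumption that consecutive records should be exactly one week apart, lists any pair of neighbouring dates that are not. ~find_gaps~ is a hypothetical helper, and ~sample_dates~ reuses the first five dates from the table above so the example runs on its own.

```python
import datetime

def find_gaps(dates):
    """Return pairs of consecutive dates that are not exactly one week apart."""
    week = datetime.timedelta(weeks=1)
    return [(a, b) for a, b in zip(dates[:-1], dates[1:]) if b - a != week]

sample_dates = [
    datetime.date(1990, 12, 3),
    datetime.date(1990, 12, 10),
    datetime.date(1990, 12, 17),
    datetime.date(1990, 12, 24),
    datetime.date(1990, 12, 31),
]

gaps_full = find_gaps(sample_dates)
# Drop one week on purpose to show what a gap looks like.
gaps_missing = find_gaps(sample_dates[:2] + sample_dates[3:])
print(gaps_full)
print(gaps_missing)
```

With the real data, the same function could be called on ~[date for date, inc in converted_data]~.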
** Visual inspection
So, now we can take a look at incidence over time. (The 'flu notebook
switches to R here, but we're going to stick with Python.)
#+begin_src python :results output file :var filename="./incidence.png" :exports both
import matplotlib.pyplot as plt
plt.figure(figsize=(10,5))
plt.plot(x,numpy.cos(x)/x)
plt.clf()
date,incidence = zip(*converted_data)
plt.plot(date,incidence)
plt.tight_layout()
plt.savefig(filename)
print(filename)
#+end_src
#+RESULTS:
[[file:./incidence.png]]
And we can zoom in on a period of, say, five years:
plt.savefig(matplot_lib_filename)
print(matplot_lib_filename)
#+begin_src python :results output file :var filename="./incidence-zoom.png" :exports both
plt.clf()
start = 10
years = 5
date,incidence = zip(*converted_data[52*start:52*(start+years)])
plt.plot(date,incidence)
plt.tight_layout()
plt.savefig(filename)
print(filename)
#+end_src
#+RESULTS:
[[file:./cosxsx.png]]
Note the parameter ~:exports results~, which indicates that the code
will not appear in the exported document. We recommend that in the
context of this MOOC, you always leave this parameter setting as
~:exports both~, because we want your analyses to be perfectly
transparent and reproducible.
Watch out: the figure generated by the code block is /not/ stored in
the org document. It's a plain file, here named ~cosxsx.png~. You have
to commit it explicitly if you want your analysis to be legible and
understandable on GitLab.
Finally, don't forget that we provide in the resource section of this
MOOC a configuration with a few keyboard shortcuts that allow you to
quickly create code blocks in Python by typing ~<p~, ~<P~ or ~<PP~
followed by ~Tab~.
Now it's your turn! You can delete all this information and replace it
by your computational document.
[[file:./incidence-zoom.png]]
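The zoom slices by index and treats every year as 52 weeks, which drifts slightly because some ISO years have 53 weeks. A sketch of an alternative that selects the window by date instead; ~select_period~ is a hypothetical helper, and ~sample~ reuses a few records from the table of converted dates so the example is self-contained.

```python
import datetime

def select_period(records, start, end):
    """Keep (date, value) records with start <= date < end."""
    return [(date, value) for date, value in records if start <= date < end]

sample = [
    (datetime.date(1990, 12, 3), 1143),
    (datetime.date(1990, 12, 10), 11079),
    (datetime.date(1990, 12, 17), 19080),
    (datetime.date(1990, 12, 24), 19375),
    (datetime.date(1990, 12, 31), 15565),
    (datetime.date(2024, 4, 15), 19330),
]

window = select_period(sample,
                       datetime.date(1990, 12, 10),
                       datetime.date(1991, 1, 1))
print(len(window))
```

The resulting list can be unpacked with ~zip(*window)~ and passed to ~plt.plot~ exactly like the index-based slice.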
It looks like incidence peaks in the spring, with lowest numbers
around September.
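One way to back this visual impression with numbers would be to average incidence per calendar month. The sketch below is only an illustration of that idea: ~mean_by_month~ is a hypothetical helper, and ~sample~ contains just the ten records shown earlier in this document, so its output says nothing about the seasonal pattern. It would need to be run on the full ~converted_data~.

```python
import datetime
from collections import defaultdict

def mean_by_month(records):
    """Average the values of (date, value) records per calendar month (1-12)."""
    by_month = defaultdict(list)
    for date, value in records:
        by_month[date.month].append(value)
    return {month: sum(values) / len(values)
            for month, values in sorted(by_month.items())}

sample = [
    (datetime.date(1990, 12, 3), 1143),
    (datetime.date(1990, 12, 10), 11079),
    (datetime.date(1990, 12, 17), 19080),
    (datetime.date(1990, 12, 24), 19375),
    (datetime.date(1990, 12, 31), 15565),
    (datetime.date(2024, 3, 18), 12818),
    (datetime.date(2024, 3, 25), 18322),
    (datetime.date(2024, 4, 1), 16181),
    (datetime.date(2024, 4, 8), 24807),
    (datetime.date(2024, 4, 15), 19330),
]

monthly = mean_by_month(sample)
for month, mean in monthly.items():
    print(month, round(mean, 1))
```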
module3/exo2/incidence-zoom.png (new file, 0 → 100644, 54.1 KB)
module3/exo2/incidence.png (new file, 0 → 100644, 62.3 KB)