Commit 0be13543 authored by Jamal KHAN

Exercise 3 Part 2

parent 09ae2950
#+TITLE: Your title
#+AUTHOR: Your name
#+DATE: Today's date
#+TITLE: Analysis of the incidence of chickenpox
#+AUTHOR: Jamal KHAN
#+DATE: 2020-09-16
#+LANGUAGE: en
# #+PROPERTY: header-args :eval never-export
# #+PROPERTY: header-args :session *python* :exports both
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/htmlize.css"/>
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/readtheorg.css"/>
......@@ -11,84 +11,106 @@
#+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/lib/js/jquery.stickytableheaders.js"></script>
#+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/readtheorg/js/readtheorg.js"></script>
* Some explanations
* Data download
#+NAME: data-url
https://www.sentiweb.fr/datasets/incidence-PAY-7.csv
This is an org-mode document with code examples in Python. Once opened in
Emacs, this document can easily be exported to HTML, PDF, and Office
formats. For more information on org-mode, see
https://orgmode.org/guide/.
#+BEGIN_SRC python :session *python* :results output :var data_url=data-url
data_file = 'chickenpox_incidence.csv'
When you type the shortcut =C-c C-e h o=, this document will be
exported as HTML. All the code in it will be re-executed, and the
results will be retrieved and included into the exported document. If
you do not want to re-execute all code each time, you can delete the #
and the space before ~#+PROPERTY:~ in the header of this document.
import datetime
from urllib.request import urlretrieve
import os
Like we showed in the video, Python code is included as follows (and
is executed by typing ~C-c C-c~):
if not os.path.exists(data_file):
urlretrieve(data_url, data_file)
#+begin_src python :results output :exports both
print("Hello world!")
#+end_src
print(f'Data is retrieved at {datetime.datetime.utcnow()} UTC')
#+END_SRC
#+RESULTS:
: Hello world!
:
: Data is retrieved at 2020-09-16 22:07:31.650075 UTC
And now the same but in a Python session. With a session, Python's
state, i.e. the values of all the variables, remains persistent from
one code block to the next. The code is still executed using ~C-c
C-c~.
Now we extract the interesting part of the data. According to the format of the file, the week is in column 0 and the incidence is in column 2 (the column the code selects). We take Monday as the first day of the week, hence the ~%W~ format code in Python/pandas.
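To see what the ~%Y%W%w~ format does with a single value such as 202036 (year 2020, week 36), the conversion can be tried with the standard library alone. This is only a sketch: the sample value and both helper names are hypothetical, and whether the week numbers in the dataset follow ISO 8601 is an assumption that the text above does not state. Note that ~%W~ counts weeks from the first Monday of the year, which is not always the same as the ISO week number.

```python
import datetime


def week_to_monday(year_week):
    """Parse an integer like 202036 with the '%Y%W%w' format string.

    Appending the digit 1 selects Monday (%w == 1) of week %W.
    """
    return datetime.datetime.strptime(str(year_week * 10 + 1), '%Y%W%w').date()


def iso_week_to_monday(year_week):
    """Same input, read as an ISO 8601 year and week (needs Python 3.8+)."""
    year, week = divmod(year_week, 100)
    return datetime.date.fromisocalendar(year, week, 1)


sample = 202036  # hypothetical value, chosen only for illustration
print(week_to_monday(sample))      # 2020-09-07
print(iso_week_to_monday(sample))  # 2020-08-31
```

For 2020 the two readings differ by one week, because 1 January 2020 fell on a Wednesday. If the weeks in the file are ISO weeks, the dates produced with ~%W~ may therefore be shifted by a week in some years.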
#+begin_src python :results output :session :exports both
import numpy
x=numpy.linspace(-15,15)
print(x)
#+end_src
#+BEGIN_SRC python :session *python* :results output :exports both
import pandas as pd
data = pd.read_csv(data_file, skiprows=2, header=None)
data = data.loc[:, [0, 2]].rename(columns={0:'Datetime', 2:'Incidence'})
data.Datetime = pd.to_datetime(data.Datetime*10+1, format='%Y%W%w')
data = data.sort_values(by='Datetime')
data = data.set_index('Datetime')
data.describe()
#+END_SRC
#+RESULTS:
#+begin_example
[-15. -14.3877551 -13.7755102 -13.16326531 -12.55102041
-11.93877551 -11.32653061 -10.71428571 -10.10204082 -9.48979592
-8.87755102 -8.26530612 -7.65306122 -7.04081633 -6.42857143
-5.81632653 -5.20408163 -4.59183673 -3.97959184 -3.36734694
-2.75510204 -2.14285714 -1.53061224 -0.91836735 -0.30612245
0.30612245 0.91836735 1.53061224 2.14285714 2.75510204
3.36734694 3.97959184 4.59183673 5.20408163 5.81632653
6.42857143 7.04081633 7.65306122 8.26530612 8.87755102
9.48979592 10.10204082 10.71428571 11.32653061 11.93877551
12.55102041 13.16326531 13.7755102 14.3877551 15. ]
#+end_example
Finally, an example for graphical output:
#+begin_src python :results output file :session :var matplot_lib_filename="./cosxsx.png" :exports results
: Incidence
: count 1554.000000
: mean 12647.119691
: std 6657.542827
: min 161.000000
: 25% 7326.750000
: 50% 12627.000000
: 75% 17155.000000
: max 36298.000000
Now check for missing data. pandas marks missing entries as NA when it reads the file, so we check the dataframe for NA values.
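An NA check only finds entries that are present but empty. A week that is absent from the file altogether would not show up. A minimal sketch of a complementary check, assuming consecutive dates are expected to be exactly one week apart (~find_gaps~ and the sample dates are hypothetical and not part of the analysis):

```python
import datetime


def find_gaps(dates, step=datetime.timedelta(weeks=1)):
    """Return (previous, next) pairs of consecutive dates that are not one step apart."""
    ordered = sorted(dates)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a != step]


# Hypothetical weekly dates with one week (1991-01-21) left out on purpose.
weeks = [
    datetime.date(1991, 1, 7),
    datetime.date(1991, 1, 14),
    datetime.date(1991, 1, 28),
]
print(find_gaps(weeks))
```

An empty list would mean that no week is missing between the first and the last date.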
#+BEGIN_SRC python :session *python* :results output :exports both
print(data.isna().sum())  # number of missing values per column
#+END_SRC
#+RESULTS:
Looks OK. Now it is time to plot a time series of the chickenpox incidence.
#+BEGIN_SRC python :session *python* :results output file :var ts_plot="chickepox_timeseries.png" :exports both
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6, 3))
data.plot(ax=ax)
plt.savefig(ts_plot)
print(ts_plot)
#+END_SRC
#+RESULTS:
[[file:chickepox_timeseries.png]]
The data start at the beginning of 1991 and end at the beginning of 2020. The monthly evolution is not clear from the long time series, so we plot a shorter period.
#+BEGIN_SRC python :session *python* :results output file :var ts_plot="chickepox_timeseries_short.png" :exports both
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6, 3))
data['1991-01-01':'1994-01-01'].plot(ax=ax)
plt.savefig(ts_plot)
print(ts_plot)
#+END_SRC
#+RESULTS:
[[file:chickepox_timeseries_short.png]]
It appears that the dip is in November, so I need to group the data into yearly periods starting in November.
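The idea of a year that runs from November to October can be sketched with the standard library. This is an illustration only and not the method used in this document, which works on the pandas index. The helper ~season_year~ and the sample records are hypothetical.

```python
import datetime
from collections import defaultdict


def season_year(day, start_month=11):
    """Label a date with the year in which its November-to-October season ends."""
    return day.year + 1 if day.month >= start_month else day.year


# Hypothetical (date, incidence) pairs, for illustration only.
records = [
    (datetime.date(1991, 10, 28), 100),
    (datetime.date(1991, 11, 4), 200),
    (datetime.date(1992, 3, 2), 300),
]

totals = defaultdict(int)
for day, incidence in records:
    totals[season_year(day)] += incidence

print(dict(totals))  # {1991: 100, 1992: 500}
```

With this labelling, the first and last seasons of a dataset are usually incomplete and may need to be dropped before comparing yearly totals.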
#+BEGIN_SRC python :session *python* :results output file :var ts_plot="chickepox_timeseries_yearly.png" :exports both
data_yearly = data.groupby(data.index.shift(8, freq='m').year).sum()
data_yearly = data_yearly.iloc[1:-2]
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6, 3))
data_yearly.plot(ax=ax)
plt.savefig(ts_plot)
print(data_yearly.sort_values(by='Incidence'))
print(ts_plot)
#+END_SRC
plt.figure(figsize=(10,5))
plt.plot(x,numpy.cos(x)/x)
plt.tight_layout()
#+RESULTS:
plt.savefig(matplot_lib_filename)
print(matplot_lib_filename)
#+end_src
Plot of the aggregated values.
#+BEGIN_SRC python :session *python* :results output file :var ts_plot="chickepox_timeseries_yearly.png" :exports both
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6, 3))
data_yearly.plot(ax=ax)
plt.savefig(ts_plot)
print(ts_plot)
#+END_SRC
#+RESULTS:
[[file:./cosxsx.png]]
Note the parameter ~:exports results~, which indicates that the code
will not appear in the exported document. We recommend that in the
context of this MOOC, you always leave this parameter setting as
~:exports both~, because we want your analyses to be perfectly
transparent and reproducible.
Watch out: the figure generated by the code block is /not/ stored in
the org document. It's a plain file, here named ~cosxsx.png~. You have
to commit it explicitly if you want your analysis to be legible and
understandable on GitLab.
Finally, don't forget that we provide in the resource section of this
MOOC a configuration with a few keyboard shortcuts that allow you to
quickly create code blocks in Python by typing ~<p~, ~<P~ or ~<PP~
followed by ~Tab~.
Now it's your turn! You can delete all this information and replace it
by your computational document.
[[file:chickepox_timeseries_yearly.png]]