Commit 6b0532c0 authored by Tommy Rushton

Finish 3-2.

parent 1f57d1d1
@@ -61,8 +61,8 @@ Alright, looks good 👍
** Sanitise the data
-But there may be problems with the data. Let's check for the obvious
-case of missing/empty data:
+But the data may not be without its issues. Let's check for the
+obvious case of missing/empty data:
#+begin_src python :results output :exports both
valid_table = []
@@ -97,18 +97,18 @@ separators in the resulting table (which contains the first five and
last five weeks' data):
#+begin_src python :results value :exports both
-[('week', 'inc'), None] + data[:5] + [None] + data[-5:]
+[('week', 'incidence'), None] + data[:5] + [None] + data[-5:]
#+end_src
#+RESULTS:
-| week | inc |
-|--------+-------|
+| week | incidence |
+|--------+-----------|
| 202416 | 19330 |
| 202415 | 24807 |
| 202414 | 16181 |
| 202413 | 18322 |
| 202412 | 12818 |
-|--------+-------|
+|--------+-----------|
| 199101 | 15565 |
| 199052 | 19375 |
| 199051 | 19080 |
@@ -118,8 +118,8 @@ last five weeks' data):
** Convert dates
Dates are represented in ISO 8601 format (YYYYWW) so let's parse
-those. It should already be sorted chronologically, but let's make
-sure of that too.
+those. Entries are sorted chronologically, but in reverse, so we'll
+fix that here too.
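A minimal sketch of the week-to-date conversion described above, assuming each week is given as a YYYYWW integer or string and should map to the Monday of that ISO week. The helper name `parse_yearweek` is hypothetical and is not the notebook's own function; the two sample rows are taken from the table above.

```python
import datetime

def parse_yearweek(yearweek):
    """Return the Monday of the ISO week encoded as YYYYWW (hypothetical helper)."""
    text = str(yearweek)
    year, week = int(text[:4]), int(text[4:])
    # fromisocalendar (Python 3.8+) takes ISO year, ISO week, and weekday (1 = Monday).
    return datetime.date.fromisocalendar(year, week, 1)

# Two (week, incidence) rows from the table above, newest first.
rows = [(199101, 15565), (199052, 19375)]

# Sorting the converted tuples puts the oldest week first.
converted = sorted((parse_yearweek(week), inc) for week, inc in rows)
print(converted[0])  # (datetime.date(1990, 12, 24), 19375)
```

Note that ISO week 1991-W01 starts on 1990-12-31, which is why the ISO year and the calendar year can differ around New Year.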
#+begin_src python :results silent :exports both
import datetime
@@ -133,18 +133,18 @@ Let's check again:
#+begin_src python :results value :exports both
data_as_str = [(str(date), str(inc)) for date, inc in converted_data]
-[('date', 'inc'), None] + data_as_str[:5] + [None] + data_as_str[-5:]
+[('date', 'incidence'), None] + data_as_str[:5] + [None] + data_as_str[-5:]
#+end_src
#+RESULTS:
-| date | inc |
-|------------+-------|
+| date | incidence |
+|------------+-----------|
| 1990-12-03 | 1143 |
| 1990-12-10 | 11079 |
| 1990-12-17 | 19080 |
| 1990-12-24 | 19375 |
| 1990-12-31 | 15565 |
-|------------+-------|
+|------------+-----------|
| 2024-03-18 | 12818 |
| 2024-03-25 | 18322 |
| 2024-04-01 | 16181 |
@@ -156,17 +156,19 @@ data_as_str = [(str(date), str(inc)) for date, inc in converted_data]
So, now we can take a look at incidence over time. (The 'flu notebook
switches to R here, but we're going to stick with python.)
-#+begin_src python :results output file :var filename="./incidence.png" :exports both
+#+begin_src python :results value file :var filename="./incidence.png" :exports both
import matplotlib.pyplot as plt
plt.clf()
date,incidence = zip(*converted_data)
plt.figure(figsize=(7.5,5))
plt.plot(date,incidence)
-plt.tight_layout()
+# plt.tight_layout()
plt.savefig(filename)
-print(filename)
+filename
#+end_src
#+RESULTS:
@@ -174,17 +176,17 @@ print(filename)
And we can zoom in on a period of, say, five years:
-#+begin_src python :results output file :var filename="./incidence-zoom.png" :exports both
+#+begin_src python :results value file :var filename="./incidence-zoom.png" :exports both
plt.clf()
-start = 10
+start = 15
years = 5
date,incidence = zip(*converted_data[52*start:52*(start+years)])
plt.plot(date,incidence)
-plt.tight_layout()
+# plt.tight_layout()
plt.savefig(filename)
-print(filename)
+filename
#+end_src
#+RESULTS:
@@ -192,3 +194,78 @@ print(filename)
It looks like incidence peaks in the spring, with lowest numbers
around September.
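The zoom plot above selects its window by index, assuming 52 weeks per year. Since some ISO years have 53 weeks, a date-based filter may be a more robust alternative. The following is only a sketch under that assumption: `window` is a hypothetical helper, and the series is synthetic (placeholder values), starting from the first date shown in the table above.

```python
import datetime

def window(data, first_year, n_years):
    """Keep (date, value) pairs from 1 Jan first_year up to, but excluding, 1 Jan first_year + n_years."""
    start = datetime.date(first_year, 1, 1)
    end = datetime.date(first_year + n_years, 1, 1)
    return [(d, value) for d, value in data if start <= d < end]

# Synthetic weekly series; the values are placeholders, not real incidence.
first = datetime.date(1990, 12, 3)
toy = [(first + datetime.timedelta(weeks=i), i) for i in range(52 * 20)]

zoom = window(toy, 2005, 5)
print(zoom[0][0], zoom[-1][0])
```

The result can be unpacked with `zip(*zoom)` and plotted in the same way as the index-based slice.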
* Study of annual incidence
** Compute annual incidence
So, let's calculate the incidence for each year. We'll define this
as the sum of the weekly incidence reports from the beginning of
September in year /N-1/ to the end of August in year /N/.
#+begin_src python :results silent :exports both
def annual_incidence(year):
    # The window runs from 1 September of year-1 (exclusive) to 1 September of year (inclusive).
    start = datetime.datetime.strptime(f"{year-1}-09-01", '%Y-%m-%d').date()
    end = datetime.datetime.strptime(f"{year}-09-01", '%Y-%m-%d').date()
    weeks = [d for d in converted_data if d[0] > start and d[0] <= end]
    return sum([w[1] for w in weeks])
#+end_src
That was quick and dirty, and not especially pleasant to write.
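For comparison, here is a self-contained sketch of the same sum. It is a variant, not the notebook's code: it takes the data as a parameter instead of reading the global `converted_data`, builds the boundaries with `datetime.date` directly, and runs on made-up values chosen only to show which dates fall inside the window.

```python
import datetime

def annual_incidence(data, year):
    """Sum incidence over dates d with 1 Sep (year-1) < d <= 1 Sep year."""
    start = datetime.date(year - 1, 9, 1)
    end = datetime.date(year, 9, 1)
    return sum(inc for d, inc in data if start < d <= end)

# Made-up weekly values, for illustration only.
toy = [
    (datetime.date(1990, 8, 27), 100),  # before the window
    (datetime.date(1990, 9, 3), 10),    # inside
    (datetime.date(1991, 8, 26), 20),   # inside
    (datetime.date(1991, 9, 2), 200),   # after the window
]
print(annual_incidence(toy, 1991))  # 30
```

The comparison keeps the same boundary behaviour as the original function: a report dated exactly 1 September is counted in the year that ends on that date.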
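Now we can define the years we're interested in: those whose
September-to-August period is covered by the data. Note that the
period for 1991 starts on 1 September 1990, before the first report
(1990-12-03), so that year's total is incomplete: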
#+begin_src python :results silent :exports both
years = range(1991, 2024)
#+end_src
NB: the second argument to ~range~ is exclusive, so the last year is 2023.
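The year range could also be derived from the dates instead of being hard-coded. The sketch below is one possible criterion, not the notebook's: a year /N/ counts as complete only if the data spans 1 September /N-1/ to 1 September /N/. The helper `complete_years` is hypothetical. The first date comes from the table above, and the last date assumes week 202416 maps to its Monday.

```python
import datetime

def complete_years(first_date, last_date):
    """Years N for which both 1 Sep (N-1) and 1 Sep N lie within [first_date, last_date]."""
    return [
        n
        for n in range(first_date.year, last_date.year + 1)
        if first_date <= datetime.date(n - 1, 9, 1)
        and datetime.date(n, 9, 1) <= last_date
    ]

first_report = datetime.date(1990, 12, 3)   # first row of the converted table
last_report = datetime.date(2024, 4, 15)    # assumed Monday of week 202416
years_covered = complete_years(first_report, last_report)
print(years_covered[0], years_covered[-1])  # 1992 2023
```

Under this stricter criterion 1991 is excluded, because its period would start on 1 September 1990, before the first report.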
Now we can use a list comprehension to get the incidence for each
year:
#+begin_src python :results value :exports both
incidence_per_year = [(y,annual_incidence(y)) for y in years]
head, *tail = incidence_per_year
head
#+end_src
#+RESULTS:
| 1991 | 553895 |
** Visual inspection
Now we can plot incidence against year.
#+begin_src python :results value file :var filename="./annual_incidence.png" :exports both
plt.clf()
years,incidences = zip(*incidence_per_year)
plt.bar(years,incidences)
plt.ylabel("Annual incidence")
plt.savefig(filename)
filename
#+end_src
#+RESULTS:
[[file:./annual_incidence.png]]
Eyeballing the plots, it looks like although 2003-04 had the greatest
spikes in terms of weekly cases, 2009-10 featured longer spells of
consistently high numbers of cases.
Anyway, let's get the strongest and weakest epidemics:
#+begin_src python :results value :exports both
key = lambda y: y[1]
strongest = max(incidence_per_year, key=key)
weakest = min(incidence_per_year, key=key)
(strongest,weakest)
#+end_src
#+RESULTS:
| 2009 | 841233 |
| 2020 | 221183 |
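The same selection can be written with `operator.itemgetter` instead of a lambda, and `sorted` gives a full ranking. This is only a sketch: `sample` holds the three (year, annual incidence) pairs that appear in the results of this notebook, not the full list.

```python
from operator import itemgetter

# Three (year, annual incidence) pairs taken from the results shown in this notebook.
sample = [(1991, 553895), (2009, 841233), (2020, 221183)]

by_incidence = itemgetter(1)  # picks the incidence out of each (year, incidence) pair
strongest = max(sample, key=by_incidence)
weakest = min(sample, key=by_incidence)
ranked = sorted(sample, key=by_incidence, reverse=True)  # strongest first

print(strongest, weakest)  # (2009, 841233) (2020, 221183)
```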
module3/exo2/incidence-zoom.png (image replaced: 54.1 KB before, 53.2 KB after)
module3/exo2/incidence.png (image replaced: 62.3 KB before, 63 KB after)