#+TITLE: Analysis of custom data - Module 2 - Exercise 4
#+AUTHOR: Miguel Felipe Silva Vasconcelos
#+DATE: 28/02/2021
#+LANGUAGE: en
# #+PROPERTY: header-args :eval never-export
* Introduction
To solve this exercise, I'm using randomly generated data. The file /data.csv/ contains two columns:
- The first column holds the date (one row per day of February 2021).
- The second column holds the number of minutes spent that day on a
  given task (in this case, studying for this MOOC).

The analysis reports the following metrics for the time spent each day:
mean, standard deviation, median, minimum, and maximum.
* Results of the experiments
I'm using the [[https://pandas.pydata.org/][Pandas library]] to simplify reading the data from the
CSV file, and to learn a new tool :).
# With ":results value", the block returns its last expression instead of the console output.
#+begin_src python :results value :session *python* :exports both
import pandas as pd  # pandas simplifies reading the CSV and parsing the dates
# data.csv is semicolon-delimited and has no header row; column 0 holds the dates
dataframe = pd.read_csv("data.csv", parse_dates=[0], delimiter=';', header=None)
dataframe
#+end_src
#+RESULTS:
#+begin_example
0 1
0 2021-02-01 81.819914
1 2021-02-02 45.630108
2 2021-02-03 70.870649
3 2021-02-04 5.975111
4 2021-02-05 101.240122
5 2021-02-06 103.766044
6 2021-02-07 52.724327
7 2021-02-08 68.712419
8 2021-02-09 24.769924
9 2021-02-10 118.519012
10 2021-02-11 72.366803
11 2021-02-12 114.271576
12 2021-02-13 22.577226
13 2021-02-14 9.454489
14 2021-02-15 82.041779
15 2021-02-16 113.367189
16 2021-02-17 69.055952
17 2021-02-18 23.393082
18 2021-02-19 59.451386
19 2021-02-20 11.830620
20 2021-02-21 38.629430
21 2021-02-22 55.876251
22 2021-02-23 69.602759
23 2021-02-24 12.494400
24 2021-02-25 115.595595
25 2021-02-26 56.179007
26 2021-02-27 64.323035
27 2021-02-28 4.862036
#+end_example
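As a minimal sketch of what the =read_csv= arguments do, the block below parses a three-row, in-memory sample laid out like /data.csv/ (semicolon-delimited, no header row). The sample values and the column names =date= and =minutes= are assumptions made for illustration; they are not part of the original file.

#+begin_src python :results output :exports code
import io

import pandas as pd

# Hypothetical three-row sample in the same layout as data.csv: date;minutes
csv_text = "2021-02-01;81.5\n2021-02-02;45.25\n2021-02-03;70.0\n"

sample = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                    # same meaning as delimiter=';'
    header=None,                # the data has no header row
    names=["date", "minutes"],  # illustrative names, not present in the file
    parse_dates=["date"],       # convert the first column to datetime64
)

print(sample.dtypes)
print(sample)
#+end_src

Naming the columns is optional; without =names=, pandas labels them =0= and =1=, as in the output above.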
* Calculating the average/mean
We can use pandas' [[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html][mean method]]
# With ":results output", the block shows only what is printed to the console.
#+begin_src python :results output :session *python* :exports both
average = dataframe[1].mean()
print(average)
#+end_src
#+RESULTS:
: 59.621437271033706
* Calculating the standard deviation
We can use pandas' [[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.std.html][std method]]
#+begin_src python :results value :session *python* :exports both
std = dataframe[1].std()
std
#+end_src
#+RESULTS:
: 36.12909565271962
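Note that pandas' =std= computes the sample standard deviation by default (=ddof=1=, dividing by n - 1). The sketch below contrasts it with the population standard deviation (=ddof=0=) on a small, made-up series; the values are illustrative only and are not taken from /data.csv/.

#+begin_src python :results output :exports code
import pandas as pd

# Made-up values, chosen so that the population standard deviation is exactly 2
minutes = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = minutes.std()            # default ddof=1: divides by n - 1
population_std = minutes.std(ddof=0)  # divides by n

print(sample_std)
print(population_std)
#+end_src

For a full month of data the two values are close, but the difference grows as the number of observations shrinks.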
* Calculating the median
We can use pandas' [[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.median.html][median method]]
#+begin_src python :results value :session *python* :exports both
median = dataframe[1].median()
median
#+end_src
#+RESULTS:
: 61.88721048734205
* Finding the minimum value (time spent)
We can use pandas' [[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.min.html][min method]]
#+begin_src python :results value :session *python* :exports both
# Named "minimum" to avoid shadowing Python's built-in min() in the shared session
minimum = dataframe[1].min()
minimum
#+end_src
#+RESULTS:
: 4.86203636954475
* Finding the day with the minimum time spent studying
We can use pandas' [[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmin.html][idxmin method]]
#+begin_src python :results output :session *python* :exports both
idmin = dataframe[1].idxmin()  # index label of the row with the smallest value
# Print the date and the minutes stored in that row
print(dataframe[0][idmin], dataframe[1][idmin])
#+end_src
#+RESULTS:
: 2021-02-28 00:00:00 4.86203636954475
* Finding the maximum value (time spent)
We can use pandas' [[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.max.html][max method]]
#+begin_src python :results value :session *python* :exports both
# Named "maximum" to avoid shadowing Python's built-in max() in the shared session
maximum = dataframe[1].max()
maximum
#+end_src
#+RESULTS:
: 118.519011934154
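The five metrics computed so far can also be obtained in a single call. The following is a sketch using =Series.agg= on a small, made-up series (the values are illustrative, not taken from /data.csv/); it is an alternative to the per-metric blocks, not a replacement for them.

#+begin_src python :results output :exports code
import pandas as pd

# Made-up values standing in for the minutes column
minutes = pd.Series([10.0, 20.0, 30.0, 40.0])

# agg with a list of function names returns a Series indexed by those names
summary = minutes.agg(["mean", "std", "median", "min", "max"])
print(summary)
#+end_src

With the real data, the same call would be applied to =dataframe[1]=.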
* Finding the day with the maximum time spent studying
We can use pandas' [[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmax.html][idxmax method]]
#+begin_src python :results output :session *python* :exports both
idmax = dataframe[1].idxmax()  # index label of the row with the largest value
# Print the date and the minutes stored in that row
print(dataframe[0][idmax], dataframe[1][idmax])
#+end_src
#+RESULTS:
: 2021-02-10 00:00:00 118.519011934154
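=idxmin= and =idxmax= return index labels, not positions. If the dates are used as the index, both methods return the date directly, which avoids the second lookup. The sketch below assumes a small, made-up frame with hypothetical column names =date= and =minutes=.

#+begin_src python :results output :exports code
import pandas as pd

# Made-up sample; column names are illustrative
sample = pd.DataFrame({
    "date": pd.to_datetime(["2021-02-01", "2021-02-02", "2021-02-03"]),
    "minutes": [81.5, 5.0, 118.0],
})

# Index the minutes by date so idxmin/idxmax return Timestamps
by_date = sample.set_index("date")["minutes"]

shortest_day = by_date.idxmin()
longest_day = by_date.idxmax()

print(shortest_day.date(), by_date.loc[shortest_day])
print(longest_day.date(), by_date.loc[longest_day])
#+end_src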
* Plotting the data
#+begin_src python :results output file :session *python* :var matplot_lib_filename2="simple_plot.png" :exports both
from matplotlib import pyplot as plt

fig, ax = plt.subplots(figsize=(12, 12))
# Plot the minutes (column 1) against the dates (column 0) so the x-axis matches its label
ax.bar(dataframe[0],
       dataframe[1],
       color='purple')
ax.set(xlabel="Date",
       ylabel="Time spent (minutes)",
       title="Daily time spent studying for the MOOC on reproducible research - Feb 2021")
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
plt.savefig(matplot_lib_filename2)
print(matplot_lib_filename2)
#+end_src
#+RESULTS:
[[file:simple_plot.png]]