# __Logbook of JAIRO ANDRÉS CAMAÑO ECHAVARRIA__

# __MODULE 1__

Important notes


## QUIZ 01

1. Why has a European project recently used the logbooks of the Portuguese, Spanish, Dutch and English East India Companies (cf. Christophe Pouzat's video "Note-taking concerns everyone")?

To try to reconstruct the climate of the oceans criss-crossed by the Western navies

2. What note media are illustrated in the course video "Note-taking concerns everyone" by Christophe Pouzat?

- Notes in the margins of books and manuscripts
- Notes in field books
- Notes on cards and paper slips

3. Why did Leibniz order the construction of a closet ?

To store and order notes written on paper slips 

4. For the curious: visit the Darwin Online website, go to the notebooks, and describe how Darwin took his notes.

First in notebooks then on cards and paper sheets stored in folders


## QUIZ 02

1. What is the origin of the codex?

The Egyptian production of papyrus was not large enough to meet the demand of writers

2. What aspect of Eusebius work is presented in this sequence?

His canon tables (cross-references between the Gospel books)

3. In which line should the keyword "Analysis" go in John Locke's index?

« Aa »
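
As I understand the method from the course, Locke files a keyword under its first letter followed by the first vowel that comes after it. Below is a minimal sketch of that rule. The function name `locke_index_key` is hypothetical, and treating only a, e, i, o, u as vowels is my assumption.

```python
def locke_index_key(keyword: str) -> str:
    """Return the index line for a keyword: first letter + first following vowel."""
    vowels = "aeiou"  # assumption: only these five letters count as vowels
    word = keyword.strip().lower()
    first = word[0].upper()
    for letter in word[1:]:
        if letter in vowels:
            return first + letter
    # No vowel after the first letter: fall back to the first letter alone
    return first


print(locke_index_key("Analysis"))
```

For "Analysis", the first letter is A and the first vowel after it is a, which gives the line « Aa ».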


## QUIZ 03

1. What is a text file ?

A file made up of (stored as) UTF-8 characters
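
A small sketch to make "stored as UTF-8" concrete: the same text is a sequence of characters for the reader and a sequence of bytes on disk. The sample string is my own choice, not from the course.

```python
text = "π ≈ 3.14"

# Encoding turns characters into the bytes that are actually stored in the file
data = text.encode("utf-8")

# Non-ASCII characters take more than one byte, so the two lengths differ
print(len(text), "characters,", len(data), "bytes")

# Decoding with the same encoding gives the original text back
assert data.decode("utf-8") == text
```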

2. What is a tag ?

A character, or series of characters, used to structure a document that will be invisible to the final reader

3. Markdown is a markup language

Light


## QUIZ 04

1. LibreOffice makes the comparison of two successive versions possible.

True

2. A wiki engine allows us to modify a single page at a time

True

3. GitHub and GitLab let us work with binary files like images.

True

## QUIZ 05

1. What are the limitations of the search functionality of text editors ?

They only work with text files

They work on a single file at a time
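
A minimal sketch of how the "single file at a time" limitation can be lifted with a few lines of Python. The function name `search_text_files` and the demo files are hypothetical. Note that this still only handles text files, which is where a desktop search engine goes further.

```python
import tempfile
from pathlib import Path


def search_text_files(root, needle, suffixes=(".md", ".txt")):
    """Return (file name, line number, line) for each line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if needle.lower() in line.lower():  # case-insensitive match
                    hits.append((path.name, number, line.strip()))
    return hits


# Demo on two made-up note files in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.md").write_text("# Notes\nLeibniz built a closet\n", encoding="utf-8")
    Path(folder, "b.txt").write_text("Locke's index\nleibniz again\n", encoding="utf-8")
    hits = search_text_files(folder, "leibniz")

for hit in hits:
    print(hit)
```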

2. What is DocFetcher ?

A cross-platform software

A desktop search engine

3. Why does it make sense to use tags and keywords?

To filter out overabundant information

To quickly find relevant information


# __EXERCISE 01__

# __Part 1__

## __Subpart 1: text__

A plain sentence

*A sentence in italics*

__A sentence in bold__

A link to
[fun-mooc](http://fun-mooc.fr)

A line of `code`

## __Subpart 2: Lists__

Bulleted list

- Item
  - Sub-item
  - Sub-item
- Item
- Item


Numbered list
1. item
2. item
3. item

## __Subpart 3: code__
```
# Code excerpt
```


## EXERCISES MODULE 1 

## EXERCISE 01-1


## EXERCISE 01-2


# __MODULE 2__

## QUIZ 06

1. A computational document allows you to

- Improve the traceability of a calculation
- Easily present your work to colleagues
- Access all the calculations underlying an analysis

2. Which environment(s) are presented to you in this MOOC?

- RStudio
- Emacs/OrgMode
- Jupyter

3. Which environment is recommended if your preferred language is Python?

Jupyter

4. Which environment is recommended if your preferred language is the R language?

RStudio

5. Which environment is used daily by the three authors of this MOOC?

Emacs/OrgMode 

## QUIZ 07

1. In the studies we have presented to you, what prevents, sometimes for several years, the debate on the relevance of a study?

- Unpublished computation procedures
- Data used in the study was not released

2. In the various examples presented (economics, MRI, crystallography), what are the main causes of errors ?

- Data acquisition (bias, machine calibration, etc.)
- Computation errors
- Inadequate data processing or statistics

3. What are the consequences of lack of transparency? (4 expected responses)

- It's difficult to rely on the work of others
- Articles contain less information (no details on calculations, experimental protocols, data analysis, etc.) and are therefore easier to read
- It is difficult to verify and reproduce the analyses presented in the articles
- Two articles may present results that seem to contradict each other, but are both perfectly correct, as the lack of detail prevents the exact conditions of application from being determined
 
## QUIZ 08

1. What are the main technical causes behind the difficulties in reproducing someone else's work? (4 expected responses)
 
- Lack of documentation on the choices made
- Interactive graphical software that hides computation details
- Computation errors
- Data loss (no backup, or a format that is no longer readable)

2. Which solutions are mentioned? (3 expected responses)

- Using a laboratory notebook
- Code review and continuous integration
- Using version control systems and several backup mechanisms


3. What are the most legitimate/valid fears associated with the systematic disclosure of data (open data)? (2 expected responses)

This list of risks is of course not deliberately exhaustive...:

## QUIZ 10

1. What is commonly found in a computational document?

- Commentaries
- Code
- An overview of data
- Computational results
- Hypertext links
- Images

2. What does a computational document allow?

- Inspect the computations
- Easily re-run the computations if the original environment is available
- Document the code
- Explain why a particular computation is made based on the data analysis so far
- Use multiple languages to perform computations (even if it may require some work)


## QUIZ P01

1. What does an environment like Jupyter provide in comparison to working in the Python console or running R scripts directly?

- It provides a well-structured history of the analyses performed.
- It allows you to inspect data, keep a history of this inspection, and explain the transformations you perform as you go along
- It saves intermediate results, whether textual or graphical
- It allows you to generate documents in HTML or PDF
- It allows you to ensure that a figure is the result of the computation described in the document.


2. In Jupyter, what features are provided for the Python language but not available for the R language?
N.B.: You may want to try it out by yourself by opening a Jupyter notebook via the big button under the previous video. You can switch from Python to R through the Jupyter menu (Kernel->Change Kernel->R).

There are the same features for both languages
 
3. What allows you to be effective in an environment like Jupyter?
 
- The export functions and the ability to easily re-run the code from the beginning
- Autocompletion
- Learning keyboard shortcuts
- Reading the documentation and cheat sheets


## EXERCISES MODULE 2

## EXERCISE 2-1

### 1 On the computation of π

#### 1.1 Asking the maths library

My computer tells me that π is approximately

```
from math import *  # brings pi into the namespace
print(pi)
```

```
3.141592653589793
```
#### 1.2 Buffon’s needle

Applying the method of Buffon’s needle, we get the approximation

```
import numpy as np
np.random.seed(seed=42)  # fixed seed so the result is reproducible
N = 10000
x = np.random.uniform(size=N, low=0, high=1)
theta = np.random.uniform(size=N, low=0, high=pi/2)  # pi comes from the math import in 1.1
2/(sum((x+np.sin(theta))>1)/N)
```

```
3.1289111389236548
```
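
A short derivation of why the last expression approximates π, based on my own reading of the code (the notebook does not spell it out). The code measures the fraction of draws with $x + \sin\theta > 1$. For a fixed $\theta$, since $x \sim U(0, 1)$ and $0 \leq \sin\theta \leq 1$, we have $P[x > 1 - \sin\theta] = \sin\theta$. Averaging over $\theta \sim U(0, \pi/2)$:

$$
P[x + \sin\theta > 1] = \frac{2}{\pi}\int_0^{\pi/2} \sin\theta \, d\theta = \frac{2}{\pi}
\quad\Longrightarrow\quad
\pi \approx \frac{2}{\text{observed fraction}}
$$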
#### 1.3 Using a surface fraction argument

A method that is easier to understand and does not make use of the sin function is based on the fact that if $X \sim U(0, 1)$ and $Y \sim U(0, 1)$, then $P[X^2 + Y^2 \leq 1] = \pi/4$ (see "Monte Carlo method" on Wikipedia). The following code uses this approach:
```
%matplotlib inline
import matplotlib.pyplot as plt
np.random.seed(seed=42)
N = 1000
x = np.random.uniform(size=N, low=0, high=1)
y = np.random.uniform(size=N, low=0, high=1)

# Points inside the quarter disc are accepted, the others rejected
accept = (x*x+y*y) <= 1
reject = np.logical_not(accept)
fig, ax = plt.subplots(1)
ax.scatter(x[accept], y[accept], c='b', alpha=0.2, edgecolor=None)
ax.scatter(x[reject], y[reject], c='r', alpha=0.2, edgecolor=None)
ax.set_aspect('equal')
```

It is then straightforward to obtain a (not really good) approximation to π by counting how many times, on average, $X^2 + Y^2$ is smaller than 1:

```
4*np.mean(accept)
```

```
3.1120000000000001
```
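
To try the surface fraction idea outside the notebook, here is a sketch that uses only the standard library instead of NumPy. The function name `estimate_pi` is hypothetical, and the random generator differs from the one above, so the values will not match the notebook output.

```python
import math
import random


def estimate_pi(n, seed=42):
    """Estimate pi from the fraction of n random points with x^2 + y^2 <= 1."""
    rng = random.Random(seed)  # local generator, so the global state is untouched
    inside = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1)
    return 4 * inside / n


# The error should tend to shrink as the number of points grows
for n in (1_000, 100_000):
    estimate = estimate_pi(n)
    print(n, estimate, abs(estimate - math.pi))
```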


## EXERCISE 2-2


## QUIZ 12

1. What distinguishes a replicable data analysis from a traditional analysis?

The code for all computations is included

2. What are the advantages of a replicable analysis?

- It is easier to modify
- It is easier to verify

## QUIZ 13

1. Where do the data on the incidence of influenza-like illness come from?

From the “réseau Sentinelles”, a network of general practitioners

2. In which format are the data available?

CSV format 

3. What is the sampling frequency of the incidence data?

One value per week

4. Why do we advise against removing the missing data line from the downloaded data file?

It would leave no visible trace of the manipulation


## QUIZ P04

1. Where did we find the URL for downloading the data?

In the Web browser’s download history 

2. How do we handle missing data?

We remove the data points before continuing with the analysis 
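
A minimal sketch of removing missing points in code rather than by editing the downloaded file, so that the manipulation stays visible. The column names `week` and `inc` and the three data rows are made up for illustration, and the course itself uses pandas rather than the `csv` module.

```python
import csv
import io

# Made-up sample: the second row has no incidence value
raw = """week,inc
198501,100
198502,
198503,120
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Keep an explicit record of what is dropped instead of deleting it silently
missing = [row for row in rows if row["inc"] == ""]
clean = [row for row in rows if row["inc"] != ""]

print("dropped weeks:", [row["week"] for row in missing])
print("remaining rows:", len(clean))
```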

## QUIZ P07

1. Why do we have to transform the week labels?

Pandas cannot interpret the format of the original data

2. What's the point of checking that the distance between two consecutive weeks is seven days?

- The check would find weeks completely absent from the dataset
- The check could have identified mistakes in the date conversion
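
A sketch of such a seven-day check using only the standard library. It assumes week labels are integers made of a year followed by an ISO week number (for example 202001). That format, the function name `week_start` and the sample labels are my assumptions. `date.fromisocalendar` needs Python 3.8 or later.

```python
from datetime import date, timedelta


def week_start(label):
    """Monday of the ISO week encoded as year * 100 + week (assumed format)."""
    year, week = divmod(label, 100)
    return date.fromisocalendar(year, week, 1)


# Made-up labels: week 3 of 2020 is deliberately absent
labels = [202001, 202002, 202004]
starts = [week_start(label) for label in labels]

# Any pair of consecutive entries that is not exactly 7 days apart is suspicious
gaps = [(a, b) for a, b in zip(starts, starts[1:]) if b - a != timedelta(days=7)]

for a, b in gaps:
    print("gap between", a, "and", b)
```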

3. Which methods did we use to verify our work?

- Visual inspection
- Code written specifically for verification


## QUIZ P10

1. Why did we choose the first of August as the beginning of each annual period?

The incidence of influenza-like illness is lowest around that date

2. Why don’t our annual periods contain exactly 52 weeks?

A year always has more than 7 × 52 days
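
Since 7 × 52 = 364 and a year has 365 or 366 days, a period running from one 1 August to the next cannot always contain the same number of weeks. The sketch below counts the Mondays in each period as a stand-in for the number of weeks. That counting rule and the function name `mondays_in_period` are my simplifications, not the method used in the course.

```python
from datetime import date, timedelta


def mondays_in_period(start_year):
    """Count the Mondays from 1 August of start_year up to 31 July of the next year."""
    day = date(start_year, 8, 1)
    end = date(start_year + 1, 8, 1)
    count = 0
    while day < end:
        if day.weekday() == 0:  # Monday
            count += 1
        day += timedelta(days=1)
    return count


counts = {year: mondays_in_period(year) for year in range(1985, 2020)}
print(sorted(set(counts.values())))
```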