# Journal de bord du Mooc / Mooc's logbook

## Module 1

**Exercise 01-1:**

*Which two files contain the character string "LE MOOC RECHERCHE REPRODUCTIBLE C'EST GENIAL"?*

- module1/exo1/aebef6b0a5.txt
- module1/exo1/f683bbad4b.txt

**Quiz 01**

*Why has a European project recently used the logbooks of the Portuguese, Spanish, Dutch and English Indian Companies?*

To try to reconstruct the climate of the oceans criss-crossed by the Western navies

*What note media are illustrated in the course video "Note-taking concerns everyone" by Christophe Pouzat?*

- Notes in the margins of books and manuscripts
- Notes in field books
- Notes on cards and paper slips

*Why did Leibniz order the construction of a closet?*

To store and order notes written on paper slips

*For the curious: visit the Darwin Online website, go to the notebooks, and describe how Darwin took his notes.*

First in notebooks, then on cards and paper sheets stored in folders

**Quiz 02**

*What is the origin of the codex?*

The Egyptian production of papyrus was not large enough to meet the demand of writers

*What aspect of Eusebius' work is presented in this sequence?*

His canon tables (cross-references between the Gospel books)

*In which line should the keyword "Analysis" go in John Locke's index?*

"Aa"

**Quiz 03**

**Quiz 04**

**Quiz 05**

## Module 2

### Toy project

#### Asking the maths library

My computer tells me that π is approximately

```python
from math import pi

print(pi)
```

    3.141592653589793

#### Buffon's needle

Applying the method of Buffon's needle, we get the approximation

```python
import numpy as np

np.random.seed(seed=42)  # fixed seed so the result is reproducible
N = 10000
# x and theta are drawn independently and uniformly
x = np.random.uniform(size=N, low=0, high=1)
theta = np.random.uniform(size=N, low=0, high=pi/2)
# fraction of draws with x + sin(theta) > 1, turned into an estimate of pi
2/(sum((x+np.sin(theta))>1)/N)
```

    3.128911138923655

#### Using a surface fraction argument

A method that is easier to understand and does not make use of the sin function is based on the fact that if $X \sim U(0, 1)$ and $Y \sim U(0, 1)$, then $P[X^2 + Y^2 \leq 1] = \pi/4$ (see "Monte Carlo method" on Wikipedia).
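
A brief justification of this identity, added here as a sketch (it is not spelled out in the course material and it assumes X and Y are independent): the point (X, Y) is then uniformly distributed over the unit square, so the probability is the share of that square covered by the quarter disc of radius 1.

$$
P\left[X^2 + Y^2 \leq 1\right]
  = \frac{\text{area of the quarter disc}}{\text{area of the unit square}}
  = \frac{\pi \cdot 1^2 / 4}{1}
  = \frac{\pi}{4}
$$

Multiplying the observed fraction of points inside the quarter disc by 4 therefore gives an estimate of π.
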
The following code uses this approach:

```python
%matplotlib inline
import matplotlib.pyplot as plt

np.random.seed(seed=42)  # fixed seed so the figure is reproducible
N = 1000
x = np.random.uniform(size=N, low=0, high=1)
y = np.random.uniform(size=N, low=0, high=1)

# points inside the quarter disc are accepted, the others rejected
accept = (x*x+y*y) <= 1
reject = np.logical_not(accept)

fig, ax = plt.subplots(1)
ax.scatter(x[accept], y[accept], c='b', alpha=0.2, edgecolor=None)
ax.scatter(x[reject], y[reject], c='r', alpha=0.2, edgecolor=None)
ax.set_aspect('equal')
```

It is then straightforward to obtain a (not really good) approximation to π by counting how many times, on average, $X^2 + Y^2$ is smaller than 1:

```python
4*np.mean(accept)
```

    3.112

**Exercise 02-2**

*What is the average?*

14.11

*What is the minimum?*

2.8

*What is the maximum?*

23.4

*What is the median?*

14.5

*What is the standard deviation?*

4.33

**Quiz 06**

*A computational document allows you to:*

- Improve the traceability of a calculation
- Easily present your work to colleagues
- Access all the calculations underlying an analysis

*Which environment(s) are presented to you in this MOOC?*

- RStudio
- Emacs/OrgMode
- Jupyter

*Which environment is recommended if your preferred language is Python?*

Jupyter

*Which environment is recommended if your preferred language is the R language?*

RStudio

*Which environment is used daily by the three authors of this MOOC?*

Emacs/OrgMode

**Quiz 07**

*In the studies we have presented to you, what prevents, sometimes for several years, the debate on the relevance of a study?*

- Unpublished computation procedures
- Data used in the study was not released

*In the various examples presented (economics, MRI, crystallography), what are the main causes of errors?*

- Data acquisition (bias, machine calibration, etc.)
- Computation errors
- Inadequate data processing or statistics

*What are the consequences of lack of transparency?*

- It is difficult to rely on the work of others
- Articles contain less information (no details on calculations, experimental protocols, data analysis, etc.) and are therefore easier to read
- It is difficult to verify and reproduce the analyses presented in the articles
- Two articles may present results that seem to contradict each other, but are both perfectly correct, as the lack of detail prevents the exact conditions of application from being determined

**Quiz 08**

*What are the main technical causes behind the difficulties in reproducing someone else's work?*

- Lack of documentation on the choices made
- Interactive graphical software that hides computation details
- Computation errors
- Data loss (no backup, or a format that is no longer readable)

*Which solutions are mentioned?*

- Using a laboratory notebook
- Code review and continuous integration
- Using version control systems and several backup mechanisms

*What are the most legitimate/valid fears associated with the systematic disclosure of data (open data)?*

- Some information may be sensitive and its disclosure may hurt people
- My resources are limited. If I systematically host all this data on the web page provided by my employer, I am likely to quickly exceed my quota