# __Logbook of JAIRO ANDRÉS CAMAÑO ECHAVARRIA__

# __MODULE 1__

Important notes

## QUIZ 01

1. Why has a European project recently used the logbooks of the Portuguese, Spanish, Dutch and English Indian Companies (cf. Christophe Pouzat's video "Note-taking concerns everyone")?
   - To try to reconstruct the climate of the oceans criss-crossed by the Western navies
2. What note media are illustrated in the course video "Note-taking concerns everyone" by Christophe Pouzat?
   - Notes in the margins of books and manuscripts
   - Notes in field books
   - Notes on cards and paper slips
3. Why did Leibniz order the construction of a closet?
   - To store and order notes written on paper slips
4. For the curious: visit the Darwin Online website, go to the notebooks, and describe how Darwin took his notes.
   - First in notebooks, then on cards and paper sheets stored in folders

## QUIZ 02

1. What is the origin of the codex?
   - The Egyptian production of papyrus was not large enough to meet the demand of writers
2. What aspect of Eusebius' work is presented in this sequence?
   - His canon tables (cross-references between the Gospel books)
3. In which line should the keyword "Analysis" go in John Locke's index?
   - « Aa »

## QUIZ 03

1. What is a text file?
   - A file made up of (stored as) UTF-8 characters
2. What is a tag?
   - A character, or series of characters, used to structure a document that will be invisible to the final reader
3. Markdown is a markup language
   - Light

## QUIZ 04

1. LibreOffice makes the comparison of two successive versions possible.
   - True
2. A wiki engine allows us to modify a single page at a time
   - True
3. GitHub and GitLab let us work with binary files like images.
   - True

## QUIZ 05

1. What are the limitations of the search functionality of text editors?
   - They only work with text files
   - They work on a single file at a time
2. What is DocFetcher?
   - Cross-platform software
   - A desktop search engine
3. Why does it make sense to use tags and keywords?
   - To filter out overabundant information
   - To quickly find relevant information

# __EXERCISE 01__

# __Part 1__

## __Subsection 1: text__

A plain sentence

*A sentence in italics*

__A sentence in bold__

A link to [fun-mooc](http://fun-mooc.fr)

A line of `code`

## __Subsection 2: Lists__

Bulleted list

- Item
  - Sub-item
  - Sub-item
- Item
- Item

Numbered list

1. item
2. item
3. item

## __Subsection 3: code__

```
# Code excerpt
```

## EXERCISES MODULE 1

## EXERCISE 01-1

## EXERCISE 01-2

# __MODULE 2__

## QUIZ 06

1. A computational document allows you to
   - Improve the traceability of a calculation
   - Easily present your work to colleagues
   - Access all the calculations underlying an analysis
2. Which environment(s) are presented to you in this MOOC?
   - RStudio
   - Emacs/OrgMode
   - Jupyter
3. Which environment is recommended if your preferred language is Python?
   - Jupyter
4. Which environment is recommended if your preferred language is the R language?
   - RStudio
5. Which environment is used daily by the three authors of this MOOC?
   - Emacs/OrgMode

## QUIZ 07

1. In the studies we have presented to you, what prevents, sometimes for several years, the debate on the relevance of a study?
   - Unpublished computation procedures
   - Data used in the study was not released
2. In the various examples presented (economics, MRI, crystallography), what are the main causes of errors?
   - Data acquisition (bias, machine calibration, etc.)
   - Computation errors
   - Inadequate data processing or statistics
3. What are the consequences of lack of transparency? (4 expected responses)
   - It's difficult to rely on the work of others
   - Articles contain less information (no details on calculations, experimental protocols, data analysis, etc.)
     and are therefore easier to read
   - It is difficult to verify and reproduce the analyses presented in the articles
   - Two articles may present results that seem to contradict each other, but are both perfectly correct, as the lack of detail prevents the exact conditions of application from being determined

## QUIZ 08

1. What are the main technical causes behind the difficulties in reproducing someone else's work? (4 expected responses)
   - Lack of documentation on the choices made
   - Interactive graphical software that hides computation details
   - Computation errors
   - Data loss (no backup or a format that is no longer readable)
2. Which solutions are mentioned? (3 expected responses)
   - Using a laboratory notebook
   - Code review and continuous integration
   - Using version control systems and several backup mechanisms
3. What are the most legitimate/valid fears associated with the systematic disclosure of data (open data)? (2 expected responses) This list of risks is, of course, deliberately not exhaustive...:

## QUIZ 10

1. What is commonly found in a computational document?
   - Commentaries
   - Code
   - An overview of data
   - Computational results
   - Hypertext links
   - Images
2. What does a computational document allow?
   - Inspect the computations
   - Easily re-run the computations if the original environment is available
   - Document the code
   - Explain why a particular computation is made based on the data analysis so far
   - Use multiple languages to perform computations (even if it may require some work)

## QUIZ P01

1. What does an environment like Jupyter provide in comparison to working in the Python console or running R scripts directly?
   - It provides a well-structured history of the analyses performed.
   - It allows you to inspect data, keep a history of this inspection, and explain the transformations you perform as you go along
   - It saves intermediate results, whether textual or graphical
   - It allows you to generate documents in HTML or PDF
   - It allows you to ensure that a figure is the result of the computation described in the document.
2. In Jupyter, what features are provided for the Python language but not available for the R language?

   N.B.: You may want to try it out by yourself by opening a Jupyter notebook via the big button under the previous video. You can switch from Python to R through the Jupyter menu (Kernel->Change Kernel->R).
   - The same features are available for both languages
3. What allows you to be effective in an environment like Jupyter?
   - The export functions and the ability to easily re-run the code from the beginning
   - Autocompletion
   - Learning keyboard shortcuts
   - Reading the documentation and cheat sheets

## EXERCISES MODULE 2

## EXERCISE 2-1

### 1 On the computation of π

#### 1.1 Asking the maths library

My computer tells me that π is approximately

In [1]:

    from math import *
    print(pi)

`3.141592653589793`

#### 1.2 Buffon's needle

Applying the method of Buffon's needle, we get the approximation

In [2]:

    import numpy as np
    # Fix the seed so that the result is reproducible
    np.random.seed(seed=42)
    N = 10000
    x = np.random.uniform(size=N, low=0, high=1)
    theta = np.random.uniform(size=N, low=0, high=pi/2)
    # Estimate π as 2 divided by the observed frequency of x + sin(theta) > 1
    2/(sum((x+np.sin(theta))>1)/N)

Out[2]: `3.1289111389236548`

#### 1.3 Using a surface fraction argument

A method that is easier to understand and does not make use of the sin function is based on the fact that if $X \sim U(0, 1)$ and $Y \sim U(0, 1)$, then $P[X^2 + Y^2 \leq 1] = \pi/4$ (see "Monte Carlo method" on Wikipedia).
The following code uses this approach:

In [3]:

    %matplotlib inline
    import matplotlib.pyplot as plt
    np.random.seed(seed=42)
    N = 1000
    x = np.random.uniform(size=N, low=0, high=1)
    y = np.random.uniform(size=N, low=0, high=1)
    # Points inside the quarter disc are accepted, the others rejected
    accept = (x*x+y*y) <= 1
    reject = np.logical_not(accept)
    fig, ax = plt.subplots(1)
    ax.scatter(x[accept], y[accept], c='b', alpha=0.2, edgecolor=None)
    ax.scatter(x[reject], y[reject], c='r', alpha=0.2, edgecolor=None)
    ax.set_aspect('equal')

It is then straightforward to obtain a (not really good) approximation to π by counting how many times, on average, $X^2 + Y^2$ is smaller than 1:

In [4]:

    4*np.mean(accept)

Out[4]: `3.1120000000000001`

## EXERCISE 2-2

## QUIZ 12

1. What distinguishes a replicable data analysis from a traditional analysis?
   - The code for all computations is included
2. What are the advantages of a replicable analysis?
   - It is easier to modify
   - It is easier to verify

## QUIZ 13

1. Where do the data on the incidence of influenza-like illness come from?
   - From the "réseau Sentinelles", a network of general practitioners
2. In which format are the data available?
   - CSV format
3. What is the sampling frequency of the incidence data?
   - One value per week
4. Why do we advise against removing the missing data line from the downloaded data file?
   - It would leave no visible trace of the manipulation

## QUIZ P04

1. Where did we find the URL for downloading the data?
   - In the Web browser's download history
2. How do we handle missing data?
   - We remove the data points before continuing with the analysis

## QUIZ P07

1. Why do we have to transform the week labels?
   - Pandas cannot interpret the format of the original data
2. What's the point of checking that the distance between two consecutive weeks is seven days?
   - The check would find weeks completely absent from the dataset
   - The check could have identified mistakes in the date conversion
3. Which methods did we use to verify our work?
   - Visual inspection
   - Code written specifically for verification

## QUIZ P10

1. Why did we choose the first of August as the beginning of each annual period?
   - The incidence of influenza-like illness is weakest around that date
2. Why don't our annual periods contain exactly 52 weeks?
   - A year always has more than 7 × 52 days
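
The checks mentioned in QUIZ P07 and QUIZ P10 can be sketched with the Python standard library alone. This is only an illustration, not the code of the course notebook (which works with pandas): it assumes that week labels are integers of the form `yyyyww` (ISO year followed by ISO week number), and the helper names `week_start`, `find_gaps` and `weeks_in_period` are hypothetical. It needs Python 3.8 or later for `date.fromisocalendar`.

```python
from datetime import date, timedelta


def week_start(label):
    """Monday of the ISO week encoded as an integer yyyyww (assumed format)."""
    year, week = divmod(label, 100)
    return date.fromisocalendar(year, week, 1)


def find_gaps(labels):
    """Pairs of consecutive week starts that are not exactly seven days apart."""
    starts = sorted(week_start(label) for label in labels)
    return [(a, b) for a, b in zip(starts, starts[1:])
            if b - a != timedelta(days=7)]


def weeks_in_period(year):
    """Number of weeks starting between 1 August of year-1 (included)
    and 1 August of year (excluded)."""
    begin, end = date(year - 1, 8, 1), date(year, 8, 1)
    # First Monday on or after the beginning of the period
    first_monday = begin + timedelta(days=(7 - begin.weekday()) % 7)
    return len(range(0, (end - first_monday).days, 7))


if __name__ == "__main__":
    # Week 3 of 2020 is missing on purpose, so one gap is reported
    print(find_gaps([202001, 202002, 202004]))
    # Depending on where the Mondays fall, a period holds 52 or 53 weeks
    print({year: weeks_in_period(year) for year in range(2015, 2021)})
```

Because a year is one or two days longer than 7 × 52 days, the number of week starts falling in an August-to-August period varies from one period to the next, which is why the periods cannot all contain exactly 52 weeks.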