# MOOC Reproducible Research: my logbook

# Mission 2

## Module 1: Cahier de notes, cahier de laboratoire

I started this module on June 5. So far, I have learned:

- What GitLab is and how to use it;
- How to create and edit Markdown files;
- How to write Markdown using headers, lists, links, and code blocks.

I also watched the training videos provided in Module 1 and completed several quizzes related to each topic of this module.

### Exercise 1 is done:

- 01-1: Found the commit for Helloworld Python
  - Commit number: `505c4e26`
  - Author of the commit: `Arnaud Legrand`
- 01-2: Created a Markdown file with all required elements and compared it with the solution

### Exercise 2 is done:

- 2.1: Commit a Jupyter notebook (toy_notebook.ipynb)
  - I performed the various commits and saved the file "toy_notebook" in \Module2\exo1
  - I also performed a commit and saved the file "toy_notebook" in \Module1\exo2
- 2.2: Commit committer
  - Committer: Diana_kerimbekova (f8dc60cab5180566667b00ce62a51ae7)
  - `623c226a`: Adding the toy_notebook to \Module 1\exo2
  - `8767d70c`: Commit message
  - ...........
  - ![Git commit graph](commiter1.jpg) ![Git commit graph](commiter2.jpg)
- 2.3: Commit graph
  - ![Git commit graph](commit_graph.jpg)

### What I learned during Mission 2:

- How to write and edit Markdown files
- How to use the GitLab history
- The beginnings of working with a Jupyter notebook and committing it

# Mission 3 - Module 2

## Exercise 02 (1st part)

This task is related to Module 2, "la vitrine et l'envers du décor : le document computationnel". As part of this exercise, I reproduced the given PDF document in a Jupyter notebook. I committed the completed notebook to GitLab, under the following path: **module2/exo1/toy_notebook_en.ipynb**.
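The reproduced notebook estimates π, notably with Buffon's needle method. As a personal aside, here is a minimal Monte Carlo sketch of that idea written by me for this logbook (it is not the MOOC's reference implementation; the function name and seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(42)

def buffon_pi(n_needles=1_000_000, needle_len=1.0, line_gap=1.0):
    """Estimate pi by dropping needles on a plane ruled with parallel lines.

    A needle of length L <= d crosses a line with probability 2*L / (pi * d),
    so pi can be estimated as 2*L*N / (d * crossings).
    """
    # Distance from the needle's centre to the nearest line, uniform on [0, d/2]
    centre = rng.uniform(0.0, line_gap / 2, n_needles)
    # Angle between the needle and the lines, uniform on [0, pi/2]
    angle = rng.uniform(0.0, np.pi / 2, n_needles)
    # The needle crosses a line when its half-projection reaches the line
    crossings = np.sum(centre <= (needle_len / 2) * np.sin(angle))
    return 2 * needle_len * n_needles / (line_gap * crossings)

print(buffon_pi())  # close to 3.14159 for a large number of needles
```

The estimate converges slowly (error shrinks like 1/√N), which is exactly why the notebook's diagram shows scattered needle drops rather than a precise value.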
While reproducing the PDF document in the Jupyter notebook, I successfully completed the following tasks:

- Added the main title **"À propos de pi"**
- Included the relevant mathematical formulas
- Wrote and executed the code that prints the value of $\pi$
- Added a hyperlink to the Wikipedia article on **"aiguilles de Buffon"** (Buffon's needle)
- Implemented **Buffon's method** in code, displaying its estimate of $\pi$
- Wrote the code that displays the final diagram

Afterwards, I compared my version with the reference solution provided on the MOOC platform. This comparison helped me identify the differences and better understand the expected structure.

*All actions have been verified and marked as completed in the Jupyter notebook and on the MOOC platform.*

## Exercise 02 (2nd part)

In this exercise, I performed a simple statistical analysis. Using the provided data, I computed the **mean**, **standard deviation**, **minimum**, **median**, and **maximum** values of the dataset.

## Exercise 02 (3rd part)

Based on the dataset provided in the previous exercise (2.2), I reproduced a **sequence plot** and a **histogram** to visualize the data.

## Exercise 02 (5th part): Critical examination of a data analysis (Challenger O-ring case)

In this task, I worked with a historical dataset related to the **risk of failure of the O-ring seals** in the space shuttle *Challenger*. This dataset gained significance after the tragic explosion of the shuttle 73 seconds after launch, resulting in the loss of seven astronauts. Prior to the launch, a teleconference was held to assess the risk of taking off on an exceptionally cold morning. Despite these discussions, the flight was not delayed, and the analysis presented at the time significantly underestimated the probability of failure. My objective was to critically review the analysis and identify the major reasoning errors that led to the flawed conclusion.
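To fix the idea of what such an analysis involves, here is my own rough sketch of fitting a logistic regression and inspecting the uncertainty of its estimates. The temperature/failure numbers below are synthetic values I made up for illustration, not the actual Challenger dataset, and the Newton-Raphson fit is a generic textbook implementation, not the MOOC's code:

```python
import numpy as np

# Illustrative synthetic data (NOT the real Challenger measurements):
# launch temperature in °F and whether an O-ring incident occurred.
temp = np.array([53, 57, 58, 63, 66, 67, 67, 68, 69, 70,
                 70, 72, 73, 75, 75, 76, 76, 78, 79, 81], dtype=float)
fail = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1,
                 1, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=float)

X = np.column_stack([np.ones_like(temp), temp])  # intercept + temperature

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; return the coefficients
    and their standard errors (sqrt of the inverse-Hessian diagonal)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        hessian = X.T @ ((p * (1 - p))[:, None] * X)
        beta += np.linalg.solve(hessian, X.T @ (y - p))
    p = 1 / (1 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ ((p * (1 - p))[:, None] * X))
    return beta, np.sqrt(np.diag(cov))

beta, se = fit_logistic(X, fail)
# Point prediction at 31 °F -- far below any observed temperature,
# which is precisely why extrapolating here is unreliable.
logit = beta[0] + beta[1] * 31
print("P(failure at 31 °F) =", 1 / (1 + np.exp(-logit)))
print("95% CI on slope:", beta[1] - 1.96 * se[1], "to", beta[1] + 1.96 * se[1])
```

The point is the second print: a wide confidence interval on the slope means the point prediction at 31 °F is far less trustworthy than it looks, which is exactly the third error identified below.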
**Key steps taken:**

- I reviewed the historical context and the principles of logistic regression, using the resources provided on GitLab.
- I analyzed the dataset and evaluated the model that was used at the time.
- I identified three major errors that caused the underestimation of the failure probability:
  1. **No data was available for low temperatures**, yet conclusions were still drawn.
  2. **Key data points were excluded** from the analysis.
  3. **The uncertainty of the logistic regression estimates was not taken into account**, leading to overconfidence in the predictions.

# Mission 4 - Module 3

## Exercise 03 (1st part)

In this task, the following steps were performed:

- A small piece of code was implemented and modified to check whether the local CSV file already exists.
- If the file does not exist, it is downloaded from the Réseau Sentinelles server and saved locally.
- If the file is present, the local copy is used instead of re-downloading the data.
- Comments were added to the code to explain the logic and purpose of each step:
  - First, check whether the local CSV file already exists.
  - Second, if the file does not exist, download it from the official URL and save it locally.
  - Third, this ensures that the analysis uses a consistent local copy and avoids repeated downloads.

**A comparative analysis with the reference solution confirmed the implementation was correct.**

## Exercise 03 (2nd part)

The objective of this task is to analyze the incidence of chickenpox using a new dataset, identifying the strongest and weakest epidemic years from the incidence data.
For this aim, I performed the following steps:

- Loaded the dataset from the Sentinelles website
- Converted weekly data to dates and grouped it by epidemiological year (from September 1 to August 31)
- Calculated the total incidence per epidemiological year

## Exercise for evaluation in pairs

In addition, as part of *Mission 4*, I created a computational document on **Subject 1: CO₂ concentration in the atmosphere since 1958**. In this project, I analyzed atmospheric CO₂ concentration data from the Mauna Loa Observatory, known as the **Keeling Curve**, covering the period from 1958 to the present.

The analysis included:

- Importing the dataset from the Scripps CO₂ Program
- Visualizing the raw CO₂ time series to show seasonal oscillations and the long-term trend
- Performing seasonal decomposition to isolate and analyze monthly variations
- Building a linear regression model to estimate the long-term trend
- Forecasting CO₂ levels up to the year 2025
- Computing yearly statistics (minimum, maximum, and mean CO₂ per year)

All steps were carried out using Python 3 in a Jupyter notebook.
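The trend-and-forecast part of the analysis can be sketched as follows. To keep the sketch self-contained I generate a synthetic stand-in for the Keeling curve (a linear trend plus a seasonal sine wave); the real series from the Scripps CO₂ Program is not linear, so treat the numbers below as illustrative only:

```python
import numpy as np

# Synthetic stand-in for the Keeling curve: monthly values 1958-2020,
# a linear trend (~1.5 ppm/year) plus a seasonal oscillation.
months = np.arange(1958, 2020, 1 / 12)
co2 = 315 + 1.5 * (months - 1958) + 3 * np.sin(2 * np.pi * months)

# Long-term trend: ordinary least-squares line through the series
slope, intercept = np.polyfit(months, co2, 1)
print(f"trend: {slope:.2f} ppm/year")

# Forecast for 2025 by extrapolating the fitted line
print(f"forecast 2025: {slope * 2025 + intercept:.1f} ppm")

# Yearly statistics: min / mean / max per calendar year
years = months.astype(int)
for y in (1958, 2019):
    sel = co2[years == y]
    print(y, sel.min().round(1), sel.mean().round(1), sel.max().round(1))
```

On the real data a straight line underestimates recent values because the growth rate itself increases, which is why the notebook also performs a seasonal decomposition before trusting the regression.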