# Logbook of the reproducible research MOOC

This logbook will contain the main points I learnt and found useful in this MOOC, as well as some useful references and resources.

## Module 1: note books and lab books

### 20.03.20 - 24.03.20

This module was mostly about taking notes. What I keep from it:

* Markdown is cool for taking random notes and keeping information. This could replace the LaTeX files I am currently using to keep some info on bibliography.
* The use of tags in my files could greatly improve my ability to find information again! They suggest using DocFetcher, which searches inside text files. There is a tutorial in the MOOC I should check later. Here is the [web page of DocFetcher](http://docfetcher.sourceforge.net/en/index.html).
* Git is awesome! I am starting to be more efficient with git. I will keep on feeding my GitHub for R package development, but I am thinking of creating a private project on GitLab to store and share data analysis scripts and files with colleagues.
* [ ] Cool! This is a task list!

I am not sure about what we should record as:

> daily data that interest you (weather, etc.). You will use them later in module 2.

But just in case, today is sunny, and I would like to eat a lemon ice cream. Ah yeah, btw... COVID-19

Things I would like to figure out:

* On GitLab:
    * Can the changes made in this file be saved without being committed? *Doesn't look so*
    * What is the maximum storage?
    * Can I give access to a private project to someone with a link, or does the person need to own a GitLab account?
    * What is this "No wrap"/"Soft wrap" button? *It just wraps the lines automatically so everything can fit on the screen*

**There is an obvious lack of structure in this section!**

## Module 2: computational documents

### 25.03.20

What is needed for a study to be reproducible?

* The data and the way they were collected should be available
* Each choice made in the methodology should be explained and justified: all this should be kept in the lab book

Point-and-click tools, spreadsheets and proprietary software are not great!

**Be organized, and store your data, randomize your experimental design, use text format, use free software!**

It is sunny and I just realised that I can hear the church bell from my garden. Ah yeah, btw... COVID-19

#### Rstudio tutorial

I chose to follow the Rstudio tutorial even though I already know how to produce Rmarkdown files, but it would be cool to try OrgMode later, so that I could use Emacs to do everything (while for now I am only using it to produce LaTeX documents).

Something I didn't know: when you use a language other than R (for example when I use some pieces of bash/python for OBITOOLS pipelines), the variables generated in a chunk are not kept in memory and cannot be used from one chunk to another (whereas they can for R code).

### 26.03.20

Wow, I really want to try org-mode now... Learnt some really cool stuff about having a journal and lab books. So far, my lab book has only been a paper book, and about my experiments, not my analyses. I attempted at some point to have a data analysis "lab book", but a paper version is obviously not the best choice for code. I gave up after looking for a software tool without success.

I should later read [this article](http://starpu-simgrid.gforge.inria.fr/misc/SIGOPS_paper.pdf) entitled **An Effective Git And Org-Mode Based Workflow For Reproducible Research**

I finished the second module! Yay!

## Module 3: a replicable analysis

### 27.03.20

I did a replicable analysis of the incidence of varicella.

## Module 4: the reality in the field

#### 27.03.20

##### Data hell

Data sometimes come in different types (e.g. text and images), and they all need to be kept together. In addition, the data can be too large (for example text).
A binary format takes less space and avoids converting numbers from text to binary to do the computations. However, the advantage of text formats is the metadata. The problem with a binary format is that it can change from one system to another (different endianness).

Two binary formats can meet these criteria:

* FITS (Flexible Image Transport System)
* HDF5 (Hierarchical Data Format version 5)

This is interesting but I don't think I am ready to use this yet (neither are my coauthors...).

#### 02.04.20

##### Software hell

A notebook can quickly become unreadable for a large study: there is no overview.

One solution: use a workflow to run the different pieces of code, grouped into functions. But there is no narrative explanation, unlike in a notebook.

Example of a workflow tool: Galaxy. Not appropriate for me for the moment, I think...
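
To make the workflow idea more concrete for myself, here is a minimal Python sketch of pieces of code grouped into functions and run in a fixed order. The function names and the sample values are hypothetical, made up for illustration; this is not how Galaxy itself works, just the general principle as I understand it.

```python
from statistics import mean


def load_measurements():
    """Step 1: return raw data (hard-coded here; a real step would read a file)."""
    # Hypothetical raw values as text, with one missing entry.
    return ["12.5", "13.1", "", "11.9"]


def clean(raw):
    """Step 2: drop empty entries and convert the text to numbers."""
    return [float(value) for value in raw if value != ""]


def summarize(values):
    """Step 3: compute a few summary statistics."""
    return {"n": len(values), "mean": mean(values)}


def run_workflow():
    """Chain the steps in order; each step only sees the previous step's output."""
    raw = load_measurements()
    values = clean(raw)
    return summarize(values)


summary = run_workflow()
print(summary)
```

Each step can be tested or replaced on its own, which gives the overview that a long notebook lacks, but the narrative has to live somewhere else (docstrings, or the lab book).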