Commit 89da4fd1 authored by Arnaud Legrand

Automatically generated files (with read-only mode)

parent aae67ce9
Table of Contents
=================
- [Installing RStudio](#installing-rstudio)
    - [Linux (debian, ubuntu)](#linux-debian-ubuntu)
    - [Mac OSX and Windows](#mac-osx-and-windows)
- [RStudio documentation](#rstudio-documentation)
- [Using Git from RStudio](#using-git-from-rstudio)
    - [Cloning a repository](#cloning-a-repository)
    - [Modifying a file](#modifying-a-file)
Installing RStudio
==================
Linux (debian, ubuntu)
----------------------
We provide here only instructions for Debian-based distributions. Feel free to contribute to this document to provide up-to-date information for other distributions (e.g., RedHat, Fedora).
Today, the stable versions of the most common distributions provide recent enough versions of R:
- Debian (stretch) ships with [R 3.3.3-1](https://packages.debian.org/stretch/r-base), [knitr 1.15.1](https://packages.debian.org/stretch/r-cran-knitr), and [ggplot2 2.2.1](https://packages.debian.org/stretch/r-cran-ggplot2)
- Ubuntu (bionic 18.04) ships with [R 3.4.4](https://packages.ubuntu.com/bionic/r-base), [knitr 1.17](https://packages.ubuntu.com/bionic/r-cran-knitr), and [ggplot2 2.2.1](https://packages.ubuntu.com/bionic/r-cran-ggplot2)
- Ubuntu (artful 17.10) ships with [R 3.4.2](https://packages.ubuntu.com/artful/r-base), [knitr 1.15](https://packages.ubuntu.com/artful/r-cran-knitr), and [ggplot2 2.2.1](https://packages.ubuntu.com/artful/r-cran-ggplot2)
If your distribution is older than this, well, it may be a good time to upgrade...
### Installing R
First, you need to install the R language and a few convenient packages by running the following as root (here through `sudo`):
``` shell
sudo apt-get update ; sudo apt-get install r-base r-cran-knitr r-cran-ggplot2
```
Alternatively, if the installation of `r-cran-ggplot2` or `r-cran-knitr` fails, you may want to install them locally and manually (through the R packaging system) by running the following commands in R (or RStudio):
``` r
install.packages("knitr")
install.packages("ggplot2")
```
If you plan to export PDF documents with LaTeX, you probably also want to run (as root):
``` bash
apt-get update ; apt-get install texlive-base
```
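To check that the installation worked, the following sketch asks R for the versions of the two packages. It assumes that `Rscript` (installed along with `r-base`) is on your `PATH`; the `check_r_packages` function name is only a label chosen for this example.

``` shell
# Report the installed versions of knitr and ggplot2, if R is available.
# check_r_packages is a hypothetical helper name chosen for this example.
check_r_packages() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found: install r-base first"
        return 1
    fi
    # requireNamespace() returns FALSE instead of failing when a package is missing.
    Rscript -e 'for (p in c("knitr", "ggplot2")) cat(p, if (requireNamespace(p, quietly = TRUE)) as.character(packageVersion(p)) else "is not installed", "\n")'
}

check_r_packages || true
```

If one of the packages is reported as not installed, the `install.packages` commands above are the fallback.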
### Installing RStudio
RStudio is unfortunately not packaged within Debian, so the easiest option is to download the corresponding Debian package from the [RStudio webpage](https://www.rstudio.com/products/rstudio/download/#download) and then install it manually (you may have to adjust the version number in the following example):
``` shell
cd /tmp/
wget https://download1.rstudio.org/rstudio-xenial-1.1.453-amd64.deb
sudo dpkg -i rstudio-xenial-1.1.453-amd64.deb
sudo apt-get update ; sudo apt-get -f install # to fix possibly missing dependencies
```
Mac OSX and Windows
-------------------
> Some instructions on installing R and knitr must be missing. This should be tested and improved.
- Download and install R from the [CRAN webpage](https://cran.r-project.org/) by choosing the right operating system.
- Download and install RStudio from the [RStudio webpage](https://www.rstudio.com/products/rstudio/download/#download) by choosing the right operating system.
- Download and install MiKTeX from the [MiKTeX webpage](https://miktex.org/download) by choosing the right operating system. You will be prompted to install some specific packages when exporting to pdf.
- Open RStudio and type the following commands in the console to install `knitr` and `ggplot2`:
``` r
install.packages("knitr", dependencies = TRUE)
install.packages("ggplot2", dependencies = TRUE)
```
RStudio documentation
=====================
The RStudio team has created a lot of very good material and tutorials. You should definitely look at the [Cheat sheets webpage](https://www.rstudio.com/resources/cheatsheets/). In particular, you may want to have a look at the following ones:
- [The RStudio IDE](https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf),
- [R Markdown](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) (here is also a [nice step-by-step presentation of Rmarkdown](https://rmarkdown.rstudio.com/)),
- The [R Markdown Reference guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf),
- [Data visualization with ggplot2](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf),
- [Data transformation with dplyr](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf)
In case it helps, here are some (sometimes outdated) French versions of these documents:
- [L'IDE RStudio](https://github.com/rstudio/cheatsheets/raw/master/translations/french/rstudio-IDE-cheatsheet.pdf)
- [Visualisation de données avec ggplot2](https://github.com/rstudio/cheatsheets/raw/master/translations/french/ggplot2-french-cheatsheet.pdf)
- [Transformation de données avec dplyr](https://github.com/rstudio/cheatsheets/raw/master/translations/french/data-wrangling-french.pdf)
- [Un court document sur R Markdown](https://www.fun-mooc.fr/c4x/UPSUD/42001S02/asset/RMarkdown.pdf)
Using Git from RStudio
======================
If you have never used git with RStudio, **we strongly advise that you follow [our tutorial on using git from RStudio](https://www.fun-mooc.fr/courses/course-v1:inria+41016+session02/jump_to_id/d132a854b0464ad29085cedaded23136)** (*"RStudio et Gitlab"*, in French). Before proceeding, make sure you have also followed the **["git/GitLab configuration" tutorial](https://www.fun-mooc.fr/courses/course-v1:inria+41016+session02/jump_to_id/7508aece244548349424dfd61ee3ba85)** (in French).
Alternatively, you may want to watch [this video](https://www.youtube.com/embed/uHYcDQDbMY8) (in English). If you do not like videos, you should have a look at the [step-by-step explanations from Software Carpentry](https://swcarpentry.github.io/git-novice/14-supplemental-rstudio/index.html). They come with many screenshots and progress gradually.
Cloning a repository
--------------------
Open RStudio and do the following steps:
- Create a new version-controlled project: `File / New Project / Version Control`
![](rstudio_images/new_project.png)
![](rstudio_images/git.png)
- Get the URL from your GitLab repository:
![](rstudio_images/adresse_depot.png)
- Enter this URL in the "Repository URL" field (*you may want to prefix this URL with `xxx@`, where `xxx` is your GitLab id, to avoid repeatedly giving it later on*).
![](rstudio_images/clone.png)
- If you're behind a proxy, git should be configured accordingly. Check the ["Dealing with proxies" section](https://www.fun-mooc.fr/courses/course-v1:inria+41016+session02/jump_to_id/7508aece244548349424dfd61ee3ba85).
- Git will then connect to GitLab and fetch a full copy of the repository.
- RStudio should restart in a mode related to Git:
![](rstudio_images/rstudio.png)
- The file manager on the right allows you to browse the version-controlled repository.
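For readers who wonder what the `xxx@` prefix looks like in practice, here is a minimal sketch of how the clone URL can be built. The repository URL, the `xxx` id and the proxy address are placeholders, not values taken from this course; the `git config` and `git clone` lines are left as comments because they depend on your own setup.

``` shell
# Hypothetical values: replace them with your own repository URL and GitLab id.
repo_url="https://gitlab.example.org/xxx/mooc-rr.git"
gitlab_id="xxx"

# Insert "xxx@" right after the scheme so that git already knows the user name.
clone_url="https://${gitlab_id}@${repo_url#https://}"
echo "$clone_url"

# Behind a proxy (placeholder address), git would first need something like:
#   git config --global http.proxy http://proxy.example.org:3128
# The clone itself, which RStudio runs for you, would then be:
#   git clone "$clone_url"
```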
Modifying a file
----------------
- Open `Module2/exo1/toy_document.Rmd` and perform a simple modification.
- Save
- Go to the Git menu to commit
![](rstudio_images/commit.png)
![](rstudio_images/commit2.png)
- Select the lines to commit and then click on `commit`
![](rstudio_images/commit5.png)
Your modifications have now been committed on your local machine. They haven't been propagated to GitLab yet.
- Click on `push` to propagate them to GitLab
![](rstudio_images/push.png)
![](rstudio_images/push2.png)
![](rstudio_images/push3.png)
**NB**: You won't be able to propagate your modifications to GitLab if some modifications have been made on GitLab in the meantime.
![](rstudio_images/push4.png)
- You should first merge these remote modifications locally. Click on `pull` to get these modifications on your machine.
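The buttons used above correspond to ordinary git commands. The following sketch replays the same clone, commit and push sequence on the command line. To stay self-contained, it assumes a throwaway local bare repository standing in for GitLab; the directory names, the identity and the commit message are placeholders.

``` shell
# A throwaway sandbox: a local bare repository stands in for GitLab.
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/remote.git"

# "Cloning a repository" (the warning about an empty repository is silenced)
git clone --quiet "$sandbox/remote.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work" || exit 1
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

# "Modifying a file", then the equivalent of the Commit button
echo "A simple modification" > toy_document.Rmd
git add toy_document.Rmd
git commit --quiet -m "Modify toy_document.Rmd"

# The equivalent of the Push button
git push --quiet origin HEAD 2>/dev/null

# If the push had been rejected because the remote changed in the meantime,
# the equivalent of the Pull button would be `git pull`, followed by a new push.
git log --oneline
```

With a real GitLab repository, only the clone URL changes; the rest of the sequence is the same.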
Table of Contents
=================
- [Installing RStudio](#installing-rstudio)
    - [Linux (debian, ubuntu)](#linux-debian-ubuntu)
    - [Mac OSX and Windows](#mac-osx-and-windows)
- [RStudio documentation](#rstudio-documentation)
- [Using Git with RStudio](#using-git-with-rstudio)
    - [Cloning a repository](#cloning-a-repository)
    - [Modifying a file](#modifying-a-file)
Installing RStudio
==================
Linux (debian, ubuntu)
----------------------
We only provide instructions for Debian-based distributions here. Feel free to contribute to this document by providing up-to-date information for other distributions (e.g., RedHat, Fedora).
Today, the stable versions of the most common distributions provide recent enough versions of R:
- Debian (stretch) ships with [R 3.3.3-1](https://packages.debian.org/stretch/r-base), [knitr 1.15.1](https://packages.debian.org/stretch/r-cran-knitr), and [ggplot2 2.2.1](https://packages.debian.org/stretch/r-cran-ggplot2)
- Ubuntu (bionic 18.04) ships with [R 3.4.4](https://packages.ubuntu.com/bionic/r-base), [knitr 1.17](https://packages.ubuntu.com/bionic/r-cran-knitr), and [ggplot2 2.2.1](https://packages.ubuntu.com/bionic/r-cran-ggplot2)
- Ubuntu (artful 17.10) ships with [R 3.4.2](https://packages.ubuntu.com/artful/r-base), [knitr 1.15](https://packages.ubuntu.com/artful/r-cran-knitr), and [ggplot2 2.2.1](https://packages.ubuntu.com/artful/r-cran-ggplot2)
If your distribution is older, it may be a good time to upgrade...
### Installing R
To begin with, you need to install the R language and a few packages by running the following as root (here through `sudo`):
``` shell
sudo apt-get update ; sudo apt-get install r-base r-cran-knitr r-cran-ggplot2
```
If the installation of `r-cran-knitr` or `r-cran-ggplot2` fails, you can also install these packages manually by running the following commands in R (or RStudio):
``` r
install.packages("knitr")
install.packages("ggplot2")
```
If you plan to export PDF documents with LaTeX, you will probably also need to run (as root):
``` shell
apt-get update ; apt-get install texlive-base
```
### Installing RStudio
RStudio is unfortunately not packaged in Debian. The easiest option is to download the corresponding Debian package from the [RStudio website](https://www.rstudio.com/products/rstudio/download/#download), then install it manually (you may have to adjust the version number):
``` shell
cd /tmp/
wget https://download1.rstudio.org/rstudio-xenial-1.1.453-amd64.deb
sudo dpkg -i rstudio-xenial-1.1.453-amd64.deb
sudo apt-get update ; sudo apt-get -f install # to fix possibly missing dependencies
```
Mac OSX and Windows
-------------------
- Download and install R from the [CRAN website](https://cran.r-project.org/), choosing the right operating system.
- Download and install RStudio from the [RStudio website](https://www.rstudio.com/products/rstudio/download/#download), choosing the right operating system.
- Download and install MiKTeX from the [MiKTeX website](https://miktex.org/download), choosing the right operating system. You will be prompted to install various packages during the first PDF export.
- Open RStudio and run the following commands in the console to install `knitr` and `ggplot2`:
``` r
install.packages("knitr", dependencies = TRUE)
install.packages("ggplot2", dependencies = TRUE)
```
RStudio documentation
=====================
The RStudio team has created a variety of very well-made material and tutorials. We recommend that you look at the [cheat sheets](https://www.rstudio.com/resources/cheatsheets/). In particular, you may be interested in the following ones:
- [RStudio IDE](https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf),
- [R Markdown](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) (here is also a [nice step-by-step presentation of Rmarkdown](https://rmarkdown.rstudio.com/)),
- The [R Markdown Reference guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf),
- [Data visualization with ggplot2](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf),
- [Data transformation with dplyr](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf)
Here are also French versions of some of these documents, although they are not always up to date:
- [IDE RStudio](https://github.com/rstudio/cheatsheets/raw/master/translations/french/rstudio-IDE-cheatsheet.pdf)
- [Visualisation de données avec ggplot2](https://github.com/rstudio/cheatsheets/raw/master/translations/french/ggplot2-french-cheatsheet.pdf)
- [Transformation de données avec dplyr](https://github.com/rstudio/cheatsheets/raw/master/translations/french/data-wrangling-french.pdf)
- [Un court document sur R Markdown](https://www.fun-mooc.fr/c4x/UPSUD/42001S02/asset/RMarkdown.pdf)
Using Git with RStudio
======================
The first thing to do is to configure Git on your computer. To do so, you can follow the video [configuring Git for GitLab](https://www.fun-mooc.fr/courses/course-v1:inria+41016+session02/jump_to_id/7508aece244548349424dfd61ee3ba85) (in French) and the corresponding [Git and GitLab](https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/blob/master/module2/ressources/gitlab_fr.org) document (in French).
You will then be able to use Git with RStudio. To do so, you can follow the [RStudio - GitLab](https://www.fun-mooc.fr/courses/course-v1:inria+41016+session02/jump_to_id/d132a854b0464ad29085cedaded23136) video (in French), whose steps are summarized below.
*(We also point you to this* [video](https://www.youtube.com/embed/uHYcDQDbMY8) *(in English) and to the Software Carpentry* [step-by-step tutorial](https://swcarpentry.github.io/git-novice/14-supplemental-rstudio/index.html) *(in English).)*
Cloning a repository
--------------------
Open RStudio and proceed as follows:
- Create a new version-controlled project: `File / New Project / Version Control`
![](rstudio_images/new_project.png)
![](rstudio_images/git.png)
- Get the URL of the GitLab repository:
![](rstudio_images/adresse_depot.png)
- Enter this URL in the "Repository URL" field *(you may want to prefix this URL with `xxx@`, where `xxx` is your GitLab id, to avoid having to type it again later)*.
![](rstudio_images/clone.png)
- If you are behind a proxy, it must be configured in Git (see the "Dealing with proxies" paragraph of the [Git and GitLab](https://www.fun-mooc.fr/courses/course-v1:inria+41016+session02/jump_to_id/7508aece244548349424dfd61ee3ba85) page).
- Git connects to GitLab and fetches a full copy of the repository.
- RStudio restarts in a Git-related mode:
![](rstudio_images/rstudio.png)
- The file manager on the right lets you browse the version-controlled repository.
Modifying a file
----------------
- Open the `Module2/exo1/toy_document.Rmd` file and modify it.
- Save.
- Go to the Git menu to commit.
![](rstudio_images/commit.png)
![](rstudio_images/commit2.png)
- Select the lines to commit, then click on `commit`.
![](rstudio_images/commit5.png)
The modifications have only been committed on your machine. They have not been propagated to GitLab.
- Click on `push` to propagate them to GitLab.
![](rstudio_images/push.png)
![](rstudio_images/push2.png)
![](rstudio_images/push3.png)
**NB:** You cannot propagate your modifications to GitLab if modifications have been made on GitLab in the meantime.
![](rstudio_images/push4.png)
- You must first fetch these remote modifications onto your local machine. To do so, click on `pull`.
In the MOOC video, I quickly demo how org-mode can be used in various contexts. Here are the (sometimes trimmed) corresponding org-files. These documents depend on many other external data files and are not meant to lead to reproducible documents, but they will give you an idea of how things can be organized:
1. [journal.org](journal.org): an excerpt (I've only left a few code samples and links to some resources on R, Stats, ...) from my own journal. This is a personal document where everything (meeting notes, hacking, random thoughts, ...) goes by default. Entries are created with the `C-c c` shortcut.
2. [labbook_single.org](labbook_single.org): this is an excerpt from the laboratory notebook [Tom Cornebize](https://cornebize.net/) wrote during his Master thesis internship under my supervision. This is a personal labbook. I consider this notebook to be excellent: it had the ideal level of detail for us to communicate without any ambiguity and for him to move forward with confidence.
3. [paper.org](paper.org): this is an ongoing paper based on the previous labbook of Tom Cornebize. As such it is not reproducible, as there are hardcoded paths and uncleaned dependencies, but writing it from the labbook was super easy as we just had to cut and paste the parts we needed. What may be interesting is the organization and the org tricks to export to the right LaTeX style. As you may notice, at the end of the document, there is a commented section with emacs commands that are automatically executed when opening the file. It is an effective way to depend less on the `.emacs.d/init.el`, which is generally customized by everyone.
4. [labbook_several.org](labbook_several.org): this is a labbook for a specific project shared by several people. As a consequence, it starts with information about installation and common scripts, and has a section with notes about all our meetings, a section with information about experiments and another one about analysis. Entries could have been labeled by who wrote them, but there were only a few of us and this information was available in git, so we did not bother. In such a labbook, it is common to find annotations indicating that a given experiment was `:FLAWED:` as it had some issues.
5. [technical_report.org](technical_report.org): this is a short technical document I wrote after a colleague sent me a PDF describing an experiment he was conducting and asked me how reproducible I felt it was. It turned out I had to cut and paste the C code from the PDF, then remove all the line numbers, fix the syntax, etc. Obviously I got quite different performance results, but writing everything in org-mode made it very easy to generate both HTML and PDF and to explicitly explain how the measurements were done.
Here are a few links to other kinds of examples:
- Slides: all my slides for a series of lectures are available here: <https://github.com/alegrand/SMPE>. Here is a [typical source](https://raw.githubusercontent.com/alegrand/SMPE/master/lectures/lecture_central_limit_theorem.org) and the [resulting PDF](https://raw.githubusercontent.com/alegrand/SMPE/master/lectures/lecture_central_limit_theorem.pdf)
- Lucas Schnorr, a colleague, maintains:
    - a set of templates for various computer science journals/conferences: [IEEE](https://github.com/schnorr/ieeeorg), [Wiley](https://github.com/schnorr/wileyorg), [ACM](https://github.com/schnorr/acmorg), [LNCS](https://github.com/schnorr/llncsorg)
    - his lecture on programming languages for undergrads: <https://github.com/schnorr/mlp/tree/master/conteudo>
Table of Contents
=================
- ["Thoughts" on language/software stability](#thoughts-on-languagesoftware-stability)
- [Controlling your software environment](#controlling-your-software-environment)
- [Preservation/Archiving](#preservationarchiving)
- [Workflows](#workflows)
- [Numerical and statistical issues](#numerical-and-statistical-issues)
- [Publication practices](#publication-practices)
- [Experimentation](#experimentation)
"Thoughts" on language/software stability
=========================================
As we explained, the programming language used in an analysis has a clear influence on its reproducibility. This is not a characteristic of the language itself but rather a consequence of the development philosophy of the underlying community. For example, C is a very stable language with a [very clear specification designed by a committee](https://en.wikipedia.org/wiki/C_(programming_language)#ANSI_C_and_ISO_C) (even though some compilers may not respect this standard).
On the other end of the spectrum, [Python](https://en.wikipedia.org/wiki/Python_(programming_language)) had a much more organic development based on a readability philosophy and valuing continuous improvement over backwards compatibility. Furthermore, Python is commonly used as a wrapping language (e.g., to easily use C or FORTRAN libraries) and has its own packaging system. All these design choices tend to make reproducibility a bit painful with Python, even though the community is slowly taking this into account. The transition from Python 2 to the not fully backwards-compatible Python 3 has been a particularly painful process, not least because the two languages are so similar that it is not always easy to figure out whether a given script or module is written in Python 2 or Python 3. It isn't even rare to see Python scripts that work under both Python 2 and Python 3 but produce different results due to the change in the behavior of integer division.
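The integer division issue is easy to reproduce. The sketch below runs the same one-line script with whichever interpreters are installed; it assumes they are available under the names `python3` and `python2`, which may not be the case on your machine (Python 2 in particular is often no longer installed).

``` shell
# One expression, valid in both Python 2 and Python 3.
snippet='print(7 / 2)'

# Python 3: "/" is true division, so this prints 3.5.
if command -v python3 >/dev/null 2>&1; then
    py3_result="$(python3 -c "$snippet")"
    echo "python3: $py3_result"
fi

# Python 2 (if still installed): "/" between integers floors, so this prints 3.
if command -v python2 >/dev/null 2>&1; then
    py2_result="$(python2 -c "$snippet")"
    echo "python2: $py2_result"
fi

# Writing 7 // 2 (floor division) or 7 / 2.0 (float division) makes the
# intent explicit and gives the same result under both versions.
```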
[R](https://en.wikipedia.org/wiki/R_(programming_language)), in comparison, is much closer (in terms of developer community) to languages like [SAS](https://en.wikipedia.org/wiki/SAS_(software)), which is heavily used in the pharmaceutical industry where statistical procedures need to be standardized and rock solid/stable. R is obviously not immune to evolutions that break old versions and hinder reproducibility/backward compatibility. Here is a relatively recent [true story about this](http://members.cbio.mines-paristech.fr/~thocking/HOCKING-reproducible-research-with-R.html). In addition, some colleagues who worked on the [statistics introductory course with R on FUN](https://www.fun-mooc.fr/courses/UPSUD/42001S06/session06/about) reported several issues to us with a few functions (`plotmeans` from `gplots`, `survfit` from `survival`, or `hclust`) whose default parameters had changed over the years. It is thus probably good practice to give explicit values for all parameters (which can be cumbersome) instead of relying on default values, and to restrict your dependencies as much as possible.
This being said, the R development community is generally quite careful about stability. We (the authors of this MOOC) believe that open source (which allows one to inspect how computation is done and to identify both mistakes and sources of non-reproducibility) is more important than the rock solid stability of SAS, which is proprietary software.
Yet, if you really need to stay with SAS, you should know that SAS can be used within Jupyter using the [Python SASPy](https://sassoftware.github.io/saspy/) and the [Python SASKernel](https://sassoftware.github.io/sas_kernel/) packages (step-by-step explanations about this are given [here](https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/blob/master/documents/tuto_jupyter_windows/tuto_jupyter_windows.md#53-le-package-python-saspy-permet-dex%C3%A9cuter-du-code-sas-dans-un-notebook-python)). Using such a literate programming approach, combined with systematic version and environment control, will always help. Similar solutions exist for many languages ([list of Jupyter kernels](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)).
Controlling your software environment
=====================================
As we mentioned in the video sequences, there are several solutions to control your environment:
- The easy (preserve the mess) ones: [CDE](http://www.pgbovine.net/cde.html) or [ReproZip](https://vida-nyu.github.io/reprozip/)
- The more demanding (encourage cleanliness) ones, where you start with a clean environment and install only what's strictly necessary (and document it):
    - The very well known [Docker](https://www.docker.io/)
    - [Singularity](https://singularity.lbl.gov/) or [Spack](https://spack.io/), which are more targeted toward the specific needs of high performance computing users
    - [Guix](https://www.gnu.org/software/guix/) and [Nix](https://nixos.org/), which are very clean (perfect?) solutions to this dependency hell and which we recommend
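Whichever tool you choose, the "document it" part can start very modestly. The sketch below records a few facts about the current environment in a text file. It is in no way a substitute for the tools listed above; the file name and the list of tools are arbitrary choices made for this example.

``` shell
# Record basic facts about the current environment in a text file.
# The file name and the list of tools are arbitrary choices for this example.
env_file="environment.txt"
{
    echo "Date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "System: $(uname -srm)"
    for tool in git R python3; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Keep only the first line of each tool's version banner.
            echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
        else
            echo "$tool: not installed"
        fi
    done
} > "$env_file"
cat "$env_file"
```

Committing such a file next to your analysis at least tells a future reader which versions you were using.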
It may be hard to understand the difference between these different approaches and decide which one is better in your context.
Here is a webinar where some of these tools are demoed in a reproducible research context: [Controling your environment (by Michael Mercier and Cristian Ruiz)](https://github.com/alegrand/RR_webinars/blob/master/2_controling_your_environment/index.org)
You may also want to have a look at [the Popper conventions](http://falsifiable.us/) ([webinar by Ivo Jimenez through Google Hangout](https://github.com/alegrand/RR_webinars/blob/master/11_popper/index.org)) or at the [presentation of Konrad Hinsen on Active Papers](https://github.com/alegrand/RR_webinars/blob/master/7_publications/index.org) (<http://www.activepapers.org/>).
Preservation/Archiving
======================
Ensuring software is properly archived, i.e., safely stored so that it can be accessed in a perennial way, can be quite tricky. If you have never seen [Roberto Di Cosmo presenting the Software Heritage project](https://github.com/alegrand/RR_webinars/blob/master/5_archiving_software_and_data/index.org), this is a must-see: <https://www.softwareheritage.org/>
For regular data, we highly recommend using <https://www.zenodo.org/> whenever the data is not sensitive.
Workflows
=========
In the video sequences, we mentioned workflow managers (original application domain in parentheses):
- [Galaxy](https://galaxyproject.org/) (genomics), [Kepler](https://kepler-project.org/) (ecology), [Taverna](https://taverna.apache.org/) (bio-informatics), [Pegasus](https://pegasus.isi.edu/) (astronomy), [Collective Knowledge](http://cknowledge.org/) (compiling optimization), [VisTrails](https://www.vistrails.org) (image processing)
- Light-weight: [dask](http://dask.pydata.org/) (python), [drake](https://ropensci.github.io/drake/) (R), [swift](http://swift-lang.org/) (molecular biology), [snakemake](https://snakemake.readthedocs.io/) (like `make` but more expressive and in `python`)...
- Hybrids: [SOS-notebook](https://vatlab.github.io/sos-docs/)...
You may want to have a look at this webinar: [Reproducible Science in Bio-informatics: Current Status, Solutions and Research Opportunities (by Sarah Cohen Boulakia, Yvan Le Bras and Jérôme Chopard)](https://github.com/alegrand/RR_webinars/blob/master/6_reproducibility_bioinformatics/index.org).
Numerical and statistical issues
================================
We have mentioned these topics in our MOOC but could in no way cover them properly. We only suggest a few interesting talks about them here.
- [In this talk, Pierre Dragicevic provides a nice illustration of the consequences of statistical uncertainty and of how some concepts (e.g., p-values) are commonly badly understood.](https://github.com/alegrand/RR_webinars/blob/master/10_statistics_and_replication_in_HCI/index.org)
- [Nathalie Revol, Philippe Langlois and Stef Graillat present the main challenges encountered when trying to achieve numerical reproducibility and present recent research work on this topic.](https://github.com/alegrand/RR_webinars/blob/master/3_numerical_reproducibility/index.org)
Publication practices
=====================
You may want to have a look at the following talks:
- [Enabling open and reproducible research at computer systems’ conferences (by Grigori Fursin)](https://github.com/alegrand/RR_webinars/blob/master/8_artifact_evaluation/index.org). In particular, this talk discusses *artifact evaluation*, which is becoming more and more popular.
- [Publication Modes Favoring Reproducible Research (by Konrad Hinsen and Nicolas Rougier)](https://github.com/alegrand/RR_webinars/blob/master/7_publications/index.org). In this talk, the motivations for the [ReScience journal](http://rescience.github.io/) initiative are presented.
- [Simine Vazire - When Should We be Skeptical of Scientific Claims?](https://www.youtube.com/watch?v=HuJ2G8rXHMs), which discusses publication practices in social sciences and in particular HARKing (Hypothesizing After the Results are Known), p-hacking, etc.
Experimentation
===============
Experimentation was not covered in this MOOC, although it is an essential part of science. The main reason is that practices and constraints can vary so wildly from one domain to another that it could not be properly covered in a first edition. We would be happy to gather references you consider interesting in your domain, so do not hesitate to share them on the forum and we will update this page.
- [A recent talk by Lucas Nussbaum on Experimental Testbeds in Computer Science](https://github.com/alegrand/RR_webinars/blob/master/9_experimental_testbeds/index.org).