diff --git a/module4/ressources/resources.html b/module4/ressources/resources.html index 7a20bb9b409e23159e925bf71c3cd9ccc8a1772f..329189baea71204206ab1afa2ec26f7eda6a741f 100644 --- a/module4/ressources/resources.html +++ b/module4/ressources/resources.html @@ -3,25 +3,33 @@
+As we explained, the programming language used in an analysis has a +clear influence on the reproducibility of your analysis. It is not a +characteristic of the language itself but rather a consequence of the +development philosophy of the underlying community. For example, C is a +very stable language with a very clear specification designed by a +committee (even though some compilers may not respect this norm). +
+ ++On the other end of the spectrum, Python had a much more organic +development based on a readability philosophy and has evolved with +time. Furthermore, Python is commonly used as a wrapping language +(e.g., to easily use C or FORTRAN libraries) and has its own packaging +system to make everyone's life easier. All these design choices tend +to make reproducibility with Python often a bit painful, even though +the community is slowly taking this into account. +
+ +
+R, in comparison, is much closer (in terms of developer community) to
+languages like SAS, which is heavily used in the pharmaceutical
+industry where statistical procedures need to be standardized and rock
+solid/stable. R is obviously not immune to evolutions that break old
+versions and hinder reproducibility/backward compatibility. Here is a
+relatively recent true story about this: some colleagues who worked
+on the statistics introductory course with R on FUN reported to us
+several issues with a few functions (plotmeans from gplots, survfit
+from survival, or hclust) whose default parameters had changed over
+the years. It is thus probably good practice to explicitly indicate
+default values in your code (which can be cumbersome) and to restrict
+your dependencies as much as possible.
+
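The advice about spelling out default values can be illustrated with a minimal Python sketch (a Python analogue, not the R functions mentioned above). It uses `statistics.quantiles` from the standard library (Python 3.8+), whose current defaults are `n=4` and `method="exclusive"`. The sample data is made up for illustration.

```python
import statistics

# Made-up sample data, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8]

# Implicit call: the result depends on whatever the defaults are
# in the Python version that runs this script.
implicit = statistics.quantiles(data)

# Explicit call: the current defaults are written out, so a future
# change of defaults would not silently alter the analysis.
explicit = statistics.quantiles(data, n=4, method="exclusive")

print(implicit == explicit)
```

Today both calls return the same cut points. Only the explicit call documents which quantile definition the analysis relies on.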
+This being said, the R development community is generally quite +careful about stability. We (the authors of this MOOC) think open +source (which allows one to inspect how computation is done and to +identify both mistakes and sources of non-reproducibility) is more +important than the rock-solid stability of SAS, which is +proprietary software. Yet, if you really need to stay with SAS (similar solutions +probably exist for other languages as well), you should know that SAS +can be used within Jupyter using either the Python SASKernel or the +Python SASPy package (step-by-step explanations about this are given +here). Using such a literate programming approach combined with systematic +version control and environment control will help anyway. +
++As we mentioned in the video sequences, there are several solutions to +control your environment: +
++It may be hard to understand the difference between these different +approaches and decide which one is better in your context. +
+ ++Here is a webinar where some of these tools are demoed in a +reproducible research context: Controling your environment (by Michael +Mercier and Cristian Ruiz) +
+ ++You may also want to have a look at the Popper conventions (webinar by +Ivo Gimenez through google hangout) or at the presentation of Konrad +Hinsen on Active Papers (http://www.activepapers.org/). +
++Ensuring software is properly archived, i.e., safely stored so that +it can be accessed in a perennial way, can be quite tricky. If you +have never seen Roberto Di Cosmo presenting the Software Heritage +project, this is a must-see: https://www.softwareheritage.org/ +
+ ++For regular data, we highly recommend using https://www.zenodo.org/ +whenever data is not sensitive. +
++In the video sequences, we mentioned workflows (original domain in parenthesis): +
+make
but more expressive and in python
) …+You may want to have a look at this webinar: Reproducible Science in +Bio-informatics: Current Status, Solutions and Research Opportunities +(by Sarah Cohen Boulakia, Yvan Le Bras and Jérôme Chopard). +
++These topics could only be mentioned in our MOOC and could in no way +be properly covered. We only suggest here a few interesting talks +about them. +
++You may want to have a look at the following two webinars: +
++Experimentation was not covered in this MOOC although it is an +essential part of science. The main reason is that practices and +constraints can vary so wildly from one domain to another that it could +not be properly covered in a first edition. We would be happy to +gather references you consider interesting in your domain, so do not +hesitate to provide us with such references by using the forum and we +will update this page. +
+ + +When taking notes, it may be difficult to remember which version of the code or of a file was used. This is what version control is useful @@ -123,13 +327,13 @@ is the price to pay for running git from within the notebook itself.
This topic is discussed on StackOverflow. When using pip
(the Python
package installer) within a shell command, it is easy to query the
@@ -237,9 +441,9 @@ Requires: patsy, pandas
Without resorting to pip (that will list all available packages), you may want to know which modules are loaded in a Python session as well @@ -300,9 +504,9 @@ zlib 1.0
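As a minimal sketch (not necessarily the approach used in the original page), the modules loaded in the current session can be read from `sys.modules`. The helper name `loaded_module_versions` is hypothetical, and the sketch assumes a module advertises its version through a string `__version__` attribute, which many but not all packages do.

```python
import sys


def loaded_module_versions():
    """Return {name: version} for top-level modules loaded in this session.

    Hypothetical helper: only modules exposing a string ``__version__``
    attribute are reported; the others are silently skipped.
    """
    versions = {}
    # Copy the items first, since sys.modules may change while iterating.
    for name, module in list(sys.modules.items()):
        # Ignore submodules (e.g. "os.path") and private modules.
        if "." in name or name.startswith("_"):
            continue
        version = getattr(module, "__version__", None)
        if isinstance(version, str):
            versions[name] = version
    return versions


result = loaded_module_versions()
print("Python", sys.version.split()[0])
for name, version in sorted(result.items()):
    print(name, version)
```

Running this at the end of a notebook records the versions that were actually used, rather than everything installed on the machine.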
The easiest way to go is as follows:
@@ -319,9 +523,9 @@ dynamic libraries that are wrapped by Python though.The Jupyter environment we deployed on our servers for the MOOC is based on the version 4.5.4 of Miniconda and Python 3.6. In this @@ -388,13 +592,13 @@ It is even possible to install a specific (possibly much older) version, e.g.,:
The best way seems to be to rely on the
Finally, it is good to know that there is a built-in R command
(
This section is mostly a cut and paste from the recent post by Ian
Pylvainen on this topic. It comprises a very clear explanation on how
@@ -729,9 +933,9 @@ to proceed.
If you're on a Debian or an Ubuntu system, it may be difficult to
access a specific version without breaking your system. So unless you
@@ -754,10 +958,9 @@ install.packages(packageurl, repos=
-devtools
package (if this
package is not installed, you should install it first by running in R
@@ -462,9 +666,9 @@ clean R dependency management should thus have a look at
-Getting the list of installed packages and their version
-Getting the list of installed packages and their version
+installed.packages
) that retrieves and lists the details of all
@@ -719,9 +923,9 @@ packages installed.
Installing a new package or a specific version
-Installing a new package or a specific version
+Installing a pre-compiled version
-
+
+Using devtools
-
-Alternatively, you may want to install an older package from source
-
-