though the community is slowly taking this into account. The
transition from Python 2 to the not fully backwards compatible Python 3
has been a particularly painful process, not least because the two
languages are so similar that it is not always easy to figure out if a
given script or module is written in Python 2 or Python 3. It isn't
even rare to see Python scripts that work under both Python 2 and
Python 3, but produce different results due to the change in the
behavior of integer division.
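
The integer division pitfall can be illustrated with a minimal sketch
(the variable names are ours, chosen for illustration). The same line
using =/= on two integers is valid in both languages but does not mean
the same thing, whereas =//= behaves identically in both:

#+begin_src python
total, count = 7, 2

# Python 3: "/" is true division, so this is 3.5.
# Python 2: "/" on two integers is floor division, so the same line gives 3.
true_quotient = total / count

# "//" is floor division in both Python 2 and Python 3: always 3 here.
floor_quotient = total // count

print(true_quotient, floor_quotient)
#+end_src

Writing =//= whenever floor division is intended (and converting to
=float= whenever it is not) is one way to make such a script behave the
same under both versions.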
[[https://en.wikipedia.org/wiki/R_(programming_language)][R]], in comparison, is much closer (in terms of developer community) to
languages like [[https://en.wikipedia.org/wiki/SAS_(software)][SAS]], which is heavily used in the pharmaceutical
...
...
@@ -30,24 +30,24 @@ solid/stable. R is obviously not immune to evolutions that break old
versions and hinder reproducibility/backward compatibility. Here is a
relatively recent [[http://members.cbio.mines-paristech.fr/~thocking/HOCKING-reproducible-research-with-R.html][true story about this]] and some colleagues who worked
on the [[https://www.fun-mooc.fr/courses/UPSUD/42001S06/session06/about][statistics introductory course with R on FUN]] reported to us
several issues with functions from a few functions (=plotmeans= from
=gplots=, =survfit= from =survival=, or =hclust=) whose default
parameters had changed over the years. It is thus probably a good
practice to explicitly indicate in your code default values (, which
can be cumbersome) and to restrict your dependencies as much as
possible.
several issues with a few functions (=plotmeans= from =gplots=,
=survfit= from =survival=, or =hclust=) whose default parameters had
changed over the years. It is thus probably good practice to give
explicit values for all parameters (which can be cumbersome) instead
of relying on default values, and to restrict your dependencies as much
as possible.
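
The same advice applies outside R. As a hedged Python analogue (it is
not one of the R functions named above), the default value of the
=protocol= argument of =pickle.dumps= has changed between Python 3
releases, so stating it explicitly keeps the serialized format
independent of the interpreter version. The =record= dictionary is a
made-up example:

#+begin_src python
import pickle

# Hypothetical data, for illustration only.
record = {"sample": "A", "values": [1, 2, 3]}

# Implicit: the protocol used here depends on the running Python version.
implicit = pickle.dumps(record)

# Explicit: the protocol is part of the code, not of the environment.
explicit = pickle.dumps(record, protocol=4)

# Either form can be read back; only the byte-level format differs.
restored = pickle.loads(explicit)
#+end_src

The explicit call is slightly more verbose, which is the cumbersome
part mentioned above, but it documents the choice in the code itself.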
This being said, the R development community is generally quite
careful about stability. We (the authors of this MOOC) think open
source (, which allows to inspect how computation is done and to
identify both mistakes and sources of nonreproducibility) is more
important than the rock solid stability of SAS, which is a proprietary
careful about stability. We (the authors of this MOOC) believe that open
source (which allows one to inspect how computation is done and to
identify both mistakes and sources of non-reproducibility) is more
important than the rock solid stability of SAS, which is proprietary
software. Yet, if you really need to stay with SAS (similar solutions
probably exist for other languages as well), you should know that SAS
can be used within Jupyter using either the [[https://sassoftware.github.io/sas_kernel/][Python SASKernel]] or the
[[https://sassoftware.github.io/saspy/][Python SASPy]] package (step by step explanations about this are given
[[https://app-learninglab.inria.fr/gitlab/85bc36e0a8096c618fbd5993d1cca191/mooc-rr/blob/master/documents/tuto_jupyter_windows/tuto_jupyter_windows.md][here]]). Using such a literate programming approach combined with systematic
control version and environment control will help anyway.
version and environment control will always help.
** Controlling your software environment
As we mentioned in the video sequences, there are several solutions to
control your environment:
...
...
@@ -55,8 +55,8 @@ control your environment:
- The more demanding (encourage cleanliness) where you start with a
clean environment and install only what's strictly necessary (and document it):