English translation of exercise 1, module 4

Commit a21a2b7b authored Oct 04, 2018 by Konrad Hinsen

Showing with 184 additions and 33 deletions

exo1.html module4/ressources/exo1.html +122 -33

exo1.org module4/ressources/exo1.org +62 -0

--- a/module4/ressources/exo1.html
+++ b/module4/ressources/exo1.html
--- a/module4/ressources/exo1.org
+++ b/module4/ressources/exo1.org
@@ -93,3 +93,65 @@ ce tableau.
 Nous effectuerons une synthèse illustrant les principales divergences
 observées et nous vous l'enverrons à la fin du MOOC.

+* Exercice 1: Re-execution is not replication...
+Unfortunately terminology varies a lot between authors and
+communities, but it is important to understand the distinction between
+different levels of "replication". You can be satisfied with
+re-running the code and getting exactly the same results, but you can also
+try to obtain similar results using a similar approach, changing for
+example the programming language, computational method, etc. An
+article we recommend on this topic is
+[[https://arxiv.org/abs/1708.08205]].
+
+Often the devil is in the details that one would never have thought
+about, and we have had our share of surprises while preparing this
+MOOC, in particular with the exercise on the Challenger catastrophe
+from module 2. We therefore propose in this exercise that you re-do a
+part of this analysis, following the example of Siddhartha Dalal and
+co-authors almost 30 years ago in their article /Risk Analysis of the
+Space Shuttle: Pre-Challenger Prediction of Failure/, published in the
+/Journal of the American Statistical Association/ (Vol. 84, No. 408,
+Dec., 1989), but using a different language of your choosing (Python,
+R, Julia, SAS...).
+
+Our experience shows that the estimates of slope and intercept are
+generally the same, but there can be differences when looking at
+variance estimators and R^2 in more detail. Another source of
+surprises is the final graphical presentation, depending on the
+versions of the libraries that are used.
+
+The computations to be done are described at
+[[https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/][https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/]]
+together with instructions for contributing.
+
+You will find there our replications of the computations by Dalal /et
+al./ (in R), one in Python and one in R (very similar to what you have
+used in module 2). This exercise can be done at two levels:
+
+1. an easy level at which you start from the code in the language that you did not use initially, and content yourself with re-executing it. This doesn't require mastering logistic regression: it is sufficient to inspect the outputs produced and check that they correspond to the expected values. For those who want to re-execute the Python notebook in our MOOC's Jupyter environment, check [[https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/4ab5bb42ca1e45c8b0f349751b96d405][the resources for sequence 4A of module 2]], which explain how to import a notebook.
+2. a more difficult level at which you rewrite the analysis completely, possibly in a different language than Python or R, which makes the exercise more interesting because we have not tested such variants. If logistic regression is not already implemented for your language, you will need a good understanding of it in order to write the code yourself, which of course makes the exercise even more instructive.
+
+You can discuss your successes or failures on the forum, after following these instructions:
+- *First, publish your notebooks in your repository*, taking care to enrich your document with information about your system and your libraries (version numbers etc.).
+- Report your results by adding them to this [[https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/results.md][table]] (you have write permissions, so you can simply edit it via the GitLab interface). Check the values obtained for:
+  1) the slope and intercept coefficients
+  2) the error estimates for these coefficients
+  3) the goodness of fit
+  4) the plot
+  5) the confidence region
+- For each of these values, specify if your result is
+  - identical
+  - close, to three decimal places
+  - very different
+  - non functional (no result obtained)
+  Also provide in this table:
+  - a link to your GitLab workspace with your notebook(s)
+  - your operating system
+  - the language you used, with the version number
+  - version numbers for the main libraries
+    - Python: numpy, pandas, matplotlib, statsmodels...
+    - R: BLAS, ggplot, dplyr if used
+
+Don't worry if these instructions seem confusing: they are reproduced above the [[https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/results.md][table]], and you will quickly notice if something is missing when you try to add your data.
+
+We will compile a synthesis of the principal divergences observed and make it available at the end of the MOOC.
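
Not part of the commit: a minimal Python sketch of the reporting steps the exercise asks for, namely collecting the operating system, language version and library versions, and sorting a result into one of the four categories. The function names `environment_report` and `classify` are hypothetical, and reading "close, to three decimal places" as "differs by less than 0.0005" is an assumption rather than a rule stated in the exercise. The numbers in the usage section are placeholders, not results from the study.

```python
import math
import platform
from importlib import metadata


def environment_report(packages=("numpy", "pandas", "matplotlib", "statsmodels")):
    """Collect OS, Python version and library versions (hypothetical helper)."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The library is absent from this environment.
            report[name] = "not installed"
    return report


def classify(reference, obtained, decimals=3):
    """Sort a result into one of the four categories listed in the exercise.

    Assumption: "close" means the absolute difference is below half a unit
    in the last of `decimals` decimal places.
    """
    if obtained is None or math.isnan(obtained):
        return "non functional"
    if obtained == reference:
        return "identical"
    if abs(obtained - reference) < 0.5 * 10 ** (-decimals):
        return "close"
    return "very different"


if __name__ == "__main__":
    # Print the environment details to copy into the results table.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
    # Placeholder numbers only, chosen to exercise each category.
    print(classify(1.0, 1.0))
    print(classify(1.0, 1.0004))
    print(classify(1.0, 1.2))
    print(classify(1.0, None))
```

For R, `sessionInfo()` gives comparable information about the system and the loaded packages.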