Commit b0237e6c authored by brospars

CI : Export org files to html and move to public

parent 14090da4
@@ -5,11 +5,11 @@ build:
   image: binarin/org-export
   stage: build
   script:
-    - pwd
-    - ls -al
+    - for file in $(find -name "*.org"); do emacs --batch --load /emacs/export.el --file $file --eval '(org-html-export-to-html)' &> /dev/null || echo "Exported $file"; done
+    - find -name "*.html" -exec mv -t public {} +
-  # artifacts:
-  #   expire_in: 1 day
-  #   paths:
-  #     - build/*.zip
+  artifacts:
+    expire_in: 1 day
+    paths:
+      - public/*.html
   only:
     - ci
<div id="content">
<h1 class="title">Introduction to Markdown</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org99b4a98">Syntax</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org28d66f2">Headers</a></li>
<li style="margin-bottom:0;"><a href="#orgc27acce">Emphasis</a></li>
<li style="margin-bottom:0;"><a href="#org50640c4">Lists</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org11d3314">Unordered</a></li>
<li style="margin-bottom:0;"><a href="#org0453134">Ordered</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#orgda3d80b">Images</a></li>
<li style="margin-bottom:0;"><a href="#org7cb67aa">Links</a></li>
<li style="margin-bottom:0;"><a href="#orge09365f">Blockquotes</a></li>
<li style="margin-bottom:0;"><a href="#orgfa4914a">Inline code</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org8737c71">Writing Math</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org6dc05c1">Greek letters</a></li>
<li style="margin-bottom:0;"><a href="#org2d8310d">Functions and operators</a></li>
<li style="margin-bottom:0;"><a href="#org69729f8">Superscripts and subscripts</a></li>
<li style="margin-bottom:0;"><a href="#org39a711a">Fractions, binomial coefficients, roots, &#x2026;</a></li>
<li style="margin-bottom:0;"><a href="#org4b11fa4">Sums and integrals</a></li>
<li style="margin-bottom:0;"><a href="#org55cc222">Accents and decorations</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org5bb515f">Going further with <code>markdown</code></a></li>
</ul>
</div>
</div>
<p>
Here is a quick overview of Markdown syntax, adapted from a
<a href="https://guides.github.com/features/mastering-markdown/">GitHub guide</a> and from a post by <a href="http://csrgxtu.github.io/2015/03/20/Writing-Mathematic-Fomulars-in-Markdown/">Archer Reilly</a>.
</p>
<div id="outline-container-org99b4a98" class="outline-2">
<h2 id="org99b4a98">Syntax</h2>
<div class="outline-text-2" id="text-org99b4a98">
</div>
<div id="outline-container-org28d66f2" class="outline-3">
<h3 id="org28d66f2">Headers</h3>
<div class="outline-text-3" id="text-org28d66f2">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
# This is an &lt;h1&gt; tag
## This is an &lt;h2&gt; tag
###### This is an &lt;h6&gt; tag
</pre>
</div>
</div>
<div id="outline-container-orgc27acce" class="outline-3">
<h3 id="orgc27acce">Emphasis</h3>
<div class="outline-text-3" id="text-orgc27acce">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
*This text will be italic*
_This will also be italic_
**This text will be bold**
__This will also be bold__
_You **can** combine them_
</pre>
</div>
</div>
<div id="outline-container-org50640c4" class="outline-3">
<h3 id="org50640c4">Lists</h3>
<div class="outline-text-3" id="text-org50640c4">
</div>
<div id="outline-container-org11d3314" class="outline-4">
<h4 id="org11d3314">Unordered</h4>
<div class="outline-text-4" id="text-org11d3314">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
- Item 1
- Item 2
- Item 2a
- Item 2b
</pre>
</div>
</div>
<div id="outline-container-org0453134" class="outline-4">
<h4 id="org0453134">Ordered</h4>
<div class="outline-text-4" id="text-org0453134">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
1. Item 1
2. Item 2
3. Item 3
1. Item 3a
2. Item 3b
</pre>
</div>
</div>
</div>
<div id="outline-container-orgda3d80b" class="outline-3">
<h3 id="orgda3d80b">Images</h3>
<div class="outline-text-3" id="text-orgda3d80b">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
![GitHub Logo](/images/logo.png)
Format: ![Alt Text](url)
</pre>
</div>
</div>
<div id="outline-container-org7cb67aa" class="outline-3">
<h3 id="org7cb67aa">Links</h3>
<div class="outline-text-3" id="text-org7cb67aa">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
http://github.com - automatic!
[GitHub](http://github.com)
</pre>
</div>
</div>
<div id="outline-container-orge09365f" class="outline-3">
<h3 id="orge09365f">Blockquotes</h3>
<div class="outline-text-3" id="text-orge09365f">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
As Kanye West said:
&gt; We're living the future so
&gt; the present is our past.
</pre>
</div>
</div>
<div id="outline-container-orgfa4914a" class="outline-3">
<h3 id="orgfa4914a">Inline code</h3>
<div class="outline-text-3" id="text-orgfa4914a">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
To print some text with python, you should use the `print()` function.
```
print("Hello world!")
```
</pre>
</div>
</div>
</div>
<div id="outline-container-org8737c71" class="outline-2">
<h2 id="org8737c71">Writing Math</h2>
<div class="outline-text-2" id="text-org8737c71">
<p>
Formulas can be written in Markdown either in <b>inline</b> mode
or in <b>displayed</b> mode. In the first case, the formula is set
directly inside the current paragraph; in the second, it appears
centered on a line of its own.
</p>
<p>
The formula is typeset slightly differently in the two cases: to fit
nicely within a single line of text, it has to be more "compact"
than when it is displayed on its own.
</p>
<p>
To write a formula in <b>inline</b> mode, delimit it with <code>$</code>
(as a consequence, to write a literal dollar sign you must prefix it with a
backslash, like this: <code>\$</code>). To write it in <b>displayed</b> mode,
delimit it with <code>$$</code>. Since a short example is worth more than a long
explanation, here is how it works in practice:
</p>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
This expression $\sum_{i=1}^n X_i$ is inlined.
</pre>
<p>
This expression \(\sum_{i=1}^n X_i\) is inlined.
</p>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
This expression is displayed:
$$\sum_{i=1}^n X_i$$
</pre>
<p>
This expression is displayed:
</p>
<p>
\[\sum_{i=1}^n X_i\]
</p>
<p>
Below is a selection of common symbols and commands. In fact, almost
everything that is standard in LaTeX can be used, as long as you
delimit it properly with <code>$</code>. For more complete examples, have a
look at these <a href="http://www.statpower.net/Content/310/R%20Stuff/SampleMarkdown.html">examples by James H. Steiger</a>.
</p>
</div>
<div id="outline-container-org6dc05c1" class="outline-3">
<h3 id="org6dc05c1">Greek letters</h3>
<div class="outline-text-3" id="text-org6dc05c1">
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Symbol</th>
<th scope="col" class="org-left">Command</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">\(\alpha\)</td>
<td class="org-left"><code>$\alpha$</code></td>
</tr>
<tr>
<td class="org-left">\(\beta\)</td>
<td class="org-left"><code>$\beta$</code></td>
</tr>
<tr>
<td class="org-left">\(\gamma\)</td>
<td class="org-left"><code>$\gamma$</code></td>
</tr>
<tr>
<td class="org-left">\(\Gamma\)</td>
<td class="org-left"><code>$\Gamma$</code></td>
</tr>
<tr>
<td class="org-left">\(\pi\)</td>
<td class="org-left"><code>$\pi$</code></td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-org2d8310d" class="outline-3">
<h3 id="org2d8310d">Functions and operators</h3>
<div class="outline-text-3" id="text-org2d8310d">
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Symbol</th>
<th scope="col" class="org-left">Command</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">\(\cos\)</td>
<td class="org-left"><code>$\cos$</code></td>
</tr>
<tr>
<td class="org-left">\(\sin\)</td>
<td class="org-left"><code>$\sin$</code></td>
</tr>
<tr>
<td class="org-left">\(\lim\)</td>
<td class="org-left"><code>$\lim$</code></td>
</tr>
<tr>
<td class="org-left">\(\exp\)</td>
<td class="org-left"><code>$\exp$</code></td>
</tr>
<tr>
<td class="org-left">\(\to\)</td>
<td class="org-left"><code>$\to$</code></td>
</tr>
<tr>
<td class="org-left">\(\in\)</td>
<td class="org-left"><code>$\in$</code></td>
</tr>
<tr>
<td class="org-left">\(\forall\)</td>
<td class="org-left"><code>$\forall$</code></td>
</tr>
<tr>
<td class="org-left">\(\exists\)</td>
<td class="org-left"><code>$\exists$</code></td>
</tr>
<tr>
<td class="org-left">\(\equiv\)</td>
<td class="org-left"><code>$\equiv$</code></td>
</tr>
<tr>
<td class="org-left">\(\sim\)</td>
<td class="org-left"><code>$\sim$</code></td>
</tr>
<tr>
<td class="org-left">\(\approx\)</td>
<td class="org-left"><code>$\approx$</code></td>
</tr>
<tr>
<td class="org-left">\(\times\)</td>
<td class="org-left"><code>$\times$</code></td>
</tr>
<tr>
<td class="org-left">\(\le\)</td>
<td class="org-left"><code>$\le$</code></td>
</tr>
<tr>
<td class="org-left">\(\ge\)</td>
<td class="org-left"><code>$\ge$</code></td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-org69729f8" class="outline-3">
<h3 id="org69729f8">Superscripts and subscripts</h3>
<div class="outline-text-3" id="text-org69729f8">
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Symbol</th>
<th scope="col" class="org-left">Command</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">\(k_{n+1}\)</td>
<td class="org-left"><code>$k_{n+1}$</code></td>
</tr>
<tr>
<td class="org-left">\(n^2\)</td>
<td class="org-left"><code>$n^2$</code></td>
</tr>
<tr>
<td class="org-left">\(k_n^2\)</td>
<td class="org-left"><code>$k_n^2$</code></td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-org39a711a" class="outline-3">
<h3 id="org39a711a">Fractions, binomial coefficients, roots, &#x2026;</h3>
<div class="outline-text-3" id="text-org39a711a">
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Symbol</th>
<th scope="col" class="org-left">Command</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">\(\frac{4z^3}{16}\)</td>
<td class="org-left"><code>$\frac{4z^3}{16}$</code></td>
</tr>
<tr>
<td class="org-left">\(\frac{n!}{k!(n-k)!}\)</td>
<td class="org-left"><code>$\frac{n!}{k!(n-k)!}$</code></td>
</tr>
<tr>
<td class="org-left">\(\binom{n}{k}\)</td>
<td class="org-left"><code>$\binom{n}{k}$</code></td>
</tr>
<tr>
<td class="org-left">\(\frac{\frac{x}{1}}{x - y}\)</td>
<td class="org-left"><code>$\frac{\frac{x}{1}}{x - y}$</code></td>
</tr>
<tr>
<td class="org-left">\(^3/_7\)</td>
<td class="org-left"><code>$^3/_7$</code></td>
</tr>
<tr>
<td class="org-left">\(\sqrt{k}\)</td>
<td class="org-left"><code>$\sqrt{k}$</code></td>
</tr>
<tr>
<td class="org-left">\(\sqrt[n]{k}\)</td>
<td class="org-left"><code>$\sqrt[n]{k}$</code></td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-org4b11fa4" class="outline-3">
<h3 id="org4b11fa4">Sums and integrals</h3>
<div class="outline-text-3" id="text-org4b11fa4">
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Symbol</th>
<th scope="col" class="org-left">Command</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">\(\sum_{i=1}^{10} t_i\)</td>
<td class="org-left"><code>$\sum_{i=1}^{10} t_i$</code></td>
</tr>
<tr>
<td class="org-left">\(\int_0^\infty \mathrm{e}^{-x}\,\mathrm{d}x\)</td>
<td class="org-left"><code>$\int_0^\infty \mathrm{e}^{-x}\,\mathrm{d}x$</code></td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-org55cc222" class="outline-3">
<h3 id="org55cc222">Accents and decorations</h3>
<div class="outline-text-3" id="text-org55cc222">
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Symbol</th>
<th scope="col" class="org-left">Command</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">\(\hat{a}\)</td>
<td class="org-left"><code>$\hat{a}$</code></td>
</tr>
<tr>
<td class="org-left">\(\bar{a}\)</td>
<td class="org-left"><code>$\bar{a}$</code></td>
</tr>
<tr>
<td class="org-left">\(\dot{a}\)</td>
<td class="org-left"><code>$\dot{a}$</code></td>
</tr>
<tr>
<td class="org-left">\(\ddot{a}\)</td>
<td class="org-left"><code>$\ddot{a}$</code></td>
</tr>
<tr>
<td class="org-left">\(\overrightarrow{AB}\)</td>
<td class="org-left"><code>$\overrightarrow{AB}$</code></td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<div id="outline-container-org5bb515f" class="outline-2">
<h2 id="org5bb515f">Going further with <code>markdown</code></h2>
<div class="outline-text-2" id="text-org5bb515f">
<p>
First, to go further with <code>markdown</code> and its extensions and offshoots:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">The tutorial "<a href="https://enacit1.epfl.ch/markdown-pandoc/">Élaboration et conversion de documents avec Markdown et Pandoc</a>" by Jean-Daniel Bonjour (EPFL) is precise, complete, concise, and written in French; a real pleasure!</li>
<li style="margin-bottom:0;">The English Wikipedia article <a href="https://en.wikipedia.org/wiki/Markdown#Example">Markdown</a> contains a good cheat sheet for <code>markdown</code> syntax.</li>
<li style="margin-bottom:0;">GitHub offers a short and effective tutorial (in English): <a href="https://guides.github.com/features/mastering-markdown/">Mastering Markdown</a>.</li>
</ul>
<p>
As we illustrate in the screencast, the text editor of <code>github</code> and <code>gitlab</code> repositories can render a <code>markdown</code> file as <code>html</code> on demand. This is pleasant and convenient, but it is not a solution for everyday use of <code>markdown</code>. For that, it is more efficient to edit the text with a text editor on your own computer and then "export" it to a format such as <code>html</code>, <code>pdf</code>, <code>docx</code>, <code>epub</code>, etc. There are editors more or less specialized for <code>markdown</code>, some of which are listed on the <a href="https://github.com/jgm/pandoc/wiki/Pandoc-Extras#editors">Editors</a> page of the <code>pandoc</code> site, but we clearly recommend a general-purpose text editor that recognizes <code>markdown</code> syntax. We mentioned some at the beginning of the sequence, and further information can be found in the section <a href="https://enacit1.epfl.ch/markdown-pandoc/#editeurs_markdown">Quelques éditeurs adaptés à l'édition Markdown</a> of Jean-Daniel Bonjour's tutorial.
</p>
<p>
To convert a <code>markdown</code> file to an "arbitrary" format, the most complete solution to date is <a href="http://pandoc.org/">Pandoc</a>, a program developed by John MacFarlane, a philosopher at Berkeley (see the <a href="https://github.com/jgm/pandoc">github</a> site). In addition to the <code>Pandoc</code> site, J.-D. Bonjour's tutorial explains in detail how to install and use <code>pandoc</code> in the section <a href="https://enacit1.epfl.ch/markdown-pandoc/#commande_pandoc">Utilisation du convertisseur Pandoc</a>. Since <code>pandoc</code>, which is written in Haskell, can sometimes be a little difficult to install, here are a few alternative solutions:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">Sites such as <a href="http://www.markdowntopdf.com/">http://www.markdowntopdf.com/</a> and <a href="http://markdown2pdf.com/">http://markdown2pdf.com/</a> convert a <code>markdown</code> file to a <code>pdf</code> file online.</li>
<li style="margin-bottom:0;">The <a href="http://commonmark.org/">CommonMark</a> project offers, in addition to a more rigorous specification of <code>markdown</code> syntax, <code>markdown</code> to <code>html</code> / <code>LaTeX</code> (and more) converters written in <code>C</code> and <code>JavaScript</code> (<a href="https://github.com/CommonMark/CommonMark">https://github.com/CommonMark/CommonMark</a>).</li>
<li style="margin-bottom:0;">The site of <a href="https://daringfireball.net/projects/markdown/">John Gruber</a>, the creator of <code>markdown</code>, provides a <code>markdown</code> to <code>html</code> converter written in <code>perl</code>.</li>
<li style="margin-bottom:0;"><a href="http://fletcherpenney.net/multimarkdown/">MultiMarkdown</a> is another extension of <code>markdown</code> that comes with its own <code>markdown</code> to <code>html</code> converter written in <code>C</code>.</li>
<li style="margin-bottom:0;"><a href="https://github.com/joeyespo/grip">grip</a> is a server written in <code>python</code> that converts and previews <code>markdown</code> files on the fly in your browser (very useful to avoid making large numbers of commits when writing such files for a <code>github</code> or <code>gitlab</code> repository).</li>
</ul>
<p>
Conversion to <code>pdf</code> always goes through <a href="https://fr.wikipedia.org/wiki/LaTeX">LaTeX</a>, which requires a complete and up-to-date installation of that software on your machine.
</p>
<p>
In the short demonstration, we show how to generate a <code>docx</code> file from an <code>md</code> file with <code>Pandoc</code>, and we point out that a word processor such as <code>LibreOffice</code> can then be used to edit the resulting file. Clearly, if changes are made to the <code>docx</code>, they will not be (automatically) propagated to the <code>md</code>. You will need to use <code>Pandoc</code> for that and convert from <code>docx</code> to <code>md</code> (and only the elements of the <code>docx</code> format that exist in <code>md</code> will be preserved).
</p>
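<p>
As a rough sketch of that round trip, the commands below convert a small Markdown file to <code>docx</code> and back. The file names are hypothetical, and the block assumes <code>pandoc</code> is available on the <code>PATH</code>; when it is not, the conversion is simply skipped.
</p>

```shell
#!/bin/sh
# Work in a throwaway directory; all file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '# Title\n\nSome *emphasised* text.\n' > report.md

if command -v pandoc >/dev/null 2>&1; then
  # pandoc infers the output format from the file extension.
  pandoc report.md -o report.docx
  # Reverse direction: only what Markdown can express survives.
  pandoc report.docx -t markdown -o roundtrip.md
  status=converted
else
  status=skipped
fi
echo "$status"
```

<p>
Comparing <code>roundtrip.md</code> with <code>report.md</code> gives a quick idea of what the round trip preserves.
</p>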
<p>
A strategy that is often used and works well in practice is to do the bulk of the writing of an article or thesis in <code>Markdown</code>. Once the writing is finished, the file is exported to <code>docx</code> (or <code>LaTeX</code>), and layout adjustments are then made with a word processor (or a <code>LaTeX</code> editor).
</p>
</div>
</div>
</div>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
<head>
<!-- 2018-09-05 mer. 07:41 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Analysis of the keywords in my journal</title>
<meta name="generator" content="Org mode" />
<meta name="author" content="Arnaud Legrand" />
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
.title { text-align: center;
margin-bottom: .2em; }
.subtitle { text-align: center;
font-size: medium;
font-weight: bold;
margin-top:0; }
.todo { font-family: monospace; color: red; }
.done { font-family: monospace; color: green; }
.priority { font-family: monospace; color: orange; }
.tag { background-color: #eee; font-family: monospace;
padding: 2px; font-size: 80%; font-weight: normal; }
.timestamp { color: #bebebe; }
.timestamp-kwd { color: #5f9ea0; }
.org-right { margin-left: auto; margin-right: 0px; text-align: right; }
.org-left { margin-left: 0px; margin-right: auto; text-align: left; }
.org-center { margin-left: auto; margin-right: auto; text-align: center; }
.underline { text-decoration: underline; }
#postamble p, #preamble p { font-size: 90%; margin: .2em; }
p.verse { margin-left: 3%; }
pre {
border: 1px solid #ccc;
box-shadow: 3px 3px 3px #eee;
padding: 8pt;
font-family: monospace;
overflow: auto;
margin: 1.2em;
}
pre.src {
position: relative;
overflow: visible;
padding-top: 1.2em;
}
pre.src:before {
display: none;
position: absolute;
background-color: white;
top: -10px;
right: 10px;
padding: 3px;
border: 1px solid black;
}
pre.src:hover:before { display: inline;}
/* Languages per Org manual */
pre.src-asymptote:before { content: 'Asymptote'; }
pre.src-awk:before { content: 'Awk'; }
pre.src-C:before { content: 'C'; }
/* pre.src-C++ doesn't work in CSS */
pre.src-clojure:before { content: 'Clojure'; }
pre.src-css:before { content: 'CSS'; }
pre.src-D:before { content: 'D'; }
pre.src-ditaa:before { content: 'ditaa'; }
pre.src-dot:before { content: 'Graphviz'; }
pre.src-calc:before { content: 'Emacs Calc'; }
pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }
pre.src-fortran:before { content: 'Fortran'; }
pre.src-gnuplot:before { content: 'gnuplot'; }
pre.src-haskell:before { content: 'Haskell'; }
pre.src-hledger:before { content: 'hledger'; }
pre.src-java:before { content: 'Java'; }
pre.src-js:before { content: 'Javascript'; }
pre.src-latex:before { content: 'LaTeX'; }
pre.src-ledger:before { content: 'Ledger'; }
pre.src-lisp:before { content: 'Lisp'; }
pre.src-lilypond:before { content: 'Lilypond'; }
pre.src-lua:before { content: 'Lua'; }
pre.src-matlab:before { content: 'MATLAB'; }
pre.src-mscgen:before { content: 'Mscgen'; }
pre.src-ocaml:before { content: 'Objective Caml'; }
pre.src-octave:before { content: 'Octave'; }
pre.src-org:before { content: 'Org mode'; }
pre.src-oz:before { content: 'OZ'; }
pre.src-plantuml:before { content: 'Plantuml'; }
pre.src-processing:before { content: 'Processing.js'; }
pre.src-python:before { content: 'Python'; }
pre.src-R:before { content: 'R'; }
pre.src-ruby:before { content: 'Ruby'; }
pre.src-sass:before { content: 'Sass'; }
pre.src-scheme:before { content: 'Scheme'; }
pre.src-screen:before { content: 'Gnu Screen'; }
pre.src-sed:before { content: 'Sed'; }
pre.src-sh:before { content: 'shell'; }
pre.src-sql:before { content: 'SQL'; }
pre.src-sqlite:before { content: 'SQLite'; }
/* additional languages in org.el's org-babel-load-languages alist */
pre.src-forth:before { content: 'Forth'; }
pre.src-io:before { content: 'IO'; }
pre.src-J:before { content: 'J'; }
pre.src-makefile:before { content: 'Makefile'; }
pre.src-maxima:before { content: 'Maxima'; }
pre.src-perl:before { content: 'Perl'; }
pre.src-picolisp:before { content: 'Pico Lisp'; }
pre.src-scala:before { content: 'Scala'; }
pre.src-shell:before { content: 'Shell Script'; }
pre.src-ebnf2ps:before { content: 'ebfn2ps'; }
/* additional language identifiers per "defun org-babel-execute"
in ob-*.el */
pre.src-cpp:before { content: 'C++'; }
pre.src-abc:before { content: 'ABC'; }
pre.src-coq:before { content: 'Coq'; }
pre.src-groovy:before { content: 'Groovy'; }
/* additional language identifiers from org-babel-shell-names in
ob-shell.el: ob-shell is the only babel language using a lambda to put
the execution function name together. */
pre.src-bash:before { content: 'bash'; }
pre.src-csh:before { content: 'csh'; }
pre.src-ash:before { content: 'ash'; }
pre.src-dash:before { content: 'dash'; }
pre.src-ksh:before { content: 'ksh'; }
pre.src-mksh:before { content: 'mksh'; }
pre.src-posh:before { content: 'posh'; }
/* Additional Emacs modes also supported by the LaTeX listings package */
pre.src-ada:before { content: 'Ada'; }
pre.src-asm:before { content: 'Assembler'; }
pre.src-caml:before { content: 'Caml'; }
pre.src-delphi:before { content: 'Delphi'; }
pre.src-html:before { content: 'HTML'; }
pre.src-idl:before { content: 'IDL'; }
pre.src-mercury:before { content: 'Mercury'; }
pre.src-metapost:before { content: 'MetaPost'; }
pre.src-modula-2:before { content: 'Modula-2'; }
pre.src-pascal:before { content: 'Pascal'; }
pre.src-ps:before { content: 'PostScript'; }
pre.src-prolog:before { content: 'Prolog'; }
pre.src-simula:before { content: 'Simula'; }
pre.src-tcl:before { content: 'tcl'; }
pre.src-tex:before { content: 'TeX'; }
pre.src-plain-tex:before { content: 'Plain TeX'; }
pre.src-verilog:before { content: 'Verilog'; }
pre.src-vhdl:before { content: 'VHDL'; }
pre.src-xml:before { content: 'XML'; }
pre.src-nxml:before { content: 'XML'; }
/* add a generic configuration mode; LaTeX export needs an additional
(add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */
pre.src-conf:before { content: 'Configuration File'; }
table { border-collapse:collapse; }
caption.t-above { caption-side: top; }
caption.t-bottom { caption-side: bottom; }
td, th { vertical-align:top; }
th.org-right { text-align: center; }
th.org-left { text-align: center; }
th.org-center { text-align: center; }
td.org-right { text-align: right; }
td.org-left { text-align: left; }
td.org-center { text-align: center; }
dt { font-weight: bold; }
.footpara { display: inline; }
.footdef { margin-bottom: 1em; }
.figure { padding: 1em; }
.figure p { text-align: center; }
.inlinetask {
padding: 10px;
border: 2px solid gray;
margin: 10px;
background: #ffffcc;
}
#org-div-home-and-up
{ text-align: right; font-size: 70%; white-space: nowrap; }
textarea { overflow-x: auto; }
.linenr { font-size: smaller }
.code-highlighted { background-color: #ffff00; }
.org-info-js_info-navigation { border-style: none; }
#org-info-js_console-label
{ font-size: 10px; font-weight: bold; white-space: nowrap; }
.org-info-js_search-highlight
{ background-color: #ffff00; color: #000000; font-weight: bold; }
.org-svg { width: 90%; }
/*]]>*/-->
</style>
<link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/htmlize.css"/>
<link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/readtheorg.css"/>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
<script type="text/javascript" src="http://www.pirilampo.org/styles/lib/js/jquery.stickytableheaders.js"></script>
<script type="text/javascript" src="http://www.pirilampo.org/styles/readtheorg/js/readtheorg.js"></script>
<script type="text/javascript">
/*
@licstart The following is the entire license notice for the
JavaScript code in this tag.
Copyright (C) 2012-2018 Free Software Foundation, Inc.
The JavaScript code in this tag is free software: you can
redistribute it and/or modify it under the terms of the GNU
General Public License (GNU GPL) as published by the Free Software
Foundation, either version 3 of the License, or (at your option)
any later version. The code is distributed WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU GPL for more details.
As additional permission under GNU GPL version 3 section 7, you
may distribute non-source (e.g., minimized or compacted) forms of
that code without the copy of the GNU GPL normally required by
section 4, provided you include this license notice and a URL
through which recipients can access the Corresponding Source.
@licend The above is the entire license notice
for the JavaScript code in this tag.
*/
<!--/*--><![CDATA[/*><!--*/
function CodeHighlightOn(elem, id)
{
var target = document.getElementById(id);
if(null != target) {
elem.cacheClassElem = elem.className;
elem.cacheClassTarget = target.className;
target.className = "code-highlighted";
elem.className = "code-highlighted";
}
}
function CodeHighlightOff(elem, id)
{
var target = document.getElementById(id);
if(elem.cacheClassElem)
elem.className = elem.cacheClassElem;
if(elem.cacheClassTarget)
target.className = elem.cacheClassTarget;
}
/*]]>*///-->
</script>
<script type="text/javascript">
function rpl(expr,a,b) {
var i=0
while (i!=-1) {
i=expr.indexOf(a,i);
if (i>=0) {
expr=expr.substring(0,i)+b+expr.substring(i+a.length);
i+=b.length;
}
}
return expr
}
function show_org_source(){
document.location.href = rpl(document.location.href,".php",".org");
}
</script>
</head>
<body>
<div id="content">
<h1 class="title">Analysis of the keywords in my journal</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#orge93496f">1. Formatting the data</a></li>
<li><a href="#org91643e3">2. Basic statistics</a></li>
<li><a href="#org1db0ff8">3. Graphical representations</a></li>
</ul>
</div>
</div>
<p>
I am lucky enough not to have to account very precisely for the time
I spend on this or that. That suits me, because I do not really like
tracking precisely, day after day, how I spend my time. However, as
you may have seen in one of the videos of this module, I record a lot
of information in my journal and I tag this information (when I think
of it). I thought it could be interesting to see whether the evolution
of how I use these tags reveals something about my professional
interests. I do not expect to infer anything statistically significant,
since I know that my rigor in using these tags, and their semantics,
have changed over the years, but let's see what we find.
</p>
<div id="outline-container-orge93496f" class="outline-2">
<h2 id="orge93496f"><span class="section-number-2">1</span> Formatting the data</h2>
<div class="outline-text-2" id="text-1">
<p>
My journal is stored in <code>/home/alegrand/org/journal.org</code>.
Level-1 entries (one star) indicate the year, level-2 entries
(two stars) the month, level-3 entries (three stars) the day's date,
and finally deeper entries describe what I worked on that day. It is
generally these deeper entries that are tagged with keywords between
":" at the end of the line.
</p>
<p>
I will therefore try to extract the lines that start with exactly
three <code>*</code> followed by a space, and those that start with a
<code>*</code> and end with keywords (some <code>:</code>, possibly followed by
spaces). The regular expression is not necessarily perfect, but it gives
me a first idea of the reformatting I will need to do.
</p>
<div class="org-src-container">
<pre class="src src-shell">grep -e <span class="org-string">'^\*\*\* '</span> -e <span class="org-string">'^\*.*:.*: *$'</span> ~/org/journal.org | tail -n 20
</pre>
</div>
<pre class="example">
*** 2018-06-01 vendredi
**** CP Inria du 01/06/18 :POLARIS:INRIA:
*** 2018-06-04 lundi
*** 2018-06-07 jeudi
**** The Cognitive Packet Network - Reinforcement based Network Routing with Random Neural Networks (Erol Gelenbe) :Seminar:
*** 2018-06-08 vendredi
**** The credibility revolution in psychological science: the view from an editor's desk (Simine Vazire, UC DAVIS) :Seminar:
*** 2018-06-11 lundi
**** LIG leaders du 11 juin 2018 :POLARIS:LIG:
*** 2018-06-12 mardi
**** geom_ribbon with discrete x scale :R:
*** 2018-06-13 mercredi
*** 2018-06-14 jeudi
*** 2018-06-20 mercredi
*** 2018-06-21 jeudi
*** 2018-06-22 vendredi
**** Discussion Nicolas Benoit (TGCC, Bruyère) :SG:WP4:
*** 2018-06-25 lundi
*** 2018-06-26 mardi
**** Point budget/contrats POLARIS :POLARIS:INRIA:
</pre>
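<p>
For readers who prefer Python, the same filter can be sketched with the
<code>re</code> module. This is only an illustration under stated assumptions: the two
patterns are copied from the grep command, while the function name
<code>keep_line</code> and the sample lines are hypothetical.
</p>

```python
import re

# Same two patterns as the grep command above: day headings (three stars)
# and any heading that ends with :tag: style keywords.
DAY_HEADING = re.compile(r"^\*\*\* ")
TAGGED_HEADING = re.compile(r"^\*.*:.*: *$")

def keep_line(line):
    """Return True when either grep pattern matches the line."""
    return bool(DAY_HEADING.search(line) or TAGGED_HEADING.search(line))

# Hypothetical sample lines, shaped like the journal excerpt above.
sample = [
    "*** 2018-06-12 mardi",
    "**** geom_ribbon with discrete x scale :R:",
    "Some free text under the heading",
    "**** Untagged entry",
]
kept = [line for line in sample if keep_line(line)]
print(kept)
```

<p>
Only the day heading and the tagged heading survive; free text and untagged
headings are dropped, which is the behavior visible in the output above.
</p>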
<p>
OK, I am on the right track. I can see quite a few entries
without any annotation. Never mind. There are also often several keywords
for a given date, and to attach the date to
each keyword I will use a real programming language rather than
trying to do it with shell commands. I am from the older
generation, so I am more used to Perl than to Python for this
kind of thing. Curiously, it is much easier to write (it took me
5 minutes) than to read back&#x2026; &#9786;
</p>
<div class="org-src-container">
<pre class="src src-perl"><span class="org-comment-delimiter"># </span><span class="org-comment">Read the journal and write one "Date,Keyword" row per tag found.</span>
open INPUT, <span class="org-string">"/home/alegrand/org/journal.org"</span> or <span class="org-keyword">die</span> $<span class="org-variable-name">!</span>;
open OUTPUT, <span class="org-string">"&gt; ./org_keywords.csv"</span> or <span class="org-keyword">die</span> $<span class="org-variable-name">!</span>;
$<span class="org-variable-name">date</span>=<span class="org-string">""</span>;
print OUTPUT <span class="org-string">"Date,Keyword\n"</span>;
<span class="org-comment-delimiter"># </span><span class="org-comment">Tags to ignore (the empty string comes from splitting on ":").</span>
%<span class="org-underline"><span class="org-variable-name">skip</span></span> = map { $<span class="org-variable-name">_</span> =&gt; 1 } (<span class="org-string">""</span>, <span class="org-string">"ATTACH"</span>, <span class="org-string">"Alvin"</span>, <span class="org-string">"Fred"</span>, <span class="org-string">"Mt"</span>, <span class="org-string">"Henri"</span>, <span class="org-string">"HenriRaf"</span>);
<span class="org-keyword">while</span>(defined($<span class="org-variable-name">line</span>=&lt;<span class="org-constant">INPUT</span>&gt;)) {
    chomp($<span class="org-variable-name">line</span>);
    <span class="org-comment-delimiter"># </span><span class="org-comment">Day heading: remember its date.</span>
    <span class="org-keyword">if</span>($<span class="org-variable-name">line</span> =~ <span class="org-string">'^\*\*\* (20[\d\-]*)'</span>) {
        $<span class="org-variable-name">date</span>=$<span class="org-variable-name">1</span>;
    }
    <span class="org-comment-delimiter"># </span><span class="org-comment">Tagged heading: the greedy .* leaves only the last :tag: in $1.</span>
    <span class="org-keyword">if</span>($<span class="org-variable-name">line</span> =~ <span class="org-string">'^\*.*(:\w*:)\s*$'</span>) {
        @<span class="org-underline"><span class="org-variable-name">kw</span></span>=split(<span class="org-string">/:/</span>,$<span class="org-variable-name">1</span>);
        <span class="org-keyword">if</span>($<span class="org-variable-name">date</span> eq <span class="org-string">""</span>) { <span class="org-keyword">next</span>;}
        <span class="org-keyword">foreach</span> $<span class="org-variable-name">k</span> (@<span class="org-underline"><span class="org-variable-name">kw</span></span>) {
            <span class="org-keyword">if</span>(exists($<span class="org-variable-name">skip</span>{$<span class="org-variable-name">k</span>})) { <span class="org-keyword">next</span>;}
            print OUTPUT <span class="org-string">"$date,$k\n"</span>;
        }
    }
}
close INPUT;
close OUTPUT;
</pre>
</div>
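<p>
Since the Perl is easier to write than to read, here is a rough Python
transcription of the same loop, as a sketch rather than a replacement. The
regular expressions and the skip list are copied from the script; the
function name <code>extract_keywords</code> and the in-memory sample are
hypothetical. Note that, as written, the greedy <code>.*</code> leaves only the
last tag of a heading in the captured group, which is consistent with the
output shown below (2018-06-01 yields <code>INRIA</code> only).
</p>

```python
import re

# Keywords the Perl script ignores (copied from its %skip hash).
SKIP = {"", "ATTACH", "Alvin", "Fred", "Mt", "Henri", "HenriRaf"}

DATE_RE = re.compile(r"^\*\*\* (20[\d\-]*)")
# Greedy ".*" means the group matches the final :tag: on the line only.
TAG_RE = re.compile(r"^\*.*(:\w*:)\s*$")

def extract_keywords(lines):
    """Yield (date, keyword) pairs, following the Perl loop step by step."""
    date = ""
    for raw in lines:
        line = raw.rstrip("\n")
        m = DATE_RE.search(line)
        if m:
            date = m.group(1)  # remember the current day
        m = TAG_RE.search(line)
        if m and date:
            for kw in m.group(1).split(":"):
                if kw not in SKIP:
                    yield date, kw

# Hypothetical sample, shaped like the journal excerpt above.
sample = [
    "*** 2018-06-01 vendredi",
    "**** CP Inria du 01/06/18   :POLARIS:INRIA:",
    "*** 2018-06-12 mardi",
    "**** geom_ribbon with discrete x scale :R:",
]
rows = list(extract_keywords(sample))
print(rows)
```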
<p>
Let's check what the result looks like:
</p>
<div class="org-src-container">
<pre class="src src-shell">head org_keywords.csv
<span class="org-builtin">echo</span> <span class="org-string">"..."</span>
tail org_keywords.csv
</pre>
</div>
<pre class="example">
Date,Keyword
2011-02-08,R
2011-02-08,Blog
2011-02-08,WP8
2011-02-08,WP8
2011-02-08,WP8
2011-02-17,WP0
2011-02-23,WP0
2011-04-05,Workload
2011-05-17,Workload
...
2018-05-17,POLARIS
2018-05-30,INRIA
2018-05-31,LIG
2018-06-01,INRIA
2018-06-07,Seminar
2018-06-08,Seminar
2018-06-11,LIG
2018-06-12,R
2018-06-22,WP4
2018-06-26,INRIA
</pre>
<p>
Perfect!
</p>
</div>
</div>
<div id="outline-container-org91643e3" class="outline-2">
<h2 id="org91643e3"><span class="section-number-2">2</span> Basic statistics</h2>
<div class="outline-text-2" id="text-2">
<p>
I am much more comfortable with R than with Python. I will use the
tidyverse packages whenever the need arises. Let's start
by reading the data:
</p>
<div class="org-src-container">
<pre class="src src-R"><span class="org-constant">library</span>(lubridate) <span class="org-comment-delimiter"># </span><span class="org-comment">install with install.packages("tidyverse")</span>
<span class="org-constant">library</span>(dplyr)
df=read.csv(<span class="org-string">"./org_keywords.csv"</span>,header=T)
df$Year=year(date(df$Date))
</pre>
</div>
<pre class="example">
Attachement du package : ‘lubridate’
The following object is masked from ‘package:base’:
date
Attachement du package : ‘dplyr’
The following objects are masked from ‘package:lubridate’:
intersect, setdiff, union
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
</pre>
<p>
So, what do these data look like?
</p>
<div class="org-src-container">
<pre class="src src-R">str(df)
summary(df)
</pre>
</div>
<pre class="example">
'data.frame': 566 obs. of 3 variables:
$ Date : Factor w/ 420 levels "2011-02-08","2011-02-17",..: 1 1 1 1 1 2 3 4 5 6 ...
$ Keyword: Factor w/ 36 levels "Argonne","autotuning",..: 22 3 36 36 36 30 30 29 29 36 ...
$ Year : num 2011 2011 2011 2011 2011 ...
Date Keyword Year
2011-02-08: 5 WP4 : 77 Min. :2011
2016-01-06: 5 POLARIS : 56 1st Qu.:2013
2016-03-29: 5 R : 48 Median :2016
2017-12-11: 5 LIG : 40 Mean :2015
2017-12-12: 5 Teaching: 38 3rd Qu.:2017
2016-01-26: 4 WP7 : 36 Max. :2018
(Other) :537 (Other) :271
</pre>
<p>
The types look correct, 566 entries, all is well.
</p>
<div class="org-src-container">
<pre class="src src-R">df <span class="org-ess-XopX">%&gt;%</span> group_by(Keyword, Year) <span class="org-ess-XopX">%&gt;%</span> summarize(Count=n()) <span class="org-ess-XopX">%&gt;%</span>
ungroup() <span class="org-ess-XopX">%&gt;%</span> arrange(Keyword,Year) <span class="org-constant">-&gt;</span> df_summarized
df_summarized
</pre>
</div>
<pre class="example">
# A tibble: 120 x 3
Keyword Year Count
&lt;fct&gt; &lt;dbl&gt; &lt;int&gt;
1 Argonne 2012 4
2 Argonne 2013 6
3 Argonne 2014 4
4 Argonne 2015 1
5 autotuning 2012 2
6 autotuning 2014 1
7 autotuning 2016 4
8 Blog 2011 2
9 Blog 2012 6
10 Blog 2013 4
# ... with 110 more rows
</pre>
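<p>
The same grouping can be sketched in plain Python, for comparison. This is
an illustration only: the CSV excerpt is a hypothetical sample in the
<code>Date,Keyword</code> layout, and <code>count_by_keyword_and_year</code> is a name
I made up. Python sorts strings by code point, so the order only roughly
mirrors <code>arrange(Keyword,Year)</code>.
</p>

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt in the same Date,Keyword layout as org_keywords.csv.
CSV_TEXT = """Date,Keyword
2011-02-08,R
2011-02-08,Blog
2011-02-08,WP8
2011-02-08,WP8
2012-03-01,R
"""

def count_by_keyword_and_year(handle):
    """Count rows per (keyword, year), like group_by + summarize(n())."""
    counts = Counter()
    for row in csv.DictReader(handle):
        year = int(row["Date"][:4])  # the year is the first four characters
        counts[(row["Keyword"], year)] += 1
    return counts

counts = count_by_keyword_and_year(io.StringIO(CSV_TEXT))
# Sort by keyword, then by year.
for (keyword, year), n in sorted(counts.items()):
    print(keyword, year, n)
```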
<p>
Let's start by counting how many annotations I make per year.
</p>
<div class="org-src-container">
<pre class="src src-R">df_summarized_total_year = df_summarized <span class="org-ess-XopX">%&gt;%</span> group_by(Year) <span class="org-ess-XopX">%&gt;%</span> summarize(Count=sum(Count))
df_summarized_total_year
</pre>
</div>
<pre class="example">
# A tibble: 8 x 2
Year Count
&lt;dbl&gt; &lt;int&gt;
1 2011 24
2 2012 57
3 2013 68
4 2014 21
5 2015 80
6 2016 133
7 2017 135
8 2018 48
</pre>
<p>
Ah, clearly I am getting better over time, and in 2014 I forgot
to do it regularly.
</p>
<p>
Since annotation is free-form, some keywords may be very
rare. Let's have a look.
</p>
<div class="org-src-container">
<pre class="src src-R">df_summarized <span class="org-ess-XopX">%&gt;%</span> group_by(Keyword) <span class="org-ess-XopX">%&gt;%</span> summarize(Count=sum(Count)) <span class="org-ess-XopX">%&gt;%</span> arrange(Count) <span class="org-ess-XopX">%&gt;%</span> as.data.frame()
</pre>
</div>
<pre class="example">
Keyword Count
1 Gradient 1
2 LaTeX 1
3 Orange 1
4 PF 1
5 twitter 2
6 WP1 2
7 WP6 2
8 Epistemology 3
9 BULL 4
10 Vulgarization 4
11 Workload 4
12 GameTheory 5
13 noexport 5
14 autotuning 7
15 Python 7
16 Stats 7
17 WP0 7
18 SG 8
19 git 9
20 HACSPECIS 10
21 Blog 12
22 BOINC 12
23 HOME 12
24 WP3 12
25 OrgMode 14
26 Argonne 15
27 Europe 18
28 Seminar 28
29 WP8 28
30 INRIA 30
31 WP7 36
32 Teaching 38
33 LIG 40
34 R 48
35 POLARIS 56
36 WP4 77
</pre>
<p>
OK, from here on I will probably restrict myself to those that
appear at least three times.
</p>
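<p>
As a small sketch of that restriction, here is the threshold applied to a
handful of totals copied from the table above. The threshold value and the
variable names are my own choices for illustration. Note that the plot in
the next section applies <code>Count &gt; 3</code> to per-year counts instead, which
is a stricter and slightly different criterion.
</p>

```python
# A few totals copied from the table above; the rest are omitted for brevity.
totals = {"Gradient": 1, "twitter": 2, "Epistemology": 3, "BULL": 4, "WP4": 77}

MIN_COUNT = 3  # "at least three times"; the threshold is a free choice

# Keep frequent keywords and measure how many annotations are discarded.
kept = {kw: n for kw, n in totals.items() if n >= MIN_COUNT}
dropped = sum(n for n in totals.values() if n < MIN_COUNT)
print(sorted(kept), "dropped annotations:", dropped)
```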
</div>
</div>
<div id="outline-container-org1db0ff8" class="outline-2">
<h2 id="org1db0ff8"><span class="section-number-2">3</span> Graphical representations</h2>
<div class="outline-text-2" id="text-3">
<p>
Ideally, I should define semantics and a
hierarchy for these keywords, but I am short on time right now. Since
I am removing the infrequent keywords, I will also
add the total number of keywords to get an idea of what
I have lost. Let's try a first graphical representation:
</p>
<div class="org-src-container">
<pre class="src src-R"><span class="org-constant">library</span>(ggplot2)
df_summarized <span class="org-ess-XopX">%&gt;%</span> filter(Count &gt; 3) <span class="org-ess-XopX">%&gt;%</span>
ggplot(aes(x=Year, y=Count)) +
geom_bar(aes(fill=Keyword),stat=<span class="org-string">"identity"</span>) +
geom_point(data=df_summarized <span class="org-ess-XopX">%&gt;%</span> group_by(Year) <span class="org-ess-XopX">%&gt;%</span> summarize(Count=sum(Count))) +
theme_bw()
</pre>
</div>
<div class="figure">
<p><img src="barchart1.png" alt="barchart1.png" />
</p>
</div>
<p>
Ouch. It is unreadable with such a color palette, but since
there are many distinct values it is hard to use a
more discriminating one. I will still give it a quick try,
just for the sake of it&#x2026; For that, I will use a color palette
("Set1") whose colors are all clearly distinct, but it has
only 9 colors. I will therefore start by selecting the 9
most frequent keywords.
</p>
<div class="org-src-container">
<pre class="src src-R"><span class="org-constant">library</span>(ggplot2)
frequent_keywords = df_summarized <span class="org-ess-XopX">%&gt;%</span> group_by(Keyword) <span class="org-ess-XopX">%&gt;%</span>
summarize(Count=sum(Count)) <span class="org-ess-XopX">%&gt;%</span> arrange(Count) <span class="org-ess-XopX">%&gt;%</span>
as.data.frame() <span class="org-ess-XopX">%&gt;%</span> tail(n=9)
df_summarized <span class="org-ess-XopX">%&gt;%</span> filter(Keyword <span class="org-ess-XopX">%in%</span> frequent_keywords$Keyword) <span class="org-ess-XopX">%&gt;%</span>
ggplot(aes(x=Year, y=Count)) +
geom_bar(aes(fill=Keyword),stat=<span class="org-string">"identity"</span>) +
geom_point(data=df_summarized <span class="org-ess-XopX">%&gt;%</span> group_by(Year) <span class="org-ess-XopX">%&gt;%</span> summarize(Count=sum(Count))) +
theme_bw() + scale_fill_brewer(palette=<span class="org-string">"Set1"</span>)
</pre>
</div>
<div class="figure">
<p><img src="barchart2.png" alt="barchart2.png" />
</p>
</div>
<p>
OK. Clearly, the share related to administration (<code>INRIA</code>, <code>LIG</code>, <code>POLARIS</code>)
and to teaching (<code>Teaching</code>) is growing. The growth of the
<code>R</code> share is, in my view, a sign that my command of the tool is
improving. The growth of the <code>Seminar</code> share does not mean
much, because I only recently started to systematically tag
the notes I take when attending a
talk. The <code>WP</code> tags come from the terminology of a former
ANR project that I kept using (<code>WP4</code> = HPC performance
prediction, <code>WP7</code> = analysis and visualization, <code>WP8</code> = experimental
design and experiment engines&#x2026;). The decrease of <code>WP4</code>
is mostly explained by the fact that the corresponding information
now lives in the journals of my PhD students, who actually carry out
the work that I merely supervise.
</p>
<p>
Well, an analysis of this kind would not be worthy of the name without a
<i>wordcloud</i> (often unreadable, but so sexy! &#9786;). For this, I
loosely follow this post:
<a href="http://onertipaday.blogspot.com/2011/07/word-cloud-in-r.html">http://onertipaday.blogspot.com/2011/07/word-cloud-in-r.html</a>
</p>
<div class="org-src-container">
<pre class="src src-R"><span class="org-constant">library</span>(wordcloud) <span class="org-comment-delimiter"># </span><span class="org-comment">install with install.packages("wordcloud")</span>
<span class="org-constant">library</span>(RColorBrewer)
pal2 <span class="org-constant">&lt;-</span> brewer.pal(8,<span class="org-string">"Dark2"</span>)
df_summarized <span class="org-ess-XopX">%&gt;%</span> group_by(Keyword) <span class="org-ess-XopX">%&gt;%</span> summarize(Count=sum(Count)) <span class="org-constant">-&gt;</span> df_summarized_keyword
wordcloud(df_summarized_keyword$Keyword, df_summarized_keyword$Count,
random.order=<span class="org-type">FALSE</span>, rot.per=.15, colors=pal2, vfont=c(<span class="org-string">"sans serif"</span>,<span class="org-string">"plain"</span>))
</pre>
</div>
<div class="figure">
<p><img src="wordcloud.png" alt="wordcloud.png" />
</p>
</div>
<p>
Well&#x2026; there it is: it is "pretty" but of little interest,
especially when there are so few distinct words.
</p>
</div>
</div>
</div>
<div id="postamble" class="status">
<p class="author">Author: Arnaud Legrand</p>
<p class="date">Created: 2018-09-05 mer. 07:41</p>
<p class="validation"><a href="http://validator.w3.org/check?uri=referer">Validate</a></p>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Risk Analysis of O-Ring Failure on the Challenger Space Shuttle</title>
<meta name="generator" content="Org mode" />
<meta name="author" content="Konrad Hinsen, Arnaud Legrand, Christophe Pouzat" />
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
.title { text-align: center;
margin-bottom: .2em; }
.subtitle { text-align: center;
font-size: medium;
font-weight: bold;
margin-top:0; }
.todo { font-family: monospace; color: red; }
.done { font-family: monospace; color: green; }
.priority { font-family: monospace; color: orange; }
.tag { background-color: #eee; font-family: monospace;
padding: 2px; font-size: 80%; font-weight: normal; }
.timestamp { color: #bebebe; }
.timestamp-kwd { color: #5f9ea0; }
.org-right { margin-left: auto; margin-right: 0px; text-align: right; }
.org-left { margin-left: 0px; margin-right: auto; text-align: left; }
.org-center { margin-left: auto; margin-right: auto; text-align: center; }
.underline { text-decoration: underline; }
#postamble p, #preamble p { font-size: 90%; margin: .2em; }
p.verse { margin-left: 3%; }
pre {
border: 1px solid #ccc;
box-shadow: 3px 3px 3px #eee;
padding: 8pt;
font-family: monospace;
overflow: auto;
margin: 1.2em;
}
pre.src {
position: relative;
overflow: visible;
padding-top: 1.2em;
}
pre.src:before {
display: none;
position: absolute;
background-color: white;
top: -10px;
right: 10px;
padding: 3px;
border: 1px solid black;
}
pre.src:hover:before { display: inline;}
/* Languages per Org manual */
pre.src-asymptote:before { content: 'Asymptote'; }
pre.src-awk:before { content: 'Awk'; }
pre.src-C:before { content: 'C'; }
/* pre.src-C++ doesn't work in CSS */
pre.src-clojure:before { content: 'Clojure'; }
pre.src-css:before { content: 'CSS'; }
pre.src-D:before { content: 'D'; }
pre.src-ditaa:before { content: 'ditaa'; }
pre.src-dot:before { content: 'Graphviz'; }
pre.src-calc:before { content: 'Emacs Calc'; }
pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }
pre.src-fortran:before { content: 'Fortran'; }
pre.src-gnuplot:before { content: 'gnuplot'; }
pre.src-haskell:before { content: 'Haskell'; }
pre.src-hledger:before { content: 'hledger'; }
pre.src-java:before { content: 'Java'; }
pre.src-js:before { content: 'Javascript'; }
pre.src-latex:before { content: 'LaTeX'; }
pre.src-ledger:before { content: 'Ledger'; }
pre.src-lisp:before { content: 'Lisp'; }
pre.src-lilypond:before { content: 'Lilypond'; }
pre.src-lua:before { content: 'Lua'; }
pre.src-matlab:before { content: 'MATLAB'; }
pre.src-mscgen:before { content: 'Mscgen'; }
pre.src-ocaml:before { content: 'Objective Caml'; }
pre.src-octave:before { content: 'Octave'; }
pre.src-org:before { content: 'Org mode'; }
pre.src-oz:before { content: 'OZ'; }
pre.src-plantuml:before { content: 'Plantuml'; }
pre.src-processing:before { content: 'Processing.js'; }
pre.src-python:before { content: 'Python'; }
pre.src-R:before { content: 'R'; }
pre.src-ruby:before { content: 'Ruby'; }
pre.src-sass:before { content: 'Sass'; }
pre.src-scheme:before { content: 'Scheme'; }
pre.src-screen:before { content: 'Gnu Screen'; }
pre.src-sed:before { content: 'Sed'; }
pre.src-sh:before { content: 'shell'; }
pre.src-sql:before { content: 'SQL'; }
pre.src-sqlite:before { content: 'SQLite'; }
/* additional languages in org.el's org-babel-load-languages alist */
pre.src-forth:before { content: 'Forth'; }
pre.src-io:before { content: 'IO'; }
pre.src-J:before { content: 'J'; }
pre.src-makefile:before { content: 'Makefile'; }
pre.src-maxima:before { content: 'Maxima'; }
pre.src-perl:before { content: 'Perl'; }
pre.src-picolisp:before { content: 'Pico Lisp'; }
pre.src-scala:before { content: 'Scala'; }
pre.src-shell:before { content: 'Shell Script'; }
pre.src-ebnf2ps:before { content: 'ebfn2ps'; }
/* additional language identifiers per "defun org-babel-execute"
in ob-*.el */
pre.src-cpp:before { content: 'C++'; }
pre.src-abc:before { content: 'ABC'; }
pre.src-coq:before { content: 'Coq'; }
pre.src-groovy:before { content: 'Groovy'; }
/* additional language identifiers from org-babel-shell-names in
ob-shell.el: ob-shell is the only babel language using a lambda to put
the execution function name together. */
pre.src-bash:before { content: 'bash'; }
pre.src-csh:before { content: 'csh'; }
pre.src-ash:before { content: 'ash'; }
pre.src-dash:before { content: 'dash'; }
pre.src-ksh:before { content: 'ksh'; }
pre.src-mksh:before { content: 'mksh'; }
pre.src-posh:before { content: 'posh'; }
/* Additional Emacs modes also supported by the LaTeX listings package */
pre.src-ada:before { content: 'Ada'; }
pre.src-asm:before { content: 'Assembler'; }
pre.src-caml:before { content: 'Caml'; }
pre.src-delphi:before { content: 'Delphi'; }
pre.src-html:before { content: 'HTML'; }
pre.src-idl:before { content: 'IDL'; }
pre.src-mercury:before { content: 'Mercury'; }
pre.src-metapost:before { content: 'MetaPost'; }
pre.src-modula-2:before { content: 'Modula-2'; }
pre.src-pascal:before { content: 'Pascal'; }
pre.src-ps:before { content: 'PostScript'; }
pre.src-prolog:before { content: 'Prolog'; }
pre.src-simula:before { content: 'Simula'; }
pre.src-tcl:before { content: 'tcl'; }
pre.src-tex:before { content: 'TeX'; }
pre.src-plain-tex:before { content: 'Plain TeX'; }
pre.src-verilog:before { content: 'Verilog'; }
pre.src-vhdl:before { content: 'VHDL'; }
pre.src-xml:before { content: 'XML'; }
pre.src-nxml:before { content: 'XML'; }
/* add a generic configuration mode; LaTeX export needs an additional
(add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */
pre.src-conf:before { content: 'Configuration File'; }
table { border-collapse:collapse; }
caption.t-above { caption-side: top; }
caption.t-bottom { caption-side: bottom; }
td, th { vertical-align:top; }
th.org-right { text-align: center; }
th.org-left { text-align: center; }
th.org-center { text-align: center; }
td.org-right { text-align: right; }
td.org-left { text-align: left; }
td.org-center { text-align: center; }
dt { font-weight: bold; }
.footpara { display: inline; }
.footdef { margin-bottom: 1em; }
.figure { padding: 1em; }
.figure p { text-align: center; }
.inlinetask {
padding: 10px;
border: 2px solid gray;
margin: 10px;
background: #ffffcc;
}
#org-div-home-and-up
{ text-align: right; font-size: 70%; white-space: nowrap; }
textarea { overflow-x: auto; }
.linenr { font-size: smaller }
.code-highlighted { background-color: #ffff00; }
.org-info-js_info-navigation { border-style: none; }
#org-info-js_console-label
{ font-size: 10px; font-weight: bold; white-space: nowrap; }
.org-info-js_search-highlight
{ background-color: #ffff00; color: #000000; font-weight: bold; }
.org-svg { width: 90%; }
/*]]>*/-->
</style>
<link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/htmlize.css"/>
<link rel="stylesheet" type="text/css" href="http://www.pirilampo.org/styles/readtheorg/css/readtheorg.css"/>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
<script type="text/javascript" src="http://www.pirilampo.org/styles/lib/js/jquery.stickytableheaders.js"></script>
<script type="text/javascript" src="http://www.pirilampo.org/styles/readtheorg/js/readtheorg.js"></script>
<script type="text/javascript">
/*
@licstart The following is the entire license notice for the
JavaScript code in this tag.
Copyright (C) 2012-2018 Free Software Foundation, Inc.
The JavaScript code in this tag is free software: you can
redistribute it and/or modify it under the terms of the GNU
General Public License (GNU GPL) as published by the Free Software
Foundation, either version 3 of the License, or (at your option)
any later version. The code is distributed WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU GPL for more details.
As additional permission under GNU GPL version 3 section 7, you
may distribute non-source (e.g., minimized or compacted) forms of
that code without the copy of the GNU GPL normally required by
section 4, provided you include this license notice and a URL
through which recipients can access the Corresponding Source.
@licend The above is the entire license notice
for the JavaScript code in this tag.
*/
<!--/*--><![CDATA[/*><!--*/
function CodeHighlightOn(elem, id)
{
var target = document.getElementById(id);
if(null != target) {
elem.cacheClassElem = elem.className;
elem.cacheClassTarget = target.className;
target.className = "code-highlighted";
elem.className = "code-highlighted";
}
}
function CodeHighlightOff(elem, id)
{
var target = document.getElementById(id);
if(elem.cacheClassElem)
elem.className = elem.cacheClassElem;
if(elem.cacheClassTarget)
target.className = elem.cacheClassTarget;
}
/*]]>*///-->
</script>
<script type="text/javascript">
function rpl(expr,a,b) {
var i=0
while (i!=-1) {
i=expr.indexOf(a,i);
if (i>=0) {
expr=expr.substring(0,i)+b+expr.substring(i+a.length);
i+=b.length;
}
}
return expr
}
function show_org_source(){
document.location.href = rpl(document.location.href,".php",".org");
}
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
displayAlign: "center",
displayIndent: "0em",
"HTML-CSS": { scale: 100,
linebreaks: { automatic: "false" },
webFont: "TeX"
},
SVG: {scale: 100,
linebreaks: { automatic: "false" },
font: "TeX"},
NativeMML: {scale: 100},
TeX: { equationNumbers: {autoNumber: "AMS"},
MultLineWidth: "85%",
TagSide: "right",
TagIndent: ".8em"
}
});
</script>
<script type="text/javascript"
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS_HTML"></script>
</head>
<body>
<div id="content">
<h1 class="title">Risk Analysis of O-Ring Failure on the Challenger Space Shuttle</h1>
<p>
<b>Preamble:</b> The background given in this document
is largely taken from Edward
R. Tufte's excellent book <i>Visual Explanations: Images and Quantities, Evidence
and Narrative</i>, published in 1997 by <i>Graphics Press</i> and reissued in 2005,
as well as from the article by Dalal et al., <i>Risk Analysis of the
Space Shuttle: Pre-Challenger Prediction of Failure</i>, published in 1989
in the <i>Journal of the American Statistical Association</i>.
</p>
<div id="outline-container-orgc84b2f3" class="outline-2">
<h2 id="orgc84b2f3"><span class="section-number-2">1</span> Context</h2>
<div class="outline-text-2" id="text-1">
<p>
In this study, we revisit <a href="https://fr.wikipedia.org/wiki/Accident_de_la_navette_spatiale_Challenger">the Space
Shuttle Challenger accident</a>. On January 28, 1986, 73 seconds after
launch, the Challenger shuttle disintegrated (see Figure <a href="#org22bba0b">1</a>),
taking with it the seven astronauts on board. The
explosion was caused by the failure of the two O-rings
sealing the joint between the upper and lower parts of the
boosters (see Figure <a href="#orgc01ded2">2</a>). These seals had lost their
effectiveness because of the unusual cold at the time of
launch: the temperature that morning was just below
0°C, whereas all previous flights had taken place
at temperatures at least 7 to 10°C higher.
</p>
<div id="org22bba0b" class="figure">
<p><img src="challenger5.jpg" alt="challenger5.jpg" />
</p>
<p><span class="figure-number">Figure&nbsp;1: </span>Photos of the Challenger disaster.</p>
</div>
<div id="orgc01ded2" class="figure">
<p><img src="o-ring.png" alt="o-ring.png" />
</p>
<p><span class="figure-number">Figure&nbsp;2: </span>Diagram of the Challenger shuttle boosters. The rubber O-rings (a primary seal and a secondary seal), more than 11 meters in circumference, seal the joint between the upper and lower parts of the booster.</p>
</div>
<p>
Most striking is that the precise cause of this accident had been
intensely debated for several days beforehand and was still being
discussed on the very eve of the launch, during a three-hour
teleconference between engineers from Morton Thiokol
(the builder of the rocket motors) and NASA. While the immediate cause of
the accident (the failure of the O-rings) was quickly
identified, the deeper reasons that led to the disaster
are regularly used as a case study, whether in
management courses (work organization, technical decision-making under
political pressure, communication problems), statistics courses
(risk assessment, modeling, data visualization), or
sociology courses (symptom of an institutional history, of bureaucracy, and of
conformity to organizational norms).
</p>
<p>
In the study we propose, we focus
mainly on the statistical aspect, which is only one
(extremely limited) facet of the problem, and we invite you to read
for yourself the documents referenced in the
preamble. The study that follows reproduces part of the analyses
carried out that night, whose goal was to assess
the potential influence of the temperature and pressure to which
the O-rings are subjected on their probability of
malfunction. To do so, we use the results of the
experiments conducted by NASA engineers during the 6
years preceding the Challenger launch.
</p>
<p>
In the <code>module2/exo5/</code> directory of your <code>gitlab</code> space, you
will find the original data as well as an analysis for each of the
proposed tracks. This analysis has four steps:
</p>
<ol class="org-ol">
<li>Loading the data</li>
<li>Graphical inspection of the data</li>
<li>Estimating the influence of temperature</li>
<li>Estimating the probability of O-ring
malfunction</li>
</ol>
<p>
The first two steps only require basic skills in
R or Python. The third step assumes some familiarity with
logistic regression (usually covered in the third year of a bachelor's
degree or the first year of a master's in statistics, econometrics,
biostatistics&#x2026;), and the fourth step requires basic
probability (high-school level). In the next section we therefore
give an introduction to logistic regression that does not
dwell on the details of the computation, only on the meaning
of the results of this regression.
</p>
</div>
</div>
<div id="outline-container-org2b7fcc8" class="outline-2">
<h2 id="org2b7fcc8"><span class="section-number-2">2</span> Introduction to logistic regression</h2>
<div class="outline-text-2" id="text-2">
<p>
Suppose we have the following data, which indicate, for
a cohort of individuals, whether or not they developed a particular
disease. I show the analysis in R here, but the Python code would not be
very different. The data are stored in a data frame, summarized
briefly below:
</p>
<div class="org-src-container">
<pre class="src src-R">summary(df)
str(df)
</pre>
</div>
<pre class="example">
Age Malade
Min. :22.01 Min. :0.000
1st Qu.:35.85 1st Qu.:0.000
Median :50.37 Median :1.000
Mean :50.83 Mean :0.515
3rd Qu.:65.37 3rd Qu.:1.000
Max. :79.80 Max. :1.000
'data.frame': 400 obs. of 2 variables:
$ Age : num 75.1 76.4 38.6 70.2 59.2 ...
$ Malade: int 1 1 0 1 1 1 0 0 1 1 ...
</pre>
<p>
Here is a plot of the data that gives a better
sense of the possible link between age and
contracting the disease:
</p>
<div class="org-src-container">
<pre class="src src-R">ggplot(df,aes(x=Age,y=Malade)) + geom_point(alpha=.3,size=3) + theme_bw()
</pre>
</div>
<div class="figure">
<p><object type="image/svg+xml" data="fig1.svg" class="org-svg">
Sorry, your browser does not support SVG.</object>
</p>
</div>
<p>
These data clearly show that the older one is, the higher the
probability of developing this disease. But how can
this probability be estimated from binary values alone
(sick/not sick)? For each age bracket (of 5 years, for example),
one could look at the frequency of the disease (the code below is
a bit complicated because computing the confidence interval for this
kind of data requires special treatment, via the
<code>binconf</code> function of the <code>Hmisc</code> package).
</p>
<div class="org-src-container">
<pre class="src src-R">age_range=5
df_grouped = df <span class="org-ess-XopX">%&gt;%</span> mutate(Age=age_range*(floor(Age/age_range)+.5)) <span class="org-ess-XopX">%&gt;%</span>
group_by(Age) <span class="org-ess-XopX">%&gt;%</span> summarise(Malade=sum(Malade),N=n()) <span class="org-ess-XopX">%&gt;%</span>
rowwise() <span class="org-ess-XopX">%&gt;%</span>
do(data.frame(Age=.$Age, Hmisc::binconf(.$Malade, .$N, alpha=0.05))) <span class="org-ess-XopX">%&gt;%</span>
as.data.frame()
ggplot(df_grouped,aes(x=Age)) + geom_point(data=df,aes(y=Malade),alpha=.3,size=3) +
geom_errorbar(data=df_grouped,
aes(x=Age,ymin=Lower, ymax=Upper, y=PointEst), color=<span class="org-string">"darkred"</span>) +
geom_point(data=df_grouped, aes(x=Age, y=PointEst), size=3, shape=21, color=<span class="org-string">"darkred"</span>) +
theme_bw()
</pre>
</div>
<div class="figure">
<p><object type="image/svg+xml" data="fig1bis.svg" class="org-svg">
Sorry, your browser does not support SVG.</object>
</p>
</div>
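<p>
To make the binning step less opaque, here is a minimal Python sketch of
the same idea using only the standard library. It assumes a Wilson score
interval, which to our knowledge is the default method of <code>binconf</code>;
treat that as an assumption. The toy cohort and the function names
(<code>wilson_interval</code>, <code>bin_frequencies</code>) are hypothetical and are not
part of the original analysis.
</p>

```python
import math
from collections import defaultdict

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def bin_frequencies(ages, sick, width=5):
    """Group by age bin and return {bin_centre: (frequency, lower, upper)}."""
    groups = defaultdict(list)
    for age, s in zip(ages, sick):
        # Same binning as the R code: centre of the bracket the age falls in.
        centre = width * (math.floor(age / width) + 0.5)
        groups[centre].append(s)
    return {
        c: (sum(v) / len(v), *wilson_interval(sum(v), len(v)))
        for c, v in sorted(groups.items())
    }

# Hypothetical toy cohort (not the 400 individuals of the study).
ages = [22.0, 23.5, 24.9, 61.0, 62.2, 63.8, 64.1]
sick = [0, 0, 1, 1, 1, 0, 1]
result = bin_frequencies(ages, sick)
for centre, (freq, low, high) in result.items():
    print(f"{centre:.1f}: {freq:.2f} [{low:.2f}, {high:.2f}]")
```

<p>
With so few individuals per bracket the intervals are very wide, which
illustrates why the per-bracket estimate alone is fragile.
</p>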
<p>
The drawback of this approach is that the computation is done
independently for each age bracket, that the bracket width is
arbitrary, and that it gives little sense of how the probability
evolves. To model this evolution in a more continuous way, one
could try a linear regression (the simplest possible
model to account for the influence of a parameter) and thereby
estimate the effect of age on the probability of being sick:
</p>
<div class="org-src-container">
<pre class="src src-R">ggplot(df,aes(x=Age,y=Malade)) + geom_point(alpha=.3,size=3) +
theme_bw() + geom_smooth(method=<span class="org-string">"lm"</span>)
</pre>
</div>
<div class="figure">
<p><object type="image/svg+xml" data="fig2.svg" class="org-svg">
Sorry, your browser does not support SVG.</object>
</p>
</div>
<p>
The blue line is the least-squares linear regression,
and the gray area is the 95% confidence band of this
estimate (given the available data and the
linearity assumption, the blue line is the most likely one and, loosely
speaking, there is a 95% chance that the true line lies within the gray area).
</p>
<p>
But this plot clearly shows that the
estimate makes no sense. A probability must lie between 0
and 1, whereas a linear regression will inevitably yield, for
somewhat extreme values (young or old), absurd predictions
(negative or greater than 1). This is simply because a
linear regression assumes that \(\textsf{Malade} =
\alpha.\textsf{Age} + \beta + \epsilon\), where \(\alpha\) and \(\beta\) are real numbers and \(\epsilon\)
is noise (a zero-mean random variable), and estimates \(\alpha\)
and \(\beta\) from the data. Here <code>Malade</code> is the binary (0/1) variable indicating illness.
</p>
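<p>
As a short worked illustration (a sketch that only uses the linear hypothesis above and ignores the noise term \(\epsilon\)), assume \(\alpha &gt; 0\). The predicted value then leaves the interval \([0,1]\) as soon as the age is large or small enough:
\[\alpha.\textsf{Age} + \beta &gt; 1 \iff \textsf{Age} &gt; \frac{1-\beta}{\alpha}
\qquad\text{and}\qquad
\alpha.\textsf{Age} + \beta &lt; 0 \iff \textsf{Age} &lt; \frac{-\beta}{\alpha}.\]
The case \(\alpha &lt; 0\) is symmetric (the inequalities on \(\textsf{Age}\) are reversed), so for any non-zero slope there are always ages for which the line predicts a "probability" above 1 or below 0.
</p>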
<p>
This technique makes no sense for estimating a probability, so
one should instead use what is called a <a href="https://fr.wikipedia.org/wiki/R%C3%A9gression_logistique">logistic
regression</a>:
</p>
<div class="org-src-container">
<pre class="src src-R">ggplot(df,aes(x=Age,y=Malade)) + geom_point(alpha=.3,size=3) +
theme_bw() +
geom_smooth(method = <span class="org-string">"glm"</span>,
method.args = list(family = <span class="org-string">"binomial"</span>)) + xlim(20,80)
</pre>
</div>
<div class="figure">
<p><object type="image/svg+xml" data="fig3.svg" class="org-svg">
Sorry, your browser does not support SVG.</object>
</p>
</div>
<p>
Here, the <code>ggplot</code> library performs all the logistic regression
computations for us and only shows the "graphical" result,
but in the analysis that we will propose for Challenger, we
carry out the regression and the prediction by hand (in <code>R</code> or in <code>Python</code>,
depending on the track you choose) so that a finer inspection
remains possible if needed. As before, the blue curve shows
the estimated probability of being ill as a function of age, and
the grey area gives an indication of the uncertainty of this
estimate, i.e., "under these hypotheses, and given how little
data we have and how variable they are, there is a 95% chance that the
true curve lies somewhere (anywhere) in the grey
area".
</p>
<p>
In this model, we assume that \(P[\textsf{Malade}] = \pi(\textsf{Age})\) with
\(\displaystyle\pi(x)=\frac{e^{\alpha.x + \beta}}{1+e^{\alpha.x + \beta}}\). This
formula (odd at first sight) has the nice property of always
yielding a value between 0 and 1 and, when \(\alpha &gt; 0\), of tending
quickly to \(0\) as age tends to \(-\infty\) and to \(1\) as age
tends to \(+\infty\) (though this is of course not the only motivation).
</p>
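<p>
To make the formula tangible, here is a minimal Python sketch that evaluates \(\pi(x)\) next to the unbounded linear predictor \(\alpha.x + \beta\). The coefficients <code>ALPHA</code> and <code>BETA</code> are hypothetical values chosen for illustration only; they are not fitted on the data used in this document.
</p>

```python
import math


def logistic(x, alpha, beta):
    """Return pi(x) = exp(alpha*x + beta) / (1 + exp(alpha*x + beta))."""
    z = alpha * x + beta
    # Two algebraically equivalent forms, chosen so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


# Hypothetical coefficients, for illustration only (not fitted on any data).
ALPHA, BETA = 0.1, -5.0

for age in (0, 20, 50, 80, 120):
    linear = ALPHA * age + BETA  # unbounded linear predictor
    print(age, round(linear, 2), round(logistic(age, ALPHA, BETA), 4))
```

<p>
The second column can take any real value, whereas the third always stays between 0 and 1, which is what makes the logistic form usable as a probability.
</p>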
<p>
In conclusion, when one has event data
(binary outcomes) and wants to estimate the influence of a parameter on
the probability that the event occurs (illness, failure&#x2026;),
the simplest and most natural model is
logistic regression. Note that even when restricting oneself to a small
part of the data (for example, only the patients under
50), it is possible to obtain a fairly reasonable estimate,
although, as one would expect, the uncertainty increases
considerably.
</p>
<div class="org-src-container">
<pre class="src src-R">ggplot(df[df$Age&lt;50,],aes(x=Age,y=Malade)) + geom_point(alpha=.3,size=3) +
theme_bw() +
geom_smooth(method = <span class="org-string">"glm"</span>,
method.args = list(family = <span class="org-string">"binomial"</span>),fullrange = <span class="org-type">TRUE</span>) + xlim(20,80)
</pre>
</div>
<div class="figure">
<p><object type="image/svg+xml" data="fig4.svg" class="org-svg">
Sorry, your browser does not support SVG.</object>
</p>
</div>
</div>
</div>
</div>
<div id="postamble" class="status">
<p class="date">Date: June 2018</p>
<p class="author">Author: Konrad Hinsen, Arnaud Legrand, Christophe Pouzat</p>
<p class="validation"><a href="http://validator.w3.org/check?uri=referer">Validate</a></p>
</div>
</body>
</html>
<div id="content">
<h1 class="title">Emacs/org-mode</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org945f839">Installing emacs, org-mode, ess, and auctex.</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#orge325516">Linux (Debian, Ubuntu)</a></li>
<li style="margin-bottom:0;"><a href="#org95706b9">macOS</a></li>
<li style="margin-bottom:0;"><a href="#orgb87390f">Windows</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org69f7d5a">Directory naming conventions</a></li>
<li style="margin-bottom:0;"><a href="#orgfe267c7">Making R and Python available to the console</a></li>
<li style="margin-bottom:0;"><a href="#orgc458b5a">Installing and configuring Matplotlib (graphic python library)</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org47ff448">All platforms: pretty code in HTML export</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org98a8b7e">A simple "<i>reproducible research</i>" emacs configuration</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#orgd0815e9">Step 0: Backup and download our configuration</a></li>
<li style="margin-bottom:0;"><a href="#org9008837">Step 1: Prepare your journal</a></li>
<li style="margin-bottom:0;"><a href="#orgd3b10f7">Step 2: Set up Emacs configuration</a></li>
<li style="margin-bottom:0;"><a href="#org73ef313">Step 3: Adapt the configuration to your specific needs if required</a></li>
<li style="margin-bottom:0;"><a href="#orgc85363f">Step 4: Check whether the installation is working or not</a></li>
<li style="margin-bottom:0;"><a href="#orgadd0750">Step 5: Open and play with your journal:</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org7b9442b">A stub of a replicable article</a></li>
<li style="margin-bottom:0;"><a href="#orgdde4be6">Emacs tips and tricks</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org152ca8a">Cheat-sheets</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org8a94d36">Emacs</a></li>
<li style="margin-bottom:0;"><a href="#org1765514">Org-mode</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org4cc7124">Video tutorials</a></li>
<li style="margin-bottom:0;"><a href="#orgd5c8443">Additional useful emacs packages</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#orgb58b5a8">Company-mode</a></li>
<li style="margin-bottom:0;"><a href="#orgc20d2e9">Magit</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org9260c82">Other resources</a></li>
</ul>
</li>
</ul>
</div>
</div>
<p>
<b>Disclaimer:</b> The two sections <span class="underline">A simple "<i>reproducible research</i>" emacs
configuration</span> and <span class="underline">A stub of a replicable article</span> explain how to set up
emacs/org-mode for this MOOC and are therefore very important in this
context. <b>They are illustrated in two
out of the <a href="https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/9cfc7500f0ef46d288d2317ec7b037b4">three video tutorials of this sequence</a>, which you
really should follow carefully. Otherwise, you may have trouble doing
the exercises later on</b>. Likewise, I strongly encourage you to watch
the <a href="https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/9cfc7500f0ef46d288d2317ec7b037b4">"emacs and git" video tutorial available at the same place</a>.
</p>
<p>
The next section provides information on how to install emacs.
</p>
<div id="outline-container-org945f839" class="outline-2">
<h2 id="org945f839">Installing emacs, org-mode, ess, and auctex.</h2>
<div class="outline-text-2" id="text-org945f839">
</div>
<div id="outline-container-orge325516" class="outline-3">
<h3 id="orge325516">Linux (Debian, Ubuntu)</h3>
<div class="outline-text-3" id="text-orge325516">
<p>
We provide here only instructions for Debian-based distributions. Feel
free to contribute to this document to provide up-to-date information
for other distributions (e.g., Red Hat, Fedora).
</p>
<p>
Today, the stable versions of the most common distributions provide
recent enough versions of emacs and org-mode:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">Debian (stretch) ships with <a href="https://packages.debian.org/stretch/emacs25">emacs 25.1</a> and <a href="https://packages.debian.org/stretch/org-mode">org-mode 9.0.3</a></li>
<li style="margin-bottom:0;">Ubuntu (bionic 18.04) ships with <a href="https://packages.ubuntu.com/bionic/emacs25">emacs 25.2</a> and <a href="https://packages.ubuntu.com/bionic/org-mode">org-mode 9.1.6</a></li>
<li style="margin-bottom:0;">Ubuntu (artful 17.10) ships with <a href="https://packages.ubuntu.com/artful/emacs25">emacs 25.2</a> and <a href="https://packages.ubuntu.com/artful/org-mode">org-mode 9.0.9</a></li>
</ul>
<p>
If your distribution is older than this, well, it may be a good time
to upgrade&#x2026;
</p>
<p>
Simply run (as root):
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">apt-get update ; apt-get install emacs25 org-mode ess r-base auctex
</pre>
</div>
<p>
Then make sure you have a sufficiently recent version of emacs.
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">emacs --version 2&gt;&amp;1 | head -n 1
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
GNU Emacs 25.2.2
</pre>
<p>
Likewise, you'll want to check you have a recent version of org-mode:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">emacs -batch --funcall <span style="font-style: italic;">"org-version"</span> 2&gt;&amp;1 | grep version
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
Org mode version 9.1.11 (9.1.11-dist @ /usr/share/emacs/25.2/site-lisp/elpa/org-9.1.11/)
</pre>
<p>
The version numbers you get will depend on the distribution you are
running. <span class="underline">You really want to make sure you do not rely on org-mode 8</span>,
which is now deprecated.
</p>
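<p>
The two checks above can also be scripted. The following Python sketch parses version lines such as the ones shown in the example outputs and compares their major version numbers with minimum values. The helper names are made up for this illustration, and the thresholds are assumptions: <code>min_org=9</code> reflects the advice not to rely on org-mode 8, while <code>min_emacs=25</code> is simply the version shipped by the distributions listed above.
</p>

```python
import re


def major_version(version_line):
    """Return the major number of the first dotted version found, or None."""
    match = re.search(r"(\d+)\.\d+", version_line)
    return int(match.group(1)) if match else None


def is_supported(emacs_line, org_line, min_emacs=25, min_org=9):
    """Check both version lines against minimum major versions (assumed thresholds)."""
    emacs, org = major_version(emacs_line), major_version(org_line)
    return emacs is not None and org is not None and emacs >= min_emacs and org >= min_org


print(is_supported(
    "GNU Emacs 25.2.2",
    "Org mode version 9.1.11 (9.1.11-dist @ /usr/share/emacs/25.2/site-lisp/elpa/org-9.1.11/)",
))
```

<p>
In practice, the two strings would be the first lines printed by the shell commands given above.
</p>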
</div>
</div>
<div id="outline-container-org95706b9" class="outline-3">
<h3 id="org95706b9">macOS</h3>
<div class="outline-text-3" id="text-org95706b9">
<p>
<b>Note:</b> macOS comes with a prehistoric command-line-only version of Emacs located at <code>/usr/bin/emacs</code>. It's best to forget about it.
</p>
<ul class="org-ul">
<li style="margin-bottom:0;"><p>
<b>Option 1</b>: Install the <code>.dmg</code> file from <a href="http://vgoulet.act.ulaval.ca/">Vincent Goulet</a>:
<a href="https://vigou3.gitlab.io/emacs-modified-macos/">https://vigou3.gitlab.io/emacs-modified-macos/</a>. It ships with recent
versions:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">Emacs 26.1</li>
<li style="margin-bottom:0;">Org-mode 9.1.13</li>
<li style="margin-bottom:0;">ESS 17.11</li>
</ul>
<p>
If you install this version of Emacs, or in fact any other version of
Emacs distributed as a clickable application in a <code>.dmg</code> file,
you must type the full path to the executable if you want to run
Emacs from a terminal. For example, if your clickable application
is at <code>/Applications/Emacs.app</code>, then the executable is at
<code>/Applications/Emacs.app/Contents/MacOS/Emacs</code>
</p></li>
<li style="margin-bottom:0;"><p>
<b>Option 2</b>: If you use <a href="https://docs.brew.sh/">Homebrew</a>, do the following:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">brew update
brew install emacs --with-cocoa
brew linkapps emacs
brew install wget
brew tap dunn/emacs
brew install auctex
brew tap brewsci/science
brew install ess
</pre>
</div>
<p>
This provides an <code>emacs</code> command for use from the command line, plus a clickable application at <code>Cellar/emacs/26.1_1/Emacs.app</code> inside your Homebrew directory. If
you installed Homebrew at the default location <code>/usr/local</code>, then this is <code>/usr/local/Cellar/emacs/26.1_1/Emacs.app</code>.
If you installed Homebrew on an account with administrator privileges, you can add
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">brew linkapps emacs
</pre>
</div>
<p>
in order to make Emacs accessible directly from <code>/Applications</code>.
</p></li>
</ul>
</div>
</div>
<div id="outline-container-orgb87390f" class="outline-3">
<h3 id="orgb87390f">Windows</h3>
<div class="outline-text-3" id="text-orgb87390f">
<p>
Install the <code>.exe</code> file from <a href="http://vgoulet.act.ulaval.ca/">Vincent Goulet</a>:
<a href="https://vigou3.gitlab.io/emacs-modified-windows/">https://vigou3.gitlab.io/emacs-modified-windows/</a>. It ships with recent
versions:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">Emacs 26.1</li>
<li style="margin-bottom:0;">Org-mode 9.1.13</li>
<li style="margin-bottom:0;">ESS 17.11</li>
</ul>
</div>
<div id="outline-container-org69f7d5a" class="outline-4">
<h4 id="org69f7d5a">Directory naming conventions</h4>
<div class="outline-text-4" id="text-org69f7d5a">
<p>
In the following instructions, we refer to your home
directory through the (UNIX) <code>~/</code> notation. On Windows, your home
directory should be something like <code>C:\Users\yourname</code>. Therefore,
whenever we mention the <code>~/org/</code> (resp. the <code>~/.emacs.d/</code>) directory this
means we are referring to <code>C:\Users\yourname\org</code> (resp.
<code>C:\Users\yourname\.emacs.d\</code>).
</p>
</div>
</div>
<div id="outline-container-orgfe267c7" class="outline-4">
<h4 id="orgfe267c7">Making R and Python available to the console</h4>
<div class="outline-text-4" id="text-orgfe267c7">
<p>
When running a command, Windows will look for the command in the
directories indicated in the <code>PATH</code> environment variable. If none of
these directories contains the command, Windows will stop and indicate
the command does not exist. To make sure R (which may be in
something like <code>C:/Program Files/R/R-3.5.1/bin/x64/</code>) and Python (which may be in something like <code>C:/Program Files/Python/Python37/</code>) can
easily be run from Emacs, you should thus configure the <code>PATH</code> variable
accordingly.
</p>
<p>
This requires going through the "Environment Variables" editor, as
explained <a href="http://sametmax.com/ajouter-un-chemin-a-la-variable-denvironnement-path-sous-windows/">here</a> (in French).
</p>
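<p>
The lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not what Windows does internally: the function name <code>find_in_path</code> and the fixed list of extensions are assumptions made for the example (Windows actually takes the extensions from its <code>PATHEXT</code> variable).
</p>

```python
import os


def find_in_path(command, path=None, extensions=("", ".exe", ".bat", ".cmd")):
    """Return the first file named `command` (plus an optional extension) found in PATH."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        for extension in extensions:
            candidate = os.path.join(directory, command + extension)
            if os.path.isfile(candidate):
                return candidate
    return None  # this is the case where the command "does not exist"


for name in ("R", "python"):
    print(name, "->", find_in_path(name))
```

<p>
If one of the two lines prints <code>None</code>, the corresponding directory is missing from <code>PATH</code>. The standard library function <code>shutil.which</code> performs the same search more carefully and is preferable in real scripts.
</p>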
</div>
</div>
<div id="outline-container-orgc458b5a" class="outline-4">
<h4 id="orgc458b5a">Installing and configuring Matplotlib (graphic python library)</h4>
<div class="outline-text-4" id="text-orgc458b5a">
<p>
Open a DOS console and type the following command:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">python -m pip install -U matplotlib
</pre>
</div>
<div class="figure">
<p><img src="" alt="install_matplotlib.png" /></p>
</div>
<p>
Then you will want to deactivate interactive plots in matplotlib. To
this end, you first need to know where the matplotlib configuration is
located. Open a Python console, then type the following code:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-python"><span style="font-weight: bold;">import</span> matplotlib
matplotlib.matplotlib_fname()
</pre>
</div>
<div class="figure">
<p><img src="" alt="matplotlib.png" /></p>
</div>
<p>
Open the <code>matplotlibrc</code> file and add a <code>#</code> at the beginning of the line
starting with <code>backend</code>, which amounts to using the default <code>Agg</code> value.
</p>
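<p>
For illustration, the edit described above amounts to the following transformation of the file content. This is only a sketch: the function name and the sample content (including the <code>TkAgg</code> value) are hypothetical, and your own <code>matplotlibrc</code> will contain different lines.
</p>

```python
def comment_out_backend(rc_text):
    """Return rc_text with any active `backend` setting turned into a comment."""
    result = []
    for line in rc_text.splitlines():
        key = line.split(":", 1)[0].strip()
        # Only the exact key `backend` is touched, not e.g. `backend_fallback`.
        result.append("#" + line if key == "backend" else line)
    return "\n".join(result)


# Hypothetical file content, for illustration only.
sample = "backend: TkAgg\nbackend_fallback: True\nlines.linewidth: 1.5"
print(comment_out_backend(sample))
```

<p>
After saving the real file and restarting Python, <code>matplotlib.get_backend()</code> reports which backend is actually in use.
</p>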
</div>
</div>
</div>
<div id="outline-container-org47ff448" class="outline-3">
<h3 id="org47ff448">All platforms: pretty code in HTML export</h3>
<div class="outline-text-3" id="text-org47ff448">
<p>
To have code pretty printing when exporting to HTML, you should
install the <code>htmlize</code> package, which is done by opening emacs and
typing the following command:
</p>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
M-x package-install RET htmlize RET # where M-x means pressing the "Esc" key then the "x" key
</pre>
</div>
</div>
</div>
<div id="outline-container-org98a8b7e" class="outline-2">
<h2 id="org98a8b7e">A simple "<i>reproducible research</i>" emacs configuration</h2>
<div class="outline-text-2" id="text-org98a8b7e">
<p>
This section is illustrated in a <a href="https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/9cfc7500f0ef46d288d2317ec7b037b4">video tutorial</a> (<i>"Mise en place
Emacs/Orgmode"</i> in French). Watching it before following the
instructions given in this section may help.
</p>
<p>
Emacs comes with a very basic default configuration, and
everyone seems to have their own taste. You will for example find <a href="https://www.emacswiki.org/emacs/StarterKits">here</a> several
default Emacs configurations that reflect the preferences of their
creators. Likewise, the configuration of Org-Mode is incredibly
flexible (see for example <a href="https://orgmode.org/worg/org-configs/index.html">the org-mode website</a> for more
references). In the context of this MOOC, we propose a relatively
minimalist configuration that is rather "<i>reproducible research</i>" oriented and
adds a few org-mode specific settings.
</p>
</div>
<div id="outline-container-orgd0815e9" class="outline-3">
<h3 id="orgd0815e9">Step 0: Backup and download our configuration</h3>
<div class="outline-text-3" id="text-orgd0815e9">
<p>
The procedure we propose will wipe your existing custom Emacs
configuration if you have one. <b>You should therefore first make a
backup</b> of <code>~/.emacs</code> and of <code>~/.emacs.d/init.el</code> (if these files exist).
</p>
<p>
Then download <a href="https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/raw/master/module2/ressources/rr_org_archive.tgz">this archive</a> and uncompress it. It contains the
following files, which we refer to below:
</p>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
rr_org/init.el
rr_org/journal.org
</pre>
<p>
Alternatively, <a href="rr_org/">the files you are looking for are available here</a>.
</p>
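<p>
The backup recommended at the beginning of this step can be done by hand, or with a small script such as the following Python sketch, which behaves the same way on Linux, macOS, and Windows. The function name and the <code>.bak</code> suffix are arbitrary choices made for this example.
</p>

```python
import shutil
from pathlib import Path


def backup_emacs_config(home=None, suffix=".bak"):
    """Copy ~/.emacs and ~/.emacs.d/init.el (when present) next to themselves."""
    home = Path(home) if home is not None else Path.home()
    backups = []
    for relative in (".emacs", ".emacs.d/init.el"):
        source = home / relative
        if source.is_file():
            target = source.with_name(source.name + suffix)
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            backups.append(target)
    return backups
```

<p>
Calling <code>backup_emacs_config()</code> without arguments works on your own home directory and returns the list of backup files it wrote; an empty list means there was nothing to back up.
</p>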
</div>
</div>
<div id="outline-container-org9008837" class="outline-3">
<h3 id="org9008837">Step 1: Prepare your journal</h3>
<div class="outline-text-3" id="text-org9008837">
<p>
Create an <code>org/</code> directory at the top of your home directory:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">mkdir -p ~/org/
</pre>
</div>
<p>
Then copy the <code>rr_org/journal.org</code> file into your <code>~/org/</code> directory. This
file will be your laboratory notebook, and all the notes you
capture with <code>C-c c</code> will automatically go into this file. The first
entry of this notebook is populated with <a href="https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/blob/master/module2/ressources/rr_org/journal.org">many Emacs shortcuts</a> that you
should give a try.
</p>
</div>
</div>
<div id="outline-container-orgd3b10f7" class="outline-3">
<h3 id="orgd3b10f7">Step 2: Set up Emacs configuration</h3>
<div class="outline-text-3" id="text-orgd3b10f7">
<p>
Copy <code>rr_org/init.el</code> into your <code>~/.emacs.d/</code> directory.
</p>
<p>
Alternatively, if you do not want to mess with your already existing
emacs configuration, you may launch emacs with this specific
configuration with the following command: <code>emacs -q -l rr_org/init.el</code>.
</p>
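<p>
Steps 1 and 2 boil down to two file copies. As a portable alternative to doing them by hand, here is a Python sketch under a few assumptions: the function name is made up, <code>archive_dir</code> is wherever you uncompressed the <code>rr_org</code> directory, and existing files are left untouched unless <code>overwrite=True</code> is passed, so that a journal you have already started is not lost.
</p>

```python
import shutil
from pathlib import Path


def install_rr_config(archive_dir, home=None, overwrite=False):
    """Copy journal.org to ~/org/ and init.el to ~/.emacs.d/, creating both directories."""
    archive_dir = Path(archive_dir)
    home = Path(home) if home is not None else Path.home()
    copies = {
        archive_dir / "journal.org": home / "org" / "journal.org",
        archive_dir / "init.el": home / ".emacs.d" / "init.el",
    }
    written = []
    for source, target in copies.items():
        if target.exists() and not overwrite:
            continue  # keep an existing journal or configuration untouched
        target.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
        shutil.copy2(source, target)
        written.append(target)
    return written
```

<p>
For example, <code>install_rr_config("rr_org")</code> returns the list of files it actually wrote. If <code>init.el</code> is missing from that list, a configuration already existed and was kept.
</p>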
</div>
</div>
<div id="outline-container-org73ef313" class="outline-3">
<h3 id="org73ef313">Step 3: Adapt the configuration to your specific needs if required</h3>
<div class="outline-text-3" id="text-org73ef313">
<p>
There are two situations in which it might be necessary to modify
<code>init.el</code>:
</p>
<ol class="org-ol">
<li style="margin-bottom:0;">Your network environment forces you to use a proxy for access
to Web sites (HTTP(S) protocol).</li>
<li style="margin-bottom:0;"><p>
You have multiple installations of Python or R on your computer,
or they are in unusual places and not fully configured.
If you can run
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">"python3" and "R" under Linux and macOS</li>
<li style="margin-bottom:0;">"Python" and "R" under Windows</li>
</ul>
<p>
in a terminal without getting an error message, then you should
not have to do anything.
</p></li>
</ol>
<p>
If you do have to modify <code>init.el</code>, check the comments at the
beginning of the file for instructions.
</p>
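<p>
The test mentioned in the second item (can the interpreters be run from a terminal?) can be automated with the standard library. In this Python sketch the function name is an assumption; the command names are the ones listed above for each platform.
</p>

```python
import platform
import shutil


def missing_commands(system=None):
    """Return the interpreter commands from the list above that are not found on PATH."""
    system = system or platform.system()
    wanted = ["Python", "R"] if system == "Windows" else ["python3", "R"]
    return [name for name in wanted if shutil.which(name) is None]


print("missing:", missing_commands() or "nothing")
```

<p>
An empty result suggests that <code>init.el</code> can be left as is; otherwise, the comments at the beginning of <code>init.el</code> explain what to adapt.
</p>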
</div>
</div>
<div id="outline-container-orgc85363f" class="outline-3">
<h3 id="orgc85363f">Step 4: Check whether the installation is working or not</h3>
<div class="outline-text-3" id="text-orgc85363f">
<p>
Open a new instance of Emacs and open a <code>foo.org</code> file. Copy the
following lines in this file:
</p>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
#+begin_src shell :session foo :results output :exports both
ls -la # or dir under windows
#+end_src
</pre>
<p>
Put your cursor inside this code block and execute it with the
following command: <code>C-c C-c</code> (If you are not familiar with Emacs
commands, this one means '<code>Ctrl + C</code>' twice)
</p>
<p>
A <code>#+RESULTS:</code> block with the result of the command should appear if it
worked.
</p>
<p>
In the video, we have already demonstrated the main features and
shortcuts of emacs/org-mode that will help you maintain a document and
benefit from literate programming. These features and shortcuts
are listed in the <a href="https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/blob/master/module2/ressources/rr_org/journal.org">first entry of your labbook</a>.
</p>
</div>
</div>
<div id="outline-container-orgadd0750" class="outline-3">
<h3 id="orgadd0750">Step 5: Open and play with your journal:</h3>
<div class="outline-text-3" id="text-orgadd0750">
<p>
In step 1, you were told to create a journal in
<code>~/org/journal.org</code>. First you probably want to make sure this file is
stored in a version control system like git. We leave it up to you
to set this up but if you have any trouble, feel free to ask on the
FUN forums.
</p>
</div>
</div>
</div>
<div id="outline-container-org7b9442b" class="outline-2">
<h2 id="org7b9442b">A stub of a replicable article</h2>
<div class="outline-text-2" id="text-org7b9442b">
<p>
This section is illustrated in a <a href="https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/9cfc7500f0ef46d288d2317ec7b037b4">video tutorial</a> (<i>"Écrire un article
réplicable avec Emacs/Orgmode"</i> in French). Watching it before
following the instructions given in this section may help.
</p>
<p>
Remember, you need a working LaTeX and R environment. If you can't
open a terminal and run the commands <code>R</code>, <code>pdflatex</code>, and <code>python</code>, you will not be
able to generate this document. While it is being compiled, the article downloads the
LaTeX packages it needs, so you also need a working <code>wget</code>
command (<code>curl</code> is used as an alternative). Even if compilation fails, once the archive is downloaded you can still read the
source (<a href="https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/blob/master/module2/ressources/replicable_article/article.org">article.org</a>) to understand how it works.
</p>
<p>
Download the following <a href="https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/raw/master/module2/ressources/replicable_article.tgz">archive</a>, uncompress it and simply run <code>make</code> to generate the
article. You should then be able to open the <a href="https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/blob/master/module2/ressources/replicable_article/article.pdf">resulting article</a>. This
is summarized in the following command:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">wget --no-check-certificate -O replicable_article.tgz https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/raw/master/module2/ressources/replicable_article.tgz
tar zxf replicable_article.tgz; <span style="font-weight: bold;">cd</span> replicable_article; make ; evince article.pdf
</pre>
</div>
<p>
<b>Possible issues</b>:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;"><p>
If the <code>make</code> command fails (especially on Mac), it may be because
Emacs or something else is not correctly installed. In that case,
open the article directly with the following command:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">emacs -q --eval <span style="font-style: italic;">"(setq enable-local-eval t)"</span> --eval <span style="font-style: italic;">"(setq enable-local-variables t)"</span> article.org
</pre>
</div>
<p>
and export it to pdf with the following shortcut: <code>C-c C-e l o</code>
</p></li>
<li style="margin-bottom:0;">If it still doesn't work and emacs complains about not finding ESS,
it may be because you installed ESS in your home instead of
system-wide. In that case, try to remove the <code>-q</code> in the previous
command line to load your personal emacs configuration.</li>
</ul>
<p>
Finally, when you get tired of always re-executing all the source
code when exporting, just look for the following line in <a href="https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/blob/master/module2/ressources/replicable_article/article.org">article.org</a>:
</p>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
# #+PROPERTY: header-args :eval never-export
</pre>
<p>
If you remove the leading <code>#</code> (and the space that follows it) at the beginning of the line, it will no longer be a
comment and will tell org-mode to stop evaluating every
chunk of code when exporting.
</p>
</div>
</div>
<div id="outline-container-orgdde4be6" class="outline-2">
<h2 id="orgdde4be6">Emacs tips and tricks</h2>
<div class="outline-text-2" id="text-orgdde4be6">
</div>
<div id="outline-container-org152ca8a" class="outline-3">
<h3 id="org152ca8a">Cheat-sheets</h3>
<div class="outline-text-3" id="text-org152ca8a">
<p>
Learning Emacs and Org-Mode can be difficult as there is an inordinate
number of shortcuts. Many people have thus come up with
cheat-sheets. Here is a selection in case it helps:
</p>
</div>
<div id="outline-container-org8a94d36" class="outline-4">
<h4 id="org8a94d36">Emacs</h4>
<div class="outline-text-4" id="text-org8a94d36">
<ul class="org-ul">
<li style="margin-bottom:0;"><a href="https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/blob/master/module2/ressources/rr_org/journal.org">Common and step-by-step Emacs shortcuts for our <i>reproducible research</i> configuration</a></li>
<li style="margin-bottom:0;"><a href="https://www.gnu.org/software/emacs/refcards/pdf/refcard.pdf">The official GNU emacs refcard</a></li>
<li style="margin-bottom:0;">Two graphical cheat-sheets by Sacha Chua on <a href="http://sachachua.com/blog/wp-content/uploads/2013/05/How-to-Learn-Emacs-v2-Large.png">how to learn Emacs</a> and on
<a href="http://sachachua.com/blog/wp-content/uploads/2013/08/20130830-Emacs-Newbie-How-to-Learn-Emacs-Keyboard-Shortcuts.png">how to learn Emacs shortcuts</a>.</li>
</ul>
</div>
</div>
<div id="outline-container-org1765514" class="outline-4">
<h4 id="org1765514">Org-mode</h4>
<div class="outline-text-4" id="text-org1765514">
<ul class="org-ul">
<li style="margin-bottom:0;"><a href="https://app-learninglab.inria.fr/gitlab/learning-lab/mooc-rr-ressources/blob/master/module2/ressources/rr_org/journal.org">Common and step-by-step org-mode shortcuts for our <i>reproducible research</i> configuration</a></li>
<li style="margin-bottom:0;"><a href="https://orgmode.org/worg/orgcard.html">The official org-mode refcard</a></li>
<li style="margin-bottom:0;"><a href="https://orgmode.org/worg/dev/org-syntax.html">The official description of the org-mode syntax</a> and a <a href="https://gist.github.com/hoeltgman/3825415">relatively concise description of the org-mode syntax</a>.</li>
</ul>
</div>
</div>
</div>
<div id="outline-container-org4cc7124" class="outline-3">
<h3 id="org4cc7124">Video tutorials</h3>
<div class="outline-text-3" id="text-org4cc7124">
<p>
For those of you who prefer video explanations, here is a <a href="https://www.youtube.com/playlist?list=PL9KxKa8NpFxIcNQa9js7dQQIHc81b0-Xg">YouTube
playlist with many step-by-step emacs tutorials</a>.
</p>
</div>
</div>
<div id="outline-container-orgd5c8443" class="outline-3">
<h3 id="orgd5c8443">Additional useful emacs packages</h3>
<div class="outline-text-3" id="text-orgd5c8443">
</div>
<div id="outline-container-orgb58b5a8" class="outline-4">
<h4 id="orgb58b5a8">Company-mode</h4>
<div class="outline-text-4" id="text-orgb58b5a8">
<p>
<a href="http://company-mode.github.io/">Company-mode</a> is a text completion framework for Emacs. It provides
smart completion in emacs for the most common languages. If you
feel this is needed, you should follow the instructions from the
official Web page: <a href="http://company-mode.github.io/">http://company-mode.github.io/</a>
</p>
</div>
</div>
<div id="outline-container-orgc20d2e9" class="outline-4">
<h4 id="orgc20d2e9">Magit</h4>
<div class="outline-text-4" id="text-orgc20d2e9">
<p>
<a href="https://magit.vc/">Magit</a> is an Emacs interface for Git. Its usage is briefly illustrated
in the context of this MOOC in a <a href="https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/9cfc7500f0ef46d288d2317ec7b037b4">video tutorial</a> (<i>"Utilisation
Emacs/git"</i> in French).
</p>
<p>
It is very powerful and we use it on a daily basis but you should
definitely understand what git does behind the scenes beforehand. If
you feel this would be useful for you, you should follow <a href="https://magit.vc/screenshots/">this visual
walk-through</a> or <a href="https://www.emacswiki.org/emacs/Magit">this really short "crash course"</a>. If you installed the
previous "<i>reproducible research</i>" emacs configuration, you can easily
invoke magit by using <code>C-x g</code>.
</p>
</div>
</div>
</div>
<div id="outline-container-org9260c82" class="outline-3">
<h3 id="org9260c82">Other resources</h3>
<div class="outline-text-3" id="text-org9260c82">
<ul class="org-ul">
<li style="margin-bottom:0;"><a href="https://orgmode.org/orgguide.pdf">The compact Org-mode Guide</a></li>
<li style="margin-bottom:0;"><a href="https://github.com/dfeich/org-babel-examples">Many examples illustrating the use of different languages in org-mode</a></li>
</ul>
</div>
</div>
</div>
</div>
This source diff could not be displayed because it is too large. You can view the blob instead.
<div id="content">
<h1 class="title">Jupyter</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#orgd37ebcc">1. Jupyter tips and tricks</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#orge426487">Creating or importing a notebook</a></li>
<li style="margin-bottom:0;"><a href="#org6607533">Running R and Python in the same notebook</a></li>
<li style="margin-bottom:0;"><a href="#orgd909163">Other languages</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org9176019">2. Installing and configuring Jupyter on your computer</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org51b3378">2.1 Installing Jupyter</a></li>
<li style="margin-bottom:0;"><a href="#orgc0d2a71">2.2 Making sure Jupyter allows you to use R</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org608c96e">• Installing IRKernel (R package)</a></li>
<li style="margin-bottom:0;"><a href="#org076acd5">• Installing rpy2 (Python package)</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org524a8ac">2.3 Additional tips</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org814c076">• Exporting a notebook</a></li>
<li style="margin-bottom:0;"><a href="#orga5ae744">• Improving notebook readability</a></li>
<li style="margin-bottom:0;"><a href="#orgab0846a">• Interacting with GitLab and GitHub</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
<div id="outline-container-orgd37ebcc" class="outline-2">
<h2 id="orgd37ebcc">1. Jupyter tips and tricks</h2>
<div class="outline-text-2" id="text-orgd37ebcc">
<p>
The following <a href="https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/">webpage</a> lists several Jupyter tricks (in particular, it
illustrates many <code>IPython magic</code> commands) that should improve your
efficiency (note that this blog post is about two years old so some of
the tricks may have been integrated in the default behavior of Jupyter
now).
</p>
</div>
<div id="outline-container-orge426487" class="outline-3">
<h3 id="orge426487">Creating or importing a notebook</h3>
<div class="outline-text-3" id="text-orge426487">
<p>
The Jupyter environment we deployed for this MOOC gives you easy
access to any file in your default GitLab project. There are
situations, however, where you may want to work with other notebooks.
</p>
<dl class="org-dl">
<dt>Adding a brand new notebook in a given directory</dt><dd>Follow
these steps:
<ol class="org-ol">
<li style="margin-bottom:0;">From the menu: <code>File -&gt; Open</code>. You're now in the Jupyter file manager.</li>
<li style="margin-bottom:0;">Navigate to the directory where you want your notebook to be created.</li>
<li style="margin-bottom:0;">Then from the top right button: <code>New -&gt; Notebook: Python 3</code>.</li>
<li style="margin-bottom:0;"><p>
Give your notebook a name from the menu: <code>File -&gt; Rename</code>.
</p>
<p>
N.B.: If you create a file by doing <code>File -&gt; New Notebook -&gt;
Python 3</code>, the new notebook will be created in the current
directory. Moving it afterward is possible but a bit cumbersome
(you'll have to go through the Jupyter file manager by following
the menu <code>File -&gt; Open</code>, then select it, <code>Shut</code> it <code>down</code>, and <code>Move</code>
and/or <code>Rename</code>).
</p></li>
</ol></dd>
<dt>Importing an already existing notebook</dt><dd>If your notebook is
already in your GitLab project, then simply synchronize by using
the <code>Git pull</code> button and use the <code>File -&gt; Open</code> menu. Otherwise,
suppose you want to import the <a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/src/Python3/challenger.ipynb">following notebook</a> from someone
else's repository to re-execute it.
<ol class="org-ol">
<li style="margin-bottom:0;">Download the file to your computer. E.g., for this <a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/src/Python3/challenger.ipynb">GitLab hosted
notebook</a>, click on <code>Open raw</code> (a small <code>&lt;/&gt;</code> within a document icon)
and save (<code>Ctrl-S</code> on most browsers) the content (a long JSON text
file).</li>
<li style="margin-bottom:0;">Open the Jupyter file manager from the menu <code>File -&gt; Open</code> and
navigate to the directory where you want to upload your notebook.</li>
<li style="margin-bottom:0;">Then from the top right button, <code>Upload</code> the previously downloaded
notebook and confirm the upload.</li>
<li style="margin-bottom:0;">Open the freshly uploaded notebook through the Jupyter file
manager.</li>
</ol></dd>
</dl>
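<p>
A common pitfall when downloading a notebook is saving the surrounding
web page instead of the raw file. The following is a minimal sketch of
a sanity check to run before uploading. It assumes the notebook follows
the version 4 notebook format, whose top-level JSON object contains the
<code>cells</code> and <code>nbformat</code> keys; the function name <code>looks_like_notebook</code> is
only illustrative.
</p>

```python
import json


def looks_like_notebook(text):
    """Return True if text parses as JSON and has the top-level notebook keys."""
    try:
        data = json.loads(text)
    except ValueError:
        # Typically a web page or an error message saved instead of the raw file.
        return False
    return isinstance(data, dict) and "cells" in data and "nbformat" in data


# Hypothetical minimal notebook content, used only to exercise the check.
sample = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}'
print(looks_like_notebook(sample))
print(looks_like_notebook("not a notebook"))
```

<p>
In practice you would read the downloaded file and pass its content to
the function; a <code>False</code> result suggests downloading the raw file again.
</p>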
</div>
</div>
<div id="outline-container-org6607533" class="outline-3">
<h3 id="org6607533">Running R and Python in the same notebook</h3>
<div class="outline-text-3" id="text-org6607533">
<p>
The <code>rpy2</code> package lets you use both languages in the same notebook by:
</p>
<ol class="org-ol">
<li style="margin-bottom:0;"><p>
Loading <code>rpy2</code>:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-python">%load_ext rpy2.ipython
</pre>
</div></li>
<li style="margin-bottom:0;"><p>
Using the <code>%%R</code> IPython magic:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-python">%%R
summary(cars)
</pre>
</div>
<p>
Python objects can then even be passed to R as follows (assuming <code>df</code>
is a pandas dataframe):
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-python">%%R -i df
plot(df)
</pre>
</div></li>
</ol>
<p>
Note that the <code>%%R</code> notation indicates that R should be used for the whole cell;
another possibility is to use <code>%R</code> to have a single line of R within a
Python cell.
</p>
</div>
</div>
<div id="outline-container-orgd909163" class="outline-3">
<h3 id="orgd909163">Other languages</h3>
<div class="outline-text-3" id="text-orgd909163">
<p>
Jupyter is not limited to Python and R. Many other languages are available:
<a href="https://github.com/jupyter/jupyter/wiki/Jupyter-kernels">https://github.com/jupyter/jupyter/wiki/Jupyter-kernels</a>, including
non-free languages like SAS, Mathematica, Matlab&#x2026; Note that the maturity of these kernels differs widely.
</p>
<p>
None of these other languages have been deployed in the context of our
MOOC but you may want to read the next sections to learn how
to set up your own Jupyter on your computer and benefit from these extensions.
</p>
<p>
Since the question was asked several times, if you really need to stay
with SAS, you should know that SAS can be used within Jupyter using
either the <a href="https://sassoftware.github.io/sas_kernel/">Python SASKernel</a> or the <a href="https://sassoftware.github.io/saspy/">Python SASPy</a> package (step by step
explanations about this are given <a href="https://app-learninglab.inria.fr/gitlab/85bc36e0a8096c618fbd5993d1cca191/mooc-rr/blob/master/documents/tuto_jupyter_windows/tuto_jupyter_windows.md">here</a>).
</p>
<p>
Since proprietary software such as SAS cannot easily be inspected, we discourage its use, as it inherently
hinders reproducibility. But perfection does not exist anyway, and combining Jupyter's
literate programming approach with systematic version control
and environment control will certainly help.
</p>
</div>
</div>
</div>
<div id="outline-container-org9176019" class="outline-2">
<h2 id="org9176019">2. Installing and configuring Jupyter on your computer</h2>
<div class="outline-text-2" id="text-org9176019">
<p>
In this section, we explain how to set up a Jupyter environment on
your own computer similar to the one deployed for this MOOC.
</p>
<p>
Note that Jupyter notebooks are only a small part of the picture and
that Jupyter is now part of a bigger project: <a href="https://blog.jupyter.org/jupyterlab-is-ready-for-users-5a6f039b8906">JupyterLab</a>, which allows
you to mix various components (including notebooks) in your
browser. In the context of this MOOC, our time frame was too short to
benefit from JupyterLab which was still under active development. You may, however, prefer JupyterLab when doing an installation on your own computer.
</p>
</div>
<div id="outline-container-org51b3378" class="outline-3">
<h3 id="org51b3378">2.1 Installing Jupyter</h3>
<div class="outline-text-3" id="text-org51b3378">
<p>
Follow these instructions if you wish to have a Jupyter environment on
your own computer similar to the one we set up for this MOOC.
</p>
<p>
First, download and install the <a href="https://conda.io/miniconda.html">latest version of Miniconda</a>. We use
Miniconda version <code>4.5.4</code> and Python version <code>3.6</code> on our server.
</p>
<p>
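Miniconda is a lightweight version of Anaconda, a distribution that bundles Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science. Miniconda itself ships only conda and Python; the other packages are installed through the environment file below.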
</p>
<p>
Then download the <a href="https://gist.github.com/brospars/4671d9013f0d99e1c961482dab533c57"><code>mooc_rr</code> environment file</a> and create the environment using conda:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">conda env create -f environment.yml
<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">Windows: activate the environment</span>
activate mooc_rr
<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">Linux and MacOS: activate the environment</span>
<span style="font-weight: bold;">source</span> activate mooc_rr
<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">Linux, MacOS and Windows: launch the notebook</span>
jupyter notebook
</pre>
</div>
</div>
</div>
<div id="outline-container-orgc0d2a71" class="outline-3">
<h3 id="orgc0d2a71">2.2 Making sure Jupyter allows you to use R</h3>
<div class="outline-text-3" id="text-orgc0d2a71">
<p>
The environment described in the last section should include R, but if
you proceeded otherwise and only have Python available in Jupyter, you
may want to read the following section.
</p>
</div>
<div id="outline-container-org608c96e" class="outline-4">
<h4 id="org608c96e">• Installing <a href="https://github.com/IRkernel/IRkernel">IRKernel</a> (R package)</h4>
<div class="outline-text-4" id="text-org608c96e">
<p>
Do the following in an R console:
</p>
<p>
Install the <code>devtools</code> package:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-R">install.packages(<span style="font-style: italic;">'devtools'</span>,dep=<span style="font-weight: bold; text-decoration: underline;">TRUE</span>)
</pre>
</div>
<p>
Define a proxy if needed:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-R"><span style="font-weight: bold; text-decoration: underline;">library</span>(httr)
set_config(use_proxy(url=<span style="font-style: italic;">"proxy"</span>, port=80, username=<span style="font-style: italic;">"username"</span>, password=<span style="font-style: italic;">"password"</span>))
</pre>
</div>
<p>
Install the <code>IRkernel</code> package:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-R">devtools::install_github(<span style="font-style: italic;">'IRkernel/IRkernel'</span>)
IRkernel::installspec() <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">to register the kernel in the current R installation</span>
</pre>
</div>
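<p>
To check that the registration worked, you can ask Jupyter for the list
of kernels it knows about. The sketch below assumes that the <code>jupyter</code>
command is on your <code>PATH</code> and that the R kernel was registered under its
default name <code>ir</code>; the helper names and the sample output are
hypothetical and only serve as illustration.
</p>

```python
import json
import subprocess


def kernel_names(kernelspec_json):
    """Extract kernel names from the output of `jupyter kernelspec list --json`."""
    return sorted(json.loads(kernelspec_json)["kernelspecs"])


def installed_kernels():
    """Ask the local Jupyter installation which kernels it knows about."""
    out = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    return kernel_names(out)


# Hypothetical sample output, trimmed to the part the parser uses.
sample = '{"kernelspecs": {"python3": {}, "ir": {}}}'
print("ir" in kernel_names(sample))
```

<p>
On your own machine, <code>"ir" in installed_kernels()</code> should be true once
the kernel is registered; if it is not, rerun <code>IRkernel::installspec()</code>
from the R installation you want Jupyter to use.
</p>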
</div>
</div>
<div id="outline-container-org076acd5" class="outline-4">
<h4 id="org076acd5">• Installing rpy2 (Python package)</h4>
<div class="outline-text-4" id="text-org076acd5">
<p>
On Linux, the rpy2 package is available in standard distributions:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">sudo apt-get install python3-rpy2 python3-tzlocal
</pre>
</div>
<p>
An alternative (not really recommended if the first one is available)
is to go through the Python package manager:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-python">pip3 install rpy2
</pre>
</div>
<p>
<b>Windows</b>
</p>
<p>
Download the <code>rpy2</code> <a href="https://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2">binary file</a> that matches your operating system.
</p>
<p>
Open a DOS console and type the following command:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">python -m pip install rpy2-2.9.4-cp37-cp37m-win_amd64.whl <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">adapt filename</span>
</pre>
</div>
<p>
Also install <code>tzlocal</code>:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">python -m pip install tzlocal
</pre>
</div>
</div>
</div>
</div>
<div id="outline-container-org524a8ac" class="outline-3">
<h3 id="org524a8ac">2.3 Additional tips</h3>
<div class="outline-text-3" id="text-org524a8ac">
</div>
<div id="outline-container-org814c076" class="outline-4">
<h4 id="org814c076">• Exporting a notebook</h4>
<div class="outline-text-4" id="text-org814c076">
<p>
Here is what we had to install on a recent Debian computer to make sure
the notebook export via LaTeX works:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">sudo apt-get install texlive-xetex wkhtmltopdf
</pre>
</div>
<p>
You can convert to HTML or PDF using the <code>File &gt; Download as &gt; HTML</code> (or <code>PDF</code>) menu option. This can also be done from
the command line with the following command:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">jupyter nbconvert --to pdf Untitled.ipynb
</pre>
</div>
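<p>
When several notebooks have to be exported, the conversion can be
scripted. Here is a minimal sketch, assuming the <code>jupyter nbconvert</code>
entry point is installed and on your <code>PATH</code>; the helper names
<code>nbconvert_command</code> and <code>convert_all</code> are hypothetical.
</p>

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook, fmt="html"):
    """Build the argument list for one `jupyter nbconvert` call."""
    return ["jupyter", "nbconvert", "--to", fmt, str(notebook)]


def convert_all(directory, fmt="html"):
    """Convert every notebook found directly in `directory`."""
    for notebook in sorted(Path(directory).glob("*.ipynb")):
        # check=True stops at the first notebook that fails to convert.
        subprocess.run(nbconvert_command(notebook, fmt), check=True)


print(nbconvert_command("Untitled.ipynb", "pdf"))
```

<p>
Calling <code>convert_all(".", "pdf")</code> would then export every notebook of
the current directory, provided the LaTeX dependencies mentioned above
are installed.
</p>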
<p>
If you want to use a specific style, then the <code>nbconvert</code> exporter
should be customized. This is discussed and demoed <a href="http://markus-beuckelmann.de/blog/customizing-nbconvert-pdf.html">here</a>. We encourage
you to simply read the <a href="https://nbconvert.readthedocs.io/en/latest/">doc of nbconvert</a>.
</p>
<p>
Instead of going directly through LaTeX and playing too much with the
<code>nbconvert</code> exporter, another option is to export to Markdown
and then use <a href="https://pandoc.org/">pandoc</a>. Both approaches work; it is a matter of
taste.
</p>
<p>
<b>Windows</b>
</p>
<p>
Download and install MiKTeX from the <a href="https://miktex.org/download">MiKTeX webpage</a> by choosing the
right operating system. You will be prompted to install some specific
packages when exporting to pdf.
</p>
</div>
</div>
<div id="outline-container-orga5ae744" class="outline-4">
<h4 id="orga5ae744">• Improving notebook readability</h4>
<div class="outline-text-4" id="text-orga5ae744">
<p>
Here are a few extensions that can ease your life:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;"><p>
<a href="https://stackoverflow.com/questions/33159518/collapse-cell-in-jupyter-notebook">Code folding</a> to improve readability when browsing the notebook.
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">pip3 install jupyter_contrib_nbextensions
<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">jupyter contrib nbextension install --user # not done yet</span>
</pre>
</div></li>
<li style="margin-bottom:0;"><p>
<a href="https://github.com/kirbs-/hide_code">Hiding code</a> to improve readability when exporting.
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-sh">sudo pip3 install hide_code
sudo jupyter-nbextension install --py hide_code
jupyter-nbextension enable --py hide_code
jupyter-serverextension enable --py hide_code
</pre>
</div></li>
</ul>
</div>
</div>
<div id="outline-container-orgab0846a" class="outline-4">
<h4 id="orgab0846a">• Interacting with GitLab and GitHub</h4>
<div class="outline-text-4" id="text-orgab0846a">
<p>
To ease your experience, we added pull/push buttons that allow
you to commit and sync with GitLab. This development was specific to
the MOOC but inspired by a previous <a href="https://github.com/Lab41/sunny-side-up">proof of concept</a>. We have
recently discovered that someone else developed, at about the same time,
a <a href="https://github.com/sat28/githubcommit">rather generic version of this Jupyter plugin</a>. Otherwise, remember
that it is very easy to insert a shell cell in Jupyter in which you
can issue git commands. This is how we work most of the time.
</p>
<p>
That being said, you may have noticed that Jupyter keeps precise
track of the sequence in which cells have been run by updating the
"output index". This is a very good property from the reproducibility
point of view but, depending on your usage, you may find it a bit
painful when committing. Some people have thus developed <a href="https://gist.github.com/pbugnion/ea2797393033b54674af">specific git
hooks</a> to ignore these numbers when committing Jupyter notebooks. There
is a long and interesting discussion about various options on
<a href="https://stackoverflow.com/questions/18734739/using-ipython-notebooks-under-version-control">StackOverflow</a>.
</p>
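<p>
To give an idea of what such a filter does, here is a minimal sketch
that resets the execution counters of a notebook before it is
committed. It is not the code of the hooks linked above: it assumes the
version 4 notebook format and, as a simplification, it also drops the
cell outputs, since outputs carry their own counters. The function name
<code>strip_outputs</code> is hypothetical.
</p>

```python
import json


def strip_outputs(notebook_text):
    """Return notebook JSON with outputs and execution counters removed."""
    notebook = json.loads(notebook_text)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # A stable key order keeps successive commits easy to compare.
    return json.dumps(notebook, indent=1, sort_keys=True)


# Hypothetical two-cell notebook, used only to exercise the function.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 7, "metadata": {},
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["42\n"]}],
         "source": ["print(42)"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 2,
})
cleaned = json.loads(strip_outputs(sample))
print(cleaned["cells"][1]["execution_count"], cleaned["cells"][1]["outputs"])
```

<p>
Whether to strip outputs at all is a design choice: keeping them
preserves the results in the repository, while removing them makes
diffs much smaller.
</p>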
<p>
For those who use <a href="https://blog.jupyter.org/jupyterlab-is-ready-for-users-5a6f039b8906">JupyterLab</a> rather than the plain Jupyter, a specific <a href="https://github.com/jupyterlab/jupyterlab-git">JupyterLab git plugin</a> has been developed to offer a nice version control experience.
</p>
</div>
</div>
</div>
</div>
</div>
<div id="content">
<h1 class="title">Maintaining a journal</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#orgc95598e">Some examples of LabBooks provided for inspiration</a></li>
<li style="margin-bottom:0;"><a href="#org51674f0">How to report efficiently (by Martin Quinson)</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org976cc9a">Reporting</a></li>
<li style="margin-bottom:0;"><a href="#org5ff6009">Reporting Logistics</a></li>
<li style="margin-bottom:0;"><a href="#org50f88a2">Reporting Document Organization</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div id="outline-container-orgc95598e" class="outline-2">
<h2 id="orgc95598e">Some examples of LabBooks provided for inspiration</h2>
<div class="outline-text-2" id="text-orgc95598e">
<p>
For a few years now, we have systematically required all of our students to
keep a laboratory notebook in org-mode. Most of the time, these notebooks start
in private repositories but often end up being fully open. Here are
a few of them:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">Luka Stanisic (a former PhD student advised by Arnaud Legrand) started
using this methodology during his MSc and developed it further
throughout his PhD. Part of his <a href="http://mescal.imag.fr/membres/luka.stanisic/thesis/thesis.pdf">PhD thesis</a> was actually about
designing a methodology for reproducible experiments in large scale
distributed systems. You may want to have a look at <a href="http://starpu-simgrid.gforge.inria.fr/">his postdoc
LabBook</a> and at the <a href="https://framagit.org/lvgx/pfe/blob/master/doc/labbook.org">report of Léo Villeveygoux</a>, whom he advised.</li>
<li style="margin-bottom:0;">Tom Cornebize is currently a PhD student advised by Arnaud Legrand
and, during his MSc, he also extensively <a href="https://github.com/Ezibenroc/simulating_mpi_applications_at_scale">logged his activity on GitHub</a>.</li>
<li style="margin-bottom:0;"><a href="https://github.com/schnorr">Lucas Schnorr</a>'s students usually also maintain their journal in a
very nice way: <a href="https://github.com/taisbellini/aiyra/blob/master/LabBook.org">Tais Bellini's BSc.</a>, <a href="https://github.com/mittmann/hpc/blob/master/LabBook.org">Arthur Krause's LabBook</a>, <a href="http://www.inf.ufrgs.br/~llnesi/memory_report/MemoryReport.html">Luca
Nesi's LabBook</a>.</li>
<li style="margin-bottom:0;"><a href="https://people.irisa.fr/Martin.Quinson/Research/Students/Methodo/">Martin Quinson</a>'s students also follow such conventions:
<ul class="org-ul">
<li style="margin-bottom:0;">Ezequiel Torti Lopez, M2R 2014. <a href="https://github.com/mquinson/simgrid-simpar/blob/master/report.org">Report</a>, with both the data
provenance and the data analysis included in the appendix.</li>
<li style="margin-bottom:0;">Betsegaw Lemma, M2R 2017. <a href="https://github.com/betsegawlemma/internship/blob/master/intern_report.org">LabBook</a></li>
<li style="margin-bottom:0;">Gabriel Corona, engineer on SimGrid, 2015-2016. <a href="https://github.com/randomstuff/simgrid-journal/blob/master/journal.org">Journal</a>, <a href="http://www.gabriel.urdhr.fr/tags/simgrid/">Blog (findings)</a>.</li>
<li style="margin-bottom:0;">Matthieu Nicolas, engineer on PLM, 2014-2016, <a href="https://github.com/MatthieuNICOLAS/PLM-reporting/blob/master/activity-report.org">Journal</a>.</li>
</ul></li>
</ul>
<p>
Org-mode is obviously not the only option and many of our students use
a mixture of org-mode, RStudio and Jupyter depending on what is more
convenient.
</p>
</div>
</div>
<div id="outline-container-org51674f0" class="outline-2">
<h2 id="org51674f0">How to report efficiently (by Martin Quinson)</h2>
<div class="outline-text-2" id="text-org51674f0">
<p>
My friend Martin has gathered <a href="https://people.irisa.fr/Martin.Quinson/Research/Students/Methodo/">an excellent compendium of information
and references on his webpage to explain to his students what he expects
from them</a>. <b>I'll therefore simply paraphrase him here</b>, keeping the most
important aspects related to reporting, but feel free to read <a href="https://people.irisa.fr/Martin.Quinson/Research/Students/Methodo/">the
original version</a>:
</p>
</div>
<div id="outline-container-org976cc9a" class="outline-3">
<h3 id="org976cc9a">Reporting</h3>
<div class="outline-text-3" id="text-org976cc9a">
<p>
I ask you to write a short report regularly. Depending on the
situation, it may be every day, every week or every month. In any
case, your reporting is very important for the following reasons:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">It forces you to think about what you are doing, which may help you
to unblock your problem on your own. Writing down the problems in a
clear way is often sufficient to see the solution appear.</li>
<li style="margin-bottom:0;">It helps me follow your progress even between meetings. I
cannot unblock you if I don't detect that you are on a wrong lead or
otherwise blocked.</li>
<li style="margin-bottom:0;">It keeps track of the steps in your work. That's good for the day
when you want to write your final report (even if a final report
should never be presented in chronological order). That's also good
for the next person after you, who will be expected to continue your effort
or to build upon it.</li>
<li style="margin-bottom:0;">That person may be yourself (if you go for a PhD program), another
intern, myself or even someone else on the Internet: that's what we
call Open Science, an effort where everyone can build upon the
scientific work of everyone.</li>
</ul>
<p>
I want you to write your reporting in an org file (yep, you don't have
a choice here). [..]
</p>
</div>
</div>
<div id="outline-container-org5ff6009" class="outline-3">
<h3 id="org5ff6009">Reporting Logistics</h3>
<div class="outline-text-3" id="text-org5ff6009">
<p>
Once you're set up with all software installed and somehow configured,
you need to create a reporting file in a place where I can see it and
where it won't get lost if your disk crashes or something. Open a
dedicated git repository (on github, gitorious, gitlab, &#x2026;) for
that. After your internship, your report should be archived directly
in the source tree of the software that you are working on, if
any. But having your reporting located in the source tree may
complicate things during your work.
</p>
<p>
Yes, it means that your file will be public at some point, but that's
why we call it "Open Science", after all. Also, you should write it in
English if possible. The part of your reporting that is called
"Journal" (see below) may be written in French if you are more
efficient this way but the rest must be in English. Don't make your
tone too formal because the file is public. Make it efficient. Nobody
will ever blame you for the work you did during an internship a long
time ago. If you really want, we can even make this file
anonymous. Just speak to me.
</p>
<p>
You want to write your reporting before leaving work. Weekly reporting
should be written on Friday, one or two hours before leaving. That's
the best solution to have a nice weekend without thinking about work,
and still lose no information that you would need on Monday morning.
</p>
</div>
</div>
<div id="outline-container-org50f88a2" class="outline-3">
<h3 id="org50f88a2">Reporting Document Organization</h3>
<div class="outline-text-3" id="text-org50f88a2">
<p>
Your reporting document should have four main parts:
</p>
<dl class="org-dl">
<dt>Findings</dt><dd>This section summarizes the general information that you
gathered during your work. It is empty at the beginning
of your internship, and gets fleshed out with the important
things that you find on your way. That's where
bibliographical information goes, for example. But that's
definitely not where TODO notes go (see below).</dd>
<dt>Development</dt><dd>This section presents the technical sides of your
work. Don't write anything in there yet. Put it all
in the Journal part for now.</dd>
<dt>Journal</dt><dd>Describe the day-to-day work done for each period (day,
week or month) of your internship. That's the most
important part of your reporting, and we come back to it
below.</dd>
<dt>Conclusion</dt><dd><p>
That's what you write in the last week of your
internship. You can see it as a letter to the next
guy, explaining the current state of your work, a few
words about its technical organization, and what
should be done next on that topic. Keep this part
highly technical, the overall organization of your
internship will be seen in your final report.
</p>
<p>
The Journal part is the only part that you may write
in French if needed. You want to add one subsection per
period to your journal. Don't make it too long, or you
would waste time writing long texts that very few will
ever read. Don't make it too short or it will be
impossible to understand it on Monday morning (or
three months after). Finding the good balance is
sometimes difficult, but I will provide feedback on
your first entries, so don't worry.
</p></dd>
</dl>
<p>
Each section describing a period should contain three subsubsections:
</p>
<dl class="org-dl">
<dt>Things done</dt><dd>a few words about what you've done. Something like 2
or 4 items with a few words describing what you've
done. You can omit the title of that section and put
the items directly in the upper section (see the
example below).</dd>
<dt>Blocking points and questions</dt><dd>try to explain clearly the things
that block you or slow you down. If you found the solution
already, then it should be part of the previous subsection (but
you should say a few words nevertheless). Also ask every question
that you may have for me in that section. If the questions are
personal (e.g., about the logistics of your internship such as
salary or so), please prefer emails that are not publicly
visible. If this section is empty for a given period, skip it
altogether (no empty subsubsections).</dd>
<dt>Planned work</dt><dd>A few items about what you plan to work on during
the next period.</dd>
</dl>
<p>
A template reporting file is given at the end of this section. This
is just strong advice: If you really feel better with another file
organization, then give it a try for one period, and ask for my
feedback. I can adapt, and I do not pretend that my advice is the
definitive answer. It's just the result of my experience so far.
</p>
<p>
Notice how TODO items are written: they are given as items in the
Planned work sections of the journal. As explained in the
<a href="http://orgmode.org/manual/Checkboxes.html">documentation</a>, you simply have to write "[ ]" in front of items that
you plan to do in the future.
</p>
<p>
You should add a <code>[1/]</code> on the "Planned work" line, so that emacs keeps
track of what is done and what is still to do. Once items are done, you
type C-c C-c on their lines to change the blank box [ ] into a checked
box [X]. The <code>[1/]</code> is then updated to show how many items are done
out of the total, and thus how much work is still to be done.
</p>
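<p>
To make the meaning of this counter concrete, the sketch below
recomputes it outside of emacs from a list of checkbox items. It only
illustrates the "done/total" convention and is not how org-mode itself
is implemented; the function name <code>progress_cookie</code> is hypothetical.
</p>

```python
import re

# Matches an org checkbox item and captures its state: blank or X.
CHECKBOX = re.compile(r"^\s*- \[( |X)\] ")


def progress_cookie(lines):
    """Compute the [done/total] counter for a list of org checkbox items."""
    states = [m.group(1) for m in map(CHECKBOX.match, lines) if m]
    return "[{}/{}]".format(states.count("X"), len(states))


planned = [
    "- [X] install emacs and setup orgmode",
    "- [ ] read the provided articles",
]
print(progress_cookie(planned))  # prints [1/2]
```

<p>
With one item checked out of two, the counter reads <code>[1/2]</code>, as in the
template at the end of this section.
</p>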
<p>
At any point, you can see all ongoing TODO items with the following
keystrokes: "C-c / t". More information on TODOs in orgmode's
<a href="http://orgmode.org/manual/TODO-basics.html">documentation</a>. The important thing here is that most TODO items must
only be written in the <i>Journal</i> part (so that we know when they
occurred).
</p>
<p>
<b>Do not edit past entries of your journal</b>, unless you have very good
reasons. If you must, make sure that you don't lose information about
the path that you took (remember the Open Science thingy). You should
always <b>add</b> information to past entries, such as:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">- *edit* This hypothesis does not hold; see the entry of [the day where you found it] for more information.
</pre>
</div>
<p>
The only exception is TODO entries, which should clearly be rewritten
as DONE entries. If you need to adapt your TODO entry (because the
initial goal was poorly stated or otherwise), change the initial entry
from TODO to CANCELED (or check the box after stating in a subitem
that it was not done but canceled, and why), and create a new TODO
entry in the current period section.
</p>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
* Introduction
This file contains the reporting for my beloved internship done on
this topic on that year. For now, just add the official title of
your internship (check the convention signed between your
university and my lab). After a few weeks, once you really
understand your internship, you should write a few paragraphs about
the context, problem and motivation of your work, with some
possible use cases. But don't do that right now.
* Bibliography
* Journal
** Week 2 feb
- read the doc about writing my reporting
*** Questions
- do I really have to use emacs?
*** Planned work [1/2]
- [X] install emacs and set up orgmode
- [ ] read the provided articles
** Week 9 feb
- Installed emacs
(omit the Questions section if no question)
*** Planned work
- do some useful work
</pre>
</div>
</div>
</div>
</div>
<div id="outline-container-orgc50e61e" class="outline-2">
<h2 id="orgc50e61e">Exercise 1: Re-executing is not replicating&#x2026;</h2>
<div class="outline-text-2" id="text-orgc50e61e">
<p>
Even though the terminology may vary from one author or community to
another, it is important to understand that different levels of
"replication" can be distinguished, depending on whether one merely
checked that the code could be re-executed to obtain exactly the
same results, or managed to reproduce similar results
by following a similar approach (possibly with
another language, another computation method, etc.). On this topic, you
may want to read, for example, <a href="https://arxiv.org/abs/1708.08205">https://arxiv.org/abs/1708.08205</a>.
</p>
<p>
The devil often hides in places one would never have
thought of, and we ourselves went from surprise to surprise while
preparing this MOOC, in particular with the Module 2 exercise on
Challenger. This is why, in this exercise, we invite you to
redo part of the analysis of the Challenger data, as
Siddhartha Dalal and his co-authors did almost 30 years ago in
their article <i>Risk Analysis of the Space Shuttle: Pre-Challenger
Prediction of Failure</i>, published in the <i>Journal of the American
Statistical Association</i> (Vol. 84, No. 408, Dec. 1989), but in another language of your choice (Python, R, Julia, SAS&#x2026;).
</p>
<p>
Nous savons d'expérience que si les estimations de pente et
d'intercept sont généralement les mêmes, on peut avoir des différences
lorsque l'on regarde les estimateurs de variance et le R<sup>2</sup> un peu plus
dans les détails. Il peut également y avoir des surprises dans le
graphique final selon les versions de bibliothèques utilisées.
</p>
<p>
L'ensemble des calculs à effectuer est décrit ici avec les
indications sur comment contribuer :
<a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/">https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/</a>
</p>
<p>
Vous y trouverez notre réplication des calculs de Dallal <i>et al.</i> (en
R), une mise en œuvre en Python et une en R (très similaires à ce que
vous avez pu utiliser dans le module 2). Cet exercice peut donc se
faire à deux niveaux :
</p>
<ol class="org-ol">
<li>un niveau facile pour ceux qui repartiront du code dans le langage
qu'ils n'auront initialement pas utilisé et se contenteront de le
ré-exécuter. Pour cela, nul besoin de maîtriser la régression
logistique, il suffit de bien inspecter les sorties produites et de
vérifier qu'elles correspondent bien aux valeurs attendues. Pour
ceux qui ré-exécuteraient le notebook Python dans l'environnement
Jupyter du MOOC, n'hésitez pas à consulter <a href="https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/4ab5bb42ca1e45c8b0f349751b96d405">les ressources de la
section 4A du module 2</a> qui expliquent comment y importer un
notebook.</li>
<li>un niveau plus difficile pour ceux qui souhaiteront le réécrire
complètement (éventuellement dans un autre langage que R ou Python,
l'expérience peut être d'autant plus intéressante que nous n'avons
pas testé ces variations). Là, si les fonctions de calcul d'une
régression logistique ne sont pas présentes, il y a par contre
intérêt à en savoir un minimum pour pouvoir les
implémenter. L'exercice en est d'autant plus instructif.</li>
</ol>
<p>
Vous pourrez alors discuter sur le forum des succès et des échecs que
vous aurez pu rencontrer. Pour cela :
</p>
<ul class="org-ul">
<li><b>Vous publierez auparavant dans votre dépôt les différents notebooks</b>
en prenant bien soin d'enrichir votre document des informations
(numéros de version, etc.) sur votre système et sur les
bibliothèques installées.</li>
<li>Vous indiquerez votre résultat (que ça soit un succès ou échec à
obtenir les mêmes résultats) en <b>remplissant ce <a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/results.md">tableau</a></b> (vous avez
les droits d'édition donc il vous suffit d'éditer les fichiers via
l'interface GitLab). Vous vérifierez les valeurs obtenues pour :
<ol class="org-ol">
<li>les coefficients de la pente et de l'intercept</li>
<li>les estimations d'erreur de ces coefficients</li>
<li>le goodness of fit</li>
<li>la figure</li>
<li>la zone de confiance</li>
</ol></li>
<li><p>
Pour chacun vous indiquerez si le résultat est :
</p>
<ul class="org-ul">
<li>identique</li>
<li>proche à moins de trois décimales</li>
<li>très différent</li>
<li>non fonctionnel (pas de résultat obtenu)</li>
</ul>
<p>
Vous indiquerez également dans ce tableau :
</p>
<ul class="org-ul">
<li>un lien vers votre espace gitlab contenant les différents notebooks</li>
<li>le nom du système d'exploitation utilisé</li>
<li>le langage utilisé et son numéro de version</li>
<li>les numéros des principales bibliothèques utilisées
<ul class="org-ul">
<li>Python : numpy, pandas, matplotlib, statsmodels&#x2026;</li>
<li>R : BLAS, ggplot, dplyr si chargées</li>
</ul></li>
</ul></li>
</ul>
<p>
Ne vous inquiétez pas si ces consignes vous semblent peu claires sur l'instant,
elles sont rappelées en haut du <a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/results.md">tableau</a> et vous vous rendrez vite
compte s'il vous manque quelque chose quand vous essaierez de remplir
ce tableau.
</p>
<p>
Nous effectuerons une synthèse illustrant les principales divergences
observées et nous vous l'enverrons à la fin du MOOC.
</p>
</div>
</div>
<div id="outline-container-org55f722b" class="outline-2">
<h2 id="org55f722b" style="color: #b62567;">Re-execution is not replication&#x2026;</h2>
<div class="outline-text-2" id="text-org55f722b">
<p style="color: #b62567;">
Unfortunately terminology varies a lot between authors and
communities, but it is important to understand the distinction between
different levels of "replication". You can be satisfied with
re-running the code and getting exactly the same results, but you can also
try to obtain similar results using a similar approach, changing for
example the programming language, computational method, etc. An
article we recommend on this topic is
<a href="https://arxiv.org/abs/1708.08205">https://arxiv.org/abs/1708.08205</a>.
</p>
<p style="color: #b62567;">
Often the devil is in the details that one would have never thought
about, and we have had our share of surprises while preparing this
MOOC, in particular with the exercise on the Challenger catastrophe
from module 2. We therefore propose in this exercise that you re-do a
part of this analysis, following the example of Siddhartha Dallal and
co-authors almost 30 years ago in their article <i>Risk Analysis of the
Space Shuttle: Pre-Challenger Prediction of Failure</i>, published in the
<i>Journal of the American Statistical Association</i> (Vol. 84, No. 408,
Dec. 1989), but using a different language of your choosing (Python,
R, Julia, SAS&#x2026;).
</p>
<p style="color: #b62567;">
Our experience shows that the estimates of slope and intercept are
generally the same, but there can be differences when looking at
variance estimators and R<sup>2</sup> in more detail. Another source of
surprises is the final graphical presentation, depending on the
versions of the libraries that are used.
</p>
<p style="color: #b62567;">
The computations to be done are described at
<a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/">https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/</a>
together with instructions for contributing.
</p>
<p style="color: #b62567;">
You will find there our replications of the computations by Dallal <i>et
al.</i> (in R), one in Python and one in R (very similar to what you have
used in module 2). This exercise can be done at two levels:
</p>
<ol class="org-ol">
<li style="color: #b62567;">an easy level at which you start from the code in the language that you did not use initially, and content yourself with re-executing it. This doesn't require mastering logistic regression; it is sufficient to inspect the outputs produced and check that they correspond to the expected values. For those who want to re-execute the Python notebook in our MOOC's Jupyter environment, check <a href="https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/4ab5bb42ca1e45c8b0f349751b96d405">the resources for sequence 4A of module 2</a> that explain how to import a notebook.</li>
<li style="color: #b62567;">a more difficult level at which you rewrite the analysis completely, possibly in a different language than Python or R, which makes the exercise more interesting because we have not tested such variants. If logistic regression is not already implemented for your language, you will need a good understanding of it in order to write the code yourself, which of course makes the exercise even more instructive.</li>
</ol>
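<p style="color: #b62567;">
For the more difficult level, the sketch below shows what a hand-written logistic regression with one explanatory variable could look like, using Newton-Raphson iterations in plain Python. It is only an illustration under our own assumptions: the function names <code>score_and_information</code> and <code>fit_logistic</code> are hypothetical, the data are a made-up toy example (not the Challenger data), and the standard errors are taken from the inverse of the observed information matrix, which is one common choice among several.
</p>

```python
import math

def score_and_information(a, b, xs, ys):
    """Gradient of the log-likelihood and entries of X'WX at (a, b)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))  # fitted probability
        w = p * (1.0 - p)                          # binomial variance weight
        g0 += y - p
        g1 += x * (y - p)
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11

def fit_logistic(xs, ys, max_iter=50, tol=1e-12):
    """Return (intercept, slope, se_intercept, se_slope)."""
    a = b = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = score_and_information(a, b, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # Standard errors from the inverse information matrix at the estimate.
    _, _, h00, h01, h11 = score_and_information(a, b, xs, ys)
    det = h00 * h11 - h01 * h01
    return a, b, math.sqrt(h11 / det), math.sqrt(h00 / det)

# Made-up toy data, for illustration only.
toy_x = [0, 1, 2, 3, 4, 5]
toy_y = [0, 0, 1, 0, 1, 1]
print(fit_logistic(toy_x, toy_y))
```

<p style="color: #b62567;">
Comparing the standard errors returned by such a function with those reported by R or statsmodels is precisely where small differences tend to show up.
</p>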
<p style="color: #b62567;">
You can discuss your successes or failures on the forum, after following these instructions:
</p>
<ul class="org-ul">
<li style="color: #b62567;"><b>First, publish your notebooks in your repository</b>, taking care to enrich your document with information about your system and your libraries (version numbers etc.).</li>
<li style="color: #b62567;">Indicate your result (whether or not you obtained the same results) by adding it to this <a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/results.md">table</a> (you have write permissions, so you can simply edit it via the GitLab interface). Check the values obtained for:
<ol class="org-ol">
<li style="color: #b62567;">the slope and intercept coefficients</li>
<li style="color: #b62567;">the error estimates for these coefficients</li>
<li style="color: #b62567;">the goodness of fit</li>
<li style="color: #b62567;">the plot</li>
<li style="color: #b62567;">the confidence region</li>
</ol></li>
<li><p style="color: #b62567;">
For each of these values, specify if your result is
</p>
<ul class="org-ul">
<li style="color: #b62567;">identical</li>
<li style="color: #b62567;">close, to three decimal places</li>
<li style="color: #b62567;">very different</li>
<li style="color: #b62567;">non-functional (no result obtained)</li>
</ul>
<p style="color: #b62567;">
Also provide in this table:
</p>
<ul class="org-ul">
<li style="color: #b62567;">a link to your GitLab workspace with your notebook(s)</li>
<li style="color: #b62567;">your operating system</li>
<li style="color: #b62567;">the language you used, with the version number</li>
<li style="color: #b62567;">version numbers for the main libraries
<ul class="org-ul">
<li style="color: #b62567;">Python: numpy, pandas, matplotlib, statsmodels&#x2026;</li>
<li style="color: #b62567;">R: BLAS, ggplot, dplyr if used</li>
</ul></li>
</ul></li>
</ul>
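<p style="color: #b62567;">
The four categories listed above can be assigned mechanically. The following sketch is one possible reading, not an official rule: the function name <code>classify_result</code> is hypothetical, and we assume that "close, to three decimal places" means an absolute difference below 10<sup>-3</sup>. The numbers in the example calls are made up.
</p>

```python
def classify_result(reference, obtained, decimals=3):
    """Classify a numerical result against the reference value."""
    if obtained is None:
        return "non-functional"      # no result obtained
    if obtained == reference:
        return "identical"
    # Assumption: "close" means the absolute difference is below 10**-decimals.
    if abs(obtained - reference) < 10 ** (-decimals):
        return "close"
    return "very different"

# Made-up values, for illustration only.
print(classify_result(1.5, 1.5))
print(classify_result(1.5, 1.5004))
print(classify_result(1.5, 2.5))
print(classify_result(1.5, None))
```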
<p style="color: #b62567;">
Don't worry if these instructions seem confusing: they are repeated above the <a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/results.md">table</a>, and you will quickly notice if something is missing when you try to add your data.
</p>
<p style="color: #b62567;">
We will compile a synthesis of the principal divergences observed and make it available at the end of the MOOC.
</p>
</div>
</div>
</div>
<h2 id="org1f802ba">Exercice 2 : L'importance de l'environnement</h2>
<div class="outline-text-2" id="text-org1f802ba">
<p>
Dans cet exercice, nous vous proposons de reprendre l'exercice
précédent mais en mettant à jour l'environnement de calcul. En effet,
nous avons rencontré des surprises en préparant ce MOOC puisqu'il nous
est arrivé d'avoir des résultats différents entre nos machines et
l'environnement Jupyter que nous avions mis en place pour le MOOC. Ça
sera peut-être également votre cas !
</p>
<ol class="org-ol">
<li>Pour ceux qui ont suivi le parcours Jupyter, recréez
l'environnement du MOOC sur votre propre machine en suivant les
instructions données
<a href="https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/4ab5bb42ca1e45c8b0f349751b96d405">dans les ressources de la section 4A du module 2</a>.</li>
<li>Vérifiez si vous obtenez bien les mêmes résultats que ceux
attendus.</li>
<li>Mettez à jour (vers le haut ou vers le bas) cet environnement et
vérifiez si vous obtenez les mêmes résultats.</li>
</ol>
<p>
Comme précédemment, vous mettrez à jour le <a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/results.md">tableau</a> et vous discuterez
sur le forum des succès et des échecs que vous aurez rencontrés.
</p>
</div>
</div>
<div id="outline-container-org1a24dbb" class="outline-2">
<h2 id="org1a24dbb"><span style="color: #b62567;">The importance of the environment</span></h2>
<div class="outline-text-2" id="text-org1a24dbb">
<p style="color: #b62567;">
In this exercise, we ask you to redo the preceding exercise after
updating the computational environment. When preparing this MOOC, we
had a few surprises due to different results on our own computers and
on the Jupyter environment that we had installed for the MOOC. Maybe
that will happen to you as well!
</p>
<ol class="org-ol">
<li style="color: #b62567;">For those who followed the Jupyter path, re-create the MOOC's Jupyter environment on your own computer by following the instructions given
<a href="https://www.fun-mooc.fr/courses/course-v1:inria+41016+session01bis/jump_to_id/4ab5bb42ca1e45c8b0f349751b96d405">in the resource section of sequence 4A of module 2</a>.</li>
<li style="color: #b62567;">Check if you get the same results as in the MOOC environment.</li>
<li style="color: #b62567;">Update this environment, increasing or decreasing the version numbers of some packages, and check if the results are still the same.</li>
</ol>
<p style="color: #b62567;">
As before, you can add your observations to the <a href="https://app-learninglab.inria.fr/gitlab/moocrr-session1/moocrr-reproducibility-study/blob/master/results.md">table</a> and discuss your successes and failures on the forum.
</p>
</div>
<div id="outline-container-org5b10dc4" class="outline-2">
<h2 id="org5b10dc4">Exercice 3 : Répliquer un papier de ReScience</h2>
<div class="outline-text-2" id="text-org5b10dc4">
<p>
ReScience (<a href="http://rescience.github.io/">http://rescience.github.io/</a>) est un journal de sciences
computationnelles entièrement ouvert dont l'objectif est d'encourager
la réplication de travaux déjà publiés en s'assurant que l'ensemble du
code et des données soit disponible. Pour chacun des articles publiés
dans ReScience, nous avons la garantie qu'au moins deux chercheurs
indépendants ont réussi à suivre les indications, à ré-exécuter le
code et à ré-obtenir les mêmes résultats que ceux décrits par les
auteurs. Cela ne veut pas dire que cela soit parfaitement automatique
pour autant et il peut être intéressant de voir comment ils ont
procédé.
</p>
<p>
Nous vous proposons donc de choisir l'un de ces articles (celui avec
lequel vous avez le plus d'affinité) et d'essayer de réexécuter les
codes et les calculs décrits dans l'article. N'hésitez pas à indiquer
vos difficultés éventuelles sur le forum où nous répondrons à vos questions.
</p>
</div>
</div>
<div id="outline-container-org60c7839" class="outline-2">
<h2 id="org60c7839" style="color: #b62567;">Replicate a paper from ReScience</h2>
<div class="outline-text-2" id="text-org60c7839">
<p style="color: #b62567;">
ReScience (<a href="http://rescience.github.io/">http://rescience.github.io/</a>) is a scientific journal for
computational science that is completely open and has the goal of
encouraging the replication of already published work while providing
a complete set of code and data. For each article published in
ReScience, we know that at least two independent researchers (the
reviewers) have been able to follow the instructions, re-execute the
code, and obtain the same results as those described by the
authors. This doesn't mean that the process is fully automatic, and
therefore it is of interest to see how they have proceeded.
</p>
<p style="color: #b62567;">
We ask you to choose one of the articles (the one that you like most)
and to try to re-execute the code as described in the article. Don't
hesitate to indicate any difficulties you might encounter in the
forum, where we will reply to your questions&#x2026;
</p>
</div>
<div id="content">
<h1 class="title">Tracking environment information</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org54d219c">Getting information about your Git repository</a></li>
<li style="margin-bottom:0;"><a href="#orgd1774c3">Getting information about Python(3) libraries</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org7283a87">Getting information about your system</a></li>
<li style="margin-bottom:0;"><a href="#orgfa4dc3a">Getting the list of installed packages and their version</a></li>
<li style="margin-bottom:0;"><a href="#org31cde5f">How to list imported modules?</a></li>
<li style="margin-bottom:0;"><a href="#orgcea179c">Saving and restoring an environment with pip</a></li>
<li style="margin-bottom:0;"><a href="#org849fdbb">Installing a new package or a specific version</a></li>
</ul>
</li>
<li style="margin-bottom:0;"><a href="#org4f45b1e">Getting information about R libraries</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#orgc583ee9">Getting the list of imported modules and their version</a></li>
<li style="margin-bottom:0;"><a href="#orgdffc6a5">Getting the list of installed packages and their version</a></li>
<li style="margin-bottom:0;"><a href="#orgb52d0ce">Installing a new package or a specific version</a>
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#orgaf558a0">Installing a pre-compiled version</a></li>
<li style="margin-bottom:0;"><a href="#org7d8a9f0">Using devtools</a></li>
<li style="margin-bottom:0;"><a href="#org4509fba">Installing from source code</a></li>
<li style="margin-bottom:0;"><a href="#org9d64d25">Potential issues</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
<div id="outline-container-org54d219c" class="outline-2">
<h2 id="org54d219c">Getting information about your Git repository</h2>
<div class="outline-text-2" id="text-org54d219c">
<p>
When taking notes, it may be difficult to remember which version of
the code or of a file was used. This is what version control is useful
for. Here are a few useful commands that we typically insert at the
top of our notebooks in shell cells:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">git log -1
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
commit 741b0088af5b40588493c23c46d6bab5d0adeb33
Author: Arnaud Legrand &lt;arnaud.legrand@imag.fr&gt;
Date: Tue Sep 4 12:45:43 2018 +0200
Fix a few typos and provide information on jupyter-git plugins.
</pre>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">git status -u
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
On branch master
Your branch is ahead of 'origin/master' by 4 commits.
(use "git push" to publish your local commits)
Changes not staged for commit:
(use "git add &lt;file&gt;..." to update what will be committed)
(use "git checkout -- &lt;file&gt;..." to discard changes in working directory)
modified: resources.org
Untracked files:
(use "git add &lt;file&gt;..." to include in what will be committed)
../../module2/ressources/replicable_article/IEEEtran.bst
../../module2/ressources/replicable_article/IEEEtran.cls
../../module2/ressources/replicable_article/article.bbl
../../module2/ressources/replicable_article/article.tex
../../module2/ressources/replicable_article/data.csv
../../module2/ressources/replicable_article/figure.pdf
../../module2/ressources/replicable_article/logo.png
.#resources.org
no changes added to commit (use "git add" and/or "git commit -a")
</pre>
<p>
<i>Note: the -u indicates that git should also display the contents of
new directories it did not previously know about.</i>
</p>
<p>
Then, we often include commands at the end of our notebook indicating
how to commit the results (adding the new files, committing with a
clear message and pushing). E.g.,
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">git add resources.org;
git commit -m <span style="font-style: italic;">"Completing the section on getting Git information"</span>
git push
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
[master 514fe2c1 ] Completing the section on getting Git information
1 file changed, 61 insertions(+)
Counting objects: 25, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (20/20), done.
Writing objects: 100% (25/25), 7.31 KiB | 499.00 KiB/s, done.
Total 25 (delta 11), reused 0 (delta 0)
To ssh://app-learninglab.inria.fr:9418/learning-lab/mooc-rr-ressources.git
6359f8c..1f8a567 master -&gt; master
</pre>
<p>
Obviously, in this case you need to save the notebook before running
this cell, hence the output of this final command (with the new git
hash) will not be stored in the cell. This is not really a problem and
is the price to pay for running git from within the notebook itself.
</p>
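<p>
If you prefer to record this information from a Python cell rather than a shell cell, the following is a minimal sketch. The helper names <code>git_output</code> and <code>git_summary</code> are hypothetical; the sketch assumes that <code>git</code> is on the path and simply returns <code>None</code> when it is not, or when the current directory is not a repository.
</p>

```python
import subprocess

def git_output(*args, cwd="."):
    """Run a git command and return its standard output, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=cwd, check=True,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()

def git_summary(cwd="."):
    """Collect the current commit hash and the list of uncommitted changes."""
    return {
        "commit": git_output("rev-parse", "HEAD", cwd=cwd),
        "dirty_files": git_output("status", "--porcelain", cwd=cwd),
    }

print(git_summary())
```

<p>
An empty <code>dirty_files</code> string means that the working directory matches the recorded commit, which is the situation you want when reporting results.
</p>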
</div>
</div>
<div id="outline-container-orgd1774c3" class="outline-2">
<h2 id="orgd1774c3">Getting information about Python(3) libraries</h2>
<div class="outline-text-2" id="text-orgd1774c3">
</div>
<div id="outline-container-org7283a87" class="outline-3">
<h3 id="org7283a87">Getting information about your system</h3>
<div class="outline-text-3" id="text-org7283a87">
<p>
This topic is discussed on <a href="https://stackoverflow.com/questions/3103178/how-to-get-the-system-info-with-python">StackOverflow</a>.
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-python"><span style="font-weight: bold;">import</span> platform
<span style="font-weight: bold;">print</span>(platform.uname())
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
uname_result(system='Linux', node='icarus', release='4.15.0-2-amd64', version='#1 SMP Debian 4.15.11-1 (2018-03-20)', machine='x86_64', processor='')
</pre>
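<p>
The raw <code>uname</code> tuple can be complemented with the Python version, which is what you are asked to report for the exercises. Here is a small sketch using only the standard <code>platform</code> module; the function name <code>system_report</code> and the choice of fields are mine.
</p>

```python
import platform

def system_report():
    """Return the main facts about the operating system and the interpreter."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
    }

for key, value in sorted(system_report().items()):
    print(key, ":", value)
```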
</div>
</div>
<div id="outline-container-orgfa4dc3a" class="outline-3">
<h3 id="orgfa4dc3a">Getting the list of installed packages and their version</h3>
<div class="outline-text-3" id="text-orgfa4dc3a">
<p>
This topic is discussed on <a href="https://stackoverflow.com/questions/20180543/how-to-check-version-of-python-modules">StackOverflow</a>. When using <code>pip</code> (the Python
package installer) within a shell command, it is easy to query the
version of all installed packages (note that on your system, you may
have to use either <code>pip</code> or <code>pip3</code> depending on how it is named and which
versions of Python are available on your machine).
</p>
<p>
Here is for example how I get this information on my machine:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">pip3 freeze
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
asn1crypto==0.24.0
attrs==17.4.0
bcrypt==3.1.4
beautifulsoup4==4.6.0
bleach==2.1.3
...
pandas==0.22.0
pandocfilters==1.4.2
paramiko==2.4.0
patsy==0.5.0
pexpect==4.2.1
...
traitlets==4.3.2
tzlocal==1.5.1
urllib3==1.22
wcwidth==0.1.7
webencodings==0.5
</pre>
<p>
In a Jupyter notebook, this can easily be done by using the <code>%%sh</code>
magic. Here is for example what you could do and get on the Jupyter
notebooks we deployed for the MOOC (note that here, you should simply
use the <code>pip</code> command):
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-python">%%sh
pip freeze
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
alembic==0.9.9
asn1crypto==0.24.0
attrs==18.1.0
Automat==0.0.0
...
numpy==1.13.3
olefile==0.45.1
packaging==17.1
pamela==0.3.0
pandas==0.22.0
...
webencodings==0.5
widgetsnbextension==3.2.1
xlrd==1.1.0
zope.interface==4.5.0
</pre>
<p>
In the rest of this document, I will assume the correct command is <code>pip</code>
and I will not systematically insert the <code>%%sh</code> magic.
</p>
<p>
Once you know which packages are installed, you can easily get
additional information about a given package and in particular check
whether it was installed "locally" through pip or whether it is
installed system-wide. Again, in a shell command:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">pip show pandas
<span style="font-weight: bold;">echo</span> <span style="font-style: italic;">" "</span>
pip show statsmodels
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
Name: pandas
Version: 0.22.0
Summary: Powerful data structures for data analysis, time series,and statistics
Home-page: http://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /usr/lib/python3/dist-packages
Requires:
Name: statsmodels
Version: 0.9.0
Summary: Statistical computations and models for Python
Home-page: http://www.statsmodels.org/
Author: None
Author-email: None
License: BSD License
Location: /home/alegrand/.local/lib/python3.6/site-packages
Requires: patsy, pandas
</pre>
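<p>
The same version numbers can also be queried from within Python, without calling <code>pip</code>. The sketch below relies on <code>importlib.metadata</code>, which is part of the standard library only from Python 3.8 onwards, so it will not work as is on the Python 3.6 environment described in this document. The helper name <code>installed_version</code> is hypothetical.
</p>

```python
from importlib import metadata  # standard library since Python 3.8

def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name in ["numpy", "pandas", "matplotlib", "statsmodels"]:
    print(name, installed_version(name) or "(not installed)")
```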
</div>
</div>
<div id="outline-container-org31cde5f" class="outline-3">
<h3 id="org31cde5f">How to list imported modules?</h3>
<div class="outline-text-3" id="text-org31cde5f">
<p>
Without resorting to pip (which lists all installed packages), you
may want to know which modules are loaded in a Python session as well
as their versions. Inspired by <a href="https://stackoverflow.com/questions/4858100/how-to-list-imported-modules">StackOverflow</a>, here is a simple
function that lists the loaded modules together with their <code>__version__</code> attribute
(which is unfortunately not completely standard, so some modules do not define it).
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-python"><span style="font-weight: bold;">def</span> <span style="font-weight: bold;">print_imported_modules</span>():
    <span style="font-weight: bold;">import</span> sys
    <span style="font-weight: bold;">for</span> name, val <span style="font-weight: bold;">in</span> <span style="font-weight: bold;">sorted</span>(sys.modules.items()):
        <span style="font-weight: bold;">if</span>(<span style="font-weight: bold;">hasattr</span>(val, <span style="font-style: italic;">'__version__'</span>)):
            <span style="font-weight: bold;">print</span>(val.<span style="font-weight: bold;">__name__</span>, val.__version__)
        <span style="font-weight: bold;">else</span>:
            <span style="font-weight: bold;">print</span>(val.<span style="font-weight: bold;">__name__</span>, <span style="font-style: italic;">"(unknown version)"</span>)
<span style="font-weight: bold;">print</span>(<span style="font-style: italic;">"**** Package list in the beginning ****"</span>)
print_imported_modules()
<span style="font-weight: bold;">print</span>(<span style="font-style: italic;">"**** Package list after loading pandas ****"</span>)
<span style="font-weight: bold;">import</span> pandas
print_imported_modules()
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
**** Package list in the beginning ****
**** Package list after loading pandas ****
_csv 1.0
_ctypes 1.1.0
decimal 1.70
argparse 1.1
csv 1.0
ctypes 1.1.0
cycler 0.10.0
dateutil 2.7.3
decimal 1.70
distutils 3.6.5rc1
ipaddress 1.0
json 2.0.9
logging 0.5.1.2
matplotlib 2.1.1
numpy 1.14.5
numpy.core 1.14.5
numpy.core.multiarray 3.1
numpy.core.umath b'0.4.0'
numpy.lib 1.14.5
numpy.linalg._umath_linalg b'0.1.5'
pandas 0.22.0
_libjson 1.33
platform 1.0.8
pyparsing 2.2.0
pytz 2018.5
re 2.2.1
six 1.11.0
urllib.request 3.6
zlib 1.0
</pre>
</div>
</div>
<div id="outline-container-orgcea179c" class="outline-3">
<h3 id="orgcea179c">Saving and restoring an environment with pip</h3>
<div class="outline-text-3" id="text-orgcea179c">
<p>
The easiest way to go is as follows:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">pip3 freeze &gt; requirements.txt <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">to obtain the list of packages with their version</span>
pip3 install -r requirements.txt <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">to install the previous list of packages, possibly on another machine</span>
</pre>
</div>
<p>
If you want to have several installed Python environments, you may
want to use <a href="https://docs.pipenv.org/">Pipenv</a>. I doubt it correctly tracks the Fortran or C
dynamic libraries that are wrapped by Python, though.
</p>
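<p>
Once two environments have each been saved with <code>pip freeze</code>, comparing them tells you which packages differ. Here is a minimal sketch, assuming both files contain only <code>name==version</code> lines (other lines are skipped); the function names are hypothetical. The example strings are short excerpts of the two listings shown earlier in this document.
</p>

```python
def parse_requirements(text):
    """Map each package name to its pinned version ('name==version' lines)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins

def diff_environments(old_text, new_text):
    """Return {package: (old_version, new_version)} for every difference."""
    old, new = parse_requirements(old_text), parse_requirements(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes

my_machine = "asn1crypto==0.24.0\nattrs==17.4.0\npandas==0.22.0\nwebencodings==0.5\n"
mooc_jupyter = "asn1crypto==0.24.0\nattrs==18.1.0\npandas==0.22.0\nwebencodings==0.5\n"
print(diff_environments(my_machine, mooc_jupyter))
```

<p>
A package present in only one of the two files is reported with <code>None</code> on the other side.
</p>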
</div>
</div>
<div id="outline-container-org849fdbb" class="outline-3">
<h3 id="org849fdbb">Installing a new package or a specific version</h3>
<div class="outline-text-3" id="text-org849fdbb">
<p>
The Jupyter environment we deployed on our servers for the MOOC is
based on version 4.5.4 of Miniconda and Python 3.6. In this
environment you should simply use the <code>pip</code> command (remember that on your
machine, you may have to use <code>pip3</code>).
</p>
<p>
If I query the current version of <code>statsmodels</code> in a shell command,
here is what I will get.
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">pip show statsmodels
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
Name: statsmodels
Version: 0.8.0
Summary: Statistical computations and models for Python
Home-page: http://www.statsmodels.org/
Author: Skipper Seabold, Josef Perktold
Author-email: pystatsmodels@googlegroups.com
License: BSD License
Location: /opt/conda/lib/python3.6/site-packages
Requires: scipy, patsy, pandas
</pre>
<p>
I can then easily upgrade <code>statsmodels</code>:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">pip install --upgrade statsmodels
</pre>
</div>
<p>
The new version should then be:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">pip show statsmodels
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
Name: statsmodels
Version: 0.9.0
Summary: Statistical computations and models for Python
Home-page: http://www.statsmodels.org/
Author: Skipper Seabold, Josef Perktold
Author-email: pystatsmodels@googlegroups.com
License: BSD License
Location: /opt/conda/lib/python3.6/site-packages
Requires: scipy, patsy, pandas
</pre>
<p>
It is even possible to install a specific (possibly much older) version, e.g.:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">pip install <span style="font-weight: bold; font-style: italic;">statsmodels</span>==0.6.1
</pre>
</div>
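<p>
Since the right command may be <code>pip</code> or <code>pip3</code> depending on the machine, one way to avoid the ambiguity from within Python is to call pip as a module of the running interpreter. The sketch below only builds the command line and does not execute it; the function name <code>pip_install_command</code> is hypothetical.
</p>

```python
import sys

def pip_install_command(package, version=None, upgrade=False):
    """Build a pip command bound to the interpreter that runs this code."""
    requirement = package if version is None else "{}=={}".format(package, version)
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    command.append(requirement)
    return command

print(pip_install_command("statsmodels", version="0.6.1"))
print(pip_install_command("statsmodels", upgrade=True))
```

<p>
The resulting list can be passed to <code>subprocess.run</code> if you really want to modify the environment from a notebook cell.
</p>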
</div>
</div>
</div>
<div id="outline-container-org4f45b1e" class="outline-2">
<h2 id="org4f45b1e">Getting information about R libraries</h2>
<div class="outline-text-2" id="text-org4f45b1e">
</div>
<div id="outline-container-orgc583ee9" class="outline-3">
<h3 id="orgc583ee9">Getting the list of imported modules and their version</h3>
<div class="outline-text-3" id="text-orgc583ee9">
<p>
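The built-in <code>sessionInfo()</code> function already reports the R version, the platform and the
loaded packages. The <code>devtools</code> package provides a more detailed <code>session_info()</code> (if this
package is not installed, you should install it first by running in <code>R</code>
the command <code>install.packages("devtools")</code>).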
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-R">sessionInfo()
devtools::session_info()
</pre>
</div>
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="example">
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
[7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.1
Session info ------------------------------------------------------------------
setting value
version R version 3.5.1 (2018-07-02)
system x86_64, linux-gnu
ui X11
language (EN)
collate fr_FR.UTF-8
tz Europe/Paris
date 2018-08-01
Packages ----------------------------------------------------------------------
package * version date source
base * 3.5.1 2018-07-02 local
compiler 3.5.1 2018-07-02 local
datasets * 3.5.1 2018-07-02 local
devtools 1.13.6 2018-06-27 CRAN (R 3.5.1)
digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
graphics * 3.5.1 2018-07-02 local
grDevices * 3.5.1 2018-07-02 local
memoise 1.1.0 2017-04-21 CRAN (R 3.5.1)
methods * 3.5.1 2018-07-02 local
stats * 3.5.1 2018-07-02 local
utils * 3.5.1 2018-07-02 local
withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
</pre>
<p>
Some actually advocate that <a href="https://github.com/ropensci/rrrpkg">writing a reproducible research compendium
is best done by writing an R package</a>. If you want clean R
dependency management, you should also have a look at <a href="https://rstudio.github.io/packrat/">Packrat</a>.
</p>
</div>
</div>
<div id="outline-container-orgdffc6a5" class="outline-3">
<h3 id="orgdffc6a5">Getting the list of installed packages and their version</h3>
<div class="outline-text-3" id="text-orgdffc6a5">
<p>
Finally, it is good to know that there is a built-in R command
(<code>installed.packages</code>) that retrieves and lists the details of all
installed packages.
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-R">head(installed.packages())
</pre>
</div>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
<col class="org-right" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Package</th>
<th scope="col" class="org-left">LibPath</th>
<th scope="col" class="org-right">Version</th>
<th scope="col" class="org-left">Priority</th>
<th scope="col" class="org-left">Depends</th>
<th scope="col" class="org-left">Imports</th>
<th scope="col" class="org-left">LinkingTo</th>
<th scope="col" class="org-left">Suggests</th>
<th scope="col" class="org-left">Enhances</th>
<th scope="col" class="org-left">License</th>
<th scope="col" class="org-left">License_is_FOSS</th>
<th scope="col" class="org-left">License_restricts_use</th>
<th scope="col" class="org-left">OS_type</th>
<th scope="col" class="org-left">MD5sum</th>
<th scope="col" class="org-left">NeedsCompilation</th>
<th scope="col" class="org-left">Built</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">BH</td>
<td class="org-left">/home/alegrand/R/x86_64-pc-linux-gnu-library/3.5</td>
<td class="org-right">1.66.0-1</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">BSL-1.0</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">no</td>
<td class="org-left">3.5.1</td>
</tr>
<tr>
<td class="org-left">Formula</td>
<td class="org-left">/home/alegrand/R/x86_64-pc-linux-gnu-library/3.5</td>
<td class="org-right">1.2-3</td>
<td class="org-left">nil</td>
<td class="org-left">R (&gt;= 2.0.0), stats</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">GPL-2 | GPL-3</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">no</td>
<td class="org-left">3.5.1</td>
</tr>
<tr>
<td class="org-left">Hmisc</td>
<td class="org-left">/home/alegrand/R/x86_64-pc-linux-gnu-library/3.5</td>
<td class="org-right">4.1-1</td>
<td class="org-left">nil</td>
<td class="org-left">lattice, survival (&gt;= 2.40-1), Formula, ggplot2 (&gt;= 2.2)</td>
<td class="org-left">methods, latticeExtra, cluster, rpart, nnet, acepack, foreign, gtable, grid, gridExtra, data.table, htmlTable (&gt;= 1.11.0), viridis, htmltools, base64enc</td>
<td class="org-left">nil</td>
<td class="org-left">chron, rms, mice, tables, knitr, ff, ffbase, plotly (&gt;= 4.5.6)</td>
<td class="org-left">nil</td>
<td class="org-left">GPL (&gt;= 2)</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">yes</td>
<td class="org-left">3.5.1</td>
</tr>
<tr>
<td class="org-left">Matrix</td>
<td class="org-left">/home/alegrand/R/x86_64-pc-linux-gnu-library/3.5</td>
<td class="org-right">1.2-14</td>
<td class="org-left">recommended</td>
<td class="org-left">R (&gt;= 3.2.0)</td>
<td class="org-left">methods, graphics, grid, stats, utils, lattice</td>
<td class="org-left">nil</td>
<td class="org-left">expm, MASS</td>
<td class="org-left">MatrixModels, graph, SparseM, sfsmisc</td>
<td class="org-left">GPL (&gt;= 2) | file LICENCE</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">yes</td>
<td class="org-left">3.5.1</td>
</tr>
<tr>
<td class="org-left">StanHeaders</td>
<td class="org-left">/home/alegrand/R/x86_64-pc-linux-gnu-library/3.5</td>
<td class="org-right">2.17.2</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">RcppEigen, BH</td>
<td class="org-left">nil</td>
<td class="org-left">BSD_3_clause + file LICENSE</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">yes</td>
<td class="org-left">3.5.1</td>
</tr>
<tr>
<td class="org-left">acepack</td>
<td class="org-left">/home/alegrand/R/x86_64-pc-linux-gnu-library/3.5</td>
<td class="org-right">1.4.1</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">testthat</td>
<td class="org-left">nil</td>
<td class="org-left">MIT + file LICENSE</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">nil</td>
<td class="org-left">yes</td>
<td class="org-left">3.5.1</td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-orgb52d0ce" class="outline-3">
<h3 id="orgb52d0ce">Installing a new package or a specific version</h3>
<div class="outline-text-3" id="text-orgb52d0ce">
<p>
This section is mostly a cut and paste from the <a href="https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages">recent post by Ian
Pylvainen</a> on this topic. It comprises a very clear explanation of how
to proceed.
</p>
</div>
<div id="outline-container-orgaf558a0" class="outline-4">
<h4 id="orgaf558a0">Installing a pre-compiled version</h4>
<div class="outline-text-4" id="text-orgaf558a0">
<p>
If you're on a Debian or Ubuntu system, it may be difficult to
access a specific version without breaking your system. So unless you
are moving to the latest version available in your Linux distribution,
<b>we strongly recommend that you build from source</b>. In this case, you'll
need to make sure you have the necessary toolchain to build packages
from source (e.g., gcc, FORTRAN, etc.). On Windows, this may require
you to install <a href="https://cran.r-project.org/bin/windows/Rtools/">Rtools</a>.
</p>
<p>
If you're on Windows or OS X and looking for a package for an <b>older
version of R</b> (R 2.1 or below), you can check the <a href="https://cran-archive.r-project.org/bin/">CRAN binary
archive</a>. Once you have the URL, you can install it using a command
similar to the example below:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-R">packageurl <span style="font-weight: bold; text-decoration: underline;">&lt;-</span> <span style="font-style: italic;">"https://cran-archive.r-project.org/bin/windows/contrib/2.13/BBmisc_1.0-58.zip"</span>
install.packages(packageurl, repos=<span style="font-weight: bold; text-decoration: underline;">NULL</span>, type=<span style="font-style: italic;">"binary"</span>)
</pre>
</div>
</div>
</div>
<div id="outline-container-org7d8a9f0" class="outline-4">
<h4 id="org7d8a9f0">Using devtools</h4>
<div class="outline-text-4" id="text-org7d8a9f0">
<p>
The simplest method to install the version you need is to use the
<code>install_version()</code> function of the <code>devtools</code> package (obviously, you
need to install <code>devtools</code> first, which can be done by running in <code>R</code> the
command <code>install.packages("devtools")</code>). For instance:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-R"><span style="font-weight: bold; text-decoration: underline;">require</span>(devtools)
install_version(<span style="font-style: italic;">"ggplot2"</span>, version = <span style="font-style: italic;">"0.9.1"</span>, repos = <span style="font-style: italic;">"http://cran.us.r-project.org"</span>)
</pre>
</div>
</div>
</div>
<div id="outline-container-org4509fba" class="outline-4">
<h4 id="org4509fba">Installing from source code</h4>
<div class="outline-text-4" id="text-org4509fba">
<p>
Alternatively, you may want to install an older package from source. If
<code>devtools</code> fails or if you do not want to depend on it, you can install
the package from source by pointing <code>install.packages()</code> at the right
URL. This URL can be obtained by browsing the <a href="https://cran.r-project.org/src/contrib/Archive">CRAN Package Archive</a>.
</p>
<p>
Once you have the URL, you can install it using a command similar to
the example below:
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-R">packageurl <span style="font-weight: bold; text-decoration: underline;">&lt;-</span> <span style="font-style: italic;">"http://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.9.1.tar.gz"</span>
install.packages(packageurl, repos=<span style="font-weight: bold; text-decoration: underline;">NULL</span>, type=<span style="font-style: italic;">"source"</span>)
</pre>
</div>
<p>
If you know the URL, you can also install from source via the command
line outside of R. For instance (in bash):
</p>
<div class="org-src-container">
<pre style="padding-left: 30px; background-color: #f6f8fa;" class="src src-shell">wget http://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.9.1.tar.gz
R CMD INSTALL ggplot2_0.9.1.tar.gz
</pre>
</div>
</div>
</div>
<div id="outline-container-org9d64d25" class="outline-4">
<h4 id="org9d64d25">Potential issues</h4>
<div class="outline-text-4" id="text-org9d64d25">
<p>
There are a few potential issues that may arise with installing older
versions of packages:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">You may be losing functionality or bug fixes that are only present
in the newer versions of the packages.</li>
<li style="margin-bottom:0;">The older package version needed may not be compatible with the
version of R you have installed. In this case, you will either need
to downgrade R to a compatible version or update your R code to work
with a newer version of the package.</li>
</ul>
</div>
</div>
</div>
</div>
</div>
<div id="content">
<h1 class="title">Additional references</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul style="margin:0 0;">
<li style="margin-bottom:0;"><a href="#org3b8ed57">"Thoughts" on language/software stability</a></li>
<li style="margin-bottom:0;"><a href="#org1d2d532">Controlling your software environment</a></li>
<li style="margin-bottom:0;"><a href="#org50da419">Preservation/Archiving</a></li>
<li style="margin-bottom:0;"><a href="#org5d2f9e5">Workflows</a></li>
<li style="margin-bottom:0;"><a href="#orgad41259">Numerical and statistical issues</a></li>
<li style="margin-bottom:0;"><a href="#org7321a51">Publication practices</a></li>
<li style="margin-bottom:0;"><a href="#orge4adad6">Experimentation</a></li>
</ul>
</div>
</div>
<div id="outline-container-org3b8ed57" class="outline-2">
<h2 id="org3b8ed57">"Thoughts" on language/software stability</h2>
<div class="outline-text-2" id="text-org3b8ed57">
<p>
As we explained, the programming language used in an analysis has a
clear influence on the reproducibility of your analysis. It is not a
characteristic of the language itself but rather a consequence of the
development philosophy of the underlying community. For example, C is a
very stable language with a <a href="https://en.wikipedia.org/wiki/C_(programming_language)#ANSI_C_and_ISO_C">very clear specification designed by a
committee</a> (even though some compilers may not fully comply with this standard).
</p>
<p>
At the other end of the spectrum, <a href="https://en.wikipedia.org/wiki/Python_(programming_language)">Python</a> had a much more organic
development based on a readability philosophy and valuing continuous
improvement over backwards-compatibility. Furthermore, Python is
commonly used as a wrapping language (e.g., to easily use C or FORTRAN
libraries) and has its own packaging system. All these design choices
often make reproducibility a bit painful with Python, even
though the community is slowly taking this into account. The transition
from Python 2 to the not fully backwards-compatible Python 3 has been a
particularly painful process, not least because the two languages are so
similar that it is not always easy to figure out whether a given script
or module is written in Python 2 or Python 3. It isn't even rare to see
Python scripts that run under both Python 2 and Python 3 but produce
different results because the behavior of integer division changed.
</p>
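<p>
The integer division pitfall can be illustrated with a few lines. This is
only a sketch: the function name <code>mean_of_two</code> is ours, and the Python 2
behavior is described in comments since the block is meant to be run with
Python 3.
</p>

```python
def mean_of_two(a, b):
    """Average two numbers using the `/` operator."""
    return (a + b) / 2


# Python 3: `/` is true division, so this prints 3.5.
# Python 2: `/` applied to two ints is floor division, so the same line gives 3.
print(mean_of_two(3, 4))

# `//` is floor division in both versions and gives 3 here.
print((3 + 4) // 2)
```

<p>
Writing <code>//</code> when floor division is intended, and converting to <code>float</code>
when it is not, makes such a script behave identically under both versions.
</p>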
<p>
<a href="https://en.wikipedia.org/wiki/R_(programming_language)">R</a>, in comparison, is much closer (in terms of developer community) to
languages like <a href="https://en.wikipedia.org/wiki/SAS_(software)">SAS</a>, which is heavily used in the pharmaceutical
industry where statistical procedures need to be standardized and rock
solid/stable. R is obviously not immune to evolutions that break old
versions and hinder reproducibility/backward compatibility. Here is a
relatively recent <a href="http://members.cbio.mines-paristech.fr/~thocking/HOCKING-reproducible-research-with-R.html">true story about this</a>. In addition, some colleagues who worked
on the <a href="https://www.fun-mooc.fr/courses/UPSUD/42001S06/session06/about">statistics introductory course with R on FUN</a> reported
several issues to us with a few functions (<code>plotmeans</code> from <code>gplots</code>,
<code>survfit</code> from <code>survival</code>, or <code>hclust</code>) whose default parameters had
changed over the years. It is thus probably good practice to give
explicit values for all parameters (which can be cumbersome) instead
of relying on default values, and to restrict your dependencies as much
as possible.
</p>
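<p>
The effect of a changed default can be sketched as follows. The example is
written in Python rather than R, and both functions are hypothetical: they
stand for two successive releases of the same library function and do not
reproduce the actual changes in <code>gplots</code>, <code>survival</code> or <code>hclust</code>.
</p>

```python
def trimmed_mean_v1(values, trim=0.0):
    """Hypothetical release 1 of a library function: no trimming by default."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def trimmed_mean_v2(values, trim=0.1):
    """Hypothetical release 2: same code, but the default for `trim` changed."""
    return trimmed_mean_v1(values, trim=trim)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

# Relying on the default: the result depends on the installed release.
print(trimmed_mean_v1(data), trimmed_mean_v2(data))

# Passing the parameter explicitly: both releases agree.
print(trimmed_mean_v1(data, trim=0.1), trimmed_mean_v2(data, trim=0.1))
```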
<p>
This being said, the R development community is generally quite
careful about stability. We (the authors of this MOOC) believe that open
source (which makes it possible to inspect how computation is done and to
identify both mistakes and sources of non-reproducibility) is more
important than the rock solid stability of SAS, which is proprietary
software. Yet, if you really need to stay with SAS (similar solutions
probably exist for other languages as well), you should know that SAS
can be used within Jupyter using either the <a href="https://sassoftware.github.io/sas_kernel/">Python SASKernel</a> or the
<a href="https://sassoftware.github.io/saspy/">Python SASPy</a> package (step by step explanations about this are given
<a href="https://app-learninglab.inria.fr/gitlab/85bc36e0a8096c618fbd5993d1cca191/mooc-rr/blob/master/documents/tuto_jupyter_windows/tuto_jupyter_windows.md">here</a>). Using such a literate programming approach combined with systematic
version and environment control will always help.
</p>
</div>
</div>
<div id="outline-container-org1d2d532" class="outline-2">
<h2 id="org1d2d532">Controlling your software environment</h2>
<div class="outline-text-2" id="text-org1d2d532">
<p>
As we mentioned in the video sequences, there are several solutions to
control your environment:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;">The easy (preserve the mess) ones: <a href="http://www.pgbovine.net/cde.html">CDE</a> or <a href="https://vida-nyu.github.io/reprozip/">ReproZip</a></li>
<li style="margin-bottom:0;">The more demanding ones (encourage cleanliness), where you start with a
clean environment and install only what's strictly necessary (and document it):
<ul class="org-ul">
<li style="margin-bottom:0;">The very well known <a href="https://www.docker.io/">Docker</a></li>
<li style="margin-bottom:0;"><a href="https://singularity.lbl.gov/">Singularity</a> or <a href="https://spack.io/">Spack</a>, which are more targeted toward the specific
needs of high performance computing users</li>
<li style="margin-bottom:0;"><a href="https://www.gnu.org/software/guix/">Guix</a> and <a href="https://nixos.org/">Nix</a>, which are very clean (perfect?) solutions to
dependency hell and which we recommend</li>
</ul></li>
</ul>
<p>
It may be hard to understand the differences between these
approaches and to decide which one is best in your context.
</p>
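<p>
Whichever tool you choose, documenting the environment can start with
something as simple as recording the interpreter and package versions next
to your results. The sketch below assumes Python 3.8 or later; the function
name <code>environment_snapshot</code> and the keys of the dictionary are our own
choices, and such a record is in no way a substitute for the tools listed above.
</p>

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot():
    """Collect the interpreter, platform and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python_version": sys.version,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


# Print the snapshot as JSON; redirect it to a file to keep it with the results.
print(json.dumps(environment_snapshot(), indent=2))
```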
<p>
Here is a webinar where some of these tools are demoed in a
reproducible research context: <a href="https://github.com/alegrand/RR_webinars/blob/master/2_controling_your_environment/index.org">Controling your environment (by Michael
Mercier and Cristian Ruiz)</a>
</p>
<p>
You may also want to have a look at <a href="http://falsifiable.us/">the Popper conventions</a> (<a href="https://github.com/alegrand/RR_webinars/blob/master/11_popper/index.org">webinar by
Ivo Jimenez through Google Hangouts</a>) or at the <a href="https://github.com/alegrand/RR_webinars/blob/master/7_publications/index.org">presentation of Konrad
Hinsen on Active Papers</a> (<a href="http://www.activepapers.org/">http://www.activepapers.org/</a>).
</p>
</div>
</div>
<div id="outline-container-org50da419" class="outline-2">
<h2 id="org50da419">Preservation/Archiving</h2>
<div class="outline-text-2" id="text-org50da419">
<p>
Ensuring software is properly archived, i.e., safely stored so that
it can be accessed in a perennial way, can be quite tricky. If you
have never seen <a href="https://github.com/alegrand/RR_webinars/blob/master/5_archiving_software_and_data/index.org">Roberto Di Cosmo presenting the Software Heritage
project</a>, it is a must-see: <a href="https://www.softwareheritage.org/">https://www.softwareheritage.org/</a>
</p>
<p>
For regular data, we highly recommend using <a href="https://www.zenodo.org/">https://www.zenodo.org/</a>
whenever the data is not sensitive.
</p>
</div>
</div>
<div id="outline-container-org5d2f9e5" class="outline-2">
<h2 id="org5d2f9e5">Workflows</h2>
<div class="outline-text-2" id="text-org5d2f9e5">
<p>
In the video sequences, we mentioned workflow managers (original application domain in parentheses):
</p>
<ul class="org-ul">
<li style="margin-bottom:0;"><a href="https://galaxyproject.org/">Galaxy</a> (genomics), <a href="https://kepler-project.org/">Kepler</a> (ecology), <a href="https://taverna.apache.org/">Taverna</a> (bio-informatics), <a href="https://pegasus.isi.edu/">Pegasus</a>
(astronomy), <a href="http://cknowledge.org/">Collective Knowledge</a> (compiler optimization),
<a href="https://www.vistrails.org">VisTrails</a> (image processing)</li>
<li style="margin-bottom:0;">Light-weight: <a href="http://dask.pydata.org/">dask</a> (python), <a href="https://ropensci.github.io/drake/">drake</a> (R), <a href="http://swift-lang.org/">swift</a> (molecular biology),
<a href="https://snakemake.readthedocs.io/">snakemake</a> (like <code>make</code> but more expressive and in <code>python</code>) &#x2026;</li>
<li style="margin-bottom:0;">Hybrids: <a href="https://vatlab.github.io/sos-docs/">SOS-notebook</a>, &#x2026;</li>
</ul>
<p>
You may want to have a look at this webinar: <a href="https://github.com/alegrand/RR_webinars/blob/master/6_reproducibility_bioinformatics/index.org">Reproducible Science in
Bio-informatics: Current Status, Solutions and Research Opportunities
(by Sarah Cohen Boulakia, Yvan Le Bras and Jérôme Chopard).</a>
</p>
</div>
</div>
<div id="outline-container-orgad41259" class="outline-2">
<h2 id="orgad41259">Numerical and statistical issues</h2>
<div class="outline-text-2" id="text-orgad41259">
<p>
We have mentioned these topics in our MOOC but we could in no way
cover them properly. Here we only suggest a few interesting talks
on the subject.
</p>
<ul class="org-ul">
<li style="margin-bottom:0;"><a href="https://github.com/alegrand/RR_webinars/blob/master/10_statistics_and_replication_in_HCI/index.org">In this talk, Pierre Dragicevic provides a nice illustration of the
consequences of statistical uncertainty and of how some concepts
(e.g., p-values) are commonly misunderstood.</a></li>
<li style="margin-bottom:0;"><a href="https://github.com/alegrand/RR_webinars/blob/master/3_numerical_reproducibility/index.org">Nathalie Revol, Philippe Langlois and Stef Graillat present the main
challenges encountered when trying to achieve numerical
reproducibility and present recent research work on this topic.</a></li>
</ul>
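<p>
One basic source of numerical non-reproducibility is that floating-point
addition is not associative, so the order in which terms are summed (which
may change with parallelism or compiler optimizations) can change the result.
The sketch below is our own illustration and is not taken from the talks above.
</p>

```python
import math

values = [0.1, 0.2, 0.3]

left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# The two groupings round differently, so this comparison prints False.
print(left_to_right == right_to_left)

# math.fsum tracks the rounding errors and returns a correctly rounded sum
# that does not depend on the order of the operands.
print(math.fsum(values) == math.fsum(reversed(values)))
```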
</div>
</div>
<div id="outline-container-org7321a51" class="outline-2">
<h2 id="org7321a51">Publication practices</h2>
<div class="outline-text-2" id="text-org7321a51">
<p>
You may want to have a look at the following talks:
</p>
<ul class="org-ul">
<li style="margin-bottom:0;"><a href="https://github.com/alegrand/RR_webinars/blob/master/8_artifact_evaluation/index.org">Enabling open and reproducible research at computer systems’
conferences (by Grigori Fursin)</a>. In particular, this talk discusses
<i>artifact evaluation</i> that is becoming more and more popular.</li>
<li style="margin-bottom:0;"><a href="https://github.com/alegrand/RR_webinars/blob/master/7_publications/index.org">Publication Modes Favoring Reproducible Research (by Konrad Hinsen
and Nicolas Rougier)</a>. In this talk, the motivations for the <a href="http://rescience.github.io/">ReScience
journal</a> initiative are presented.</li>
<li style="margin-bottom:0;"><a href="https://www.youtube.com/watch?v=HuJ2G8rXHMs">Simine Vazire - When Should We be Skeptical of Scientific Claims?</a>,
which discusses publication practices in social sciences, in
particular HARKing (Hypothesizing After the Results are Known),
p-hacking, etc.</li>
</ul>
</div>
</div>
<div id="outline-container-orge4adad6" class="outline-2">
<h2 id="orge4adad6">Experimentation</h2>
<div class="outline-text-2" id="text-orge4adad6">
<p>
Experimentation was not covered in this MOOC, although it is an
essential part of science. The main reason is that practices and
constraints vary so widely from one domain to another that it could
not be properly covered in a first edition. We would be happy to
gather references you consider interesting in your domain, so do not
hesitate to share them with us on the forum and we
will update this page.
</p>
<ul class="org-ul">
<li style="margin-bottom:0;"><a href="https://github.com/alegrand/RR_webinars/blob/master/9_experimental_testbeds/index.org">A recent talk by Lucas Nussbaum on Experimental Testbeds in Computer
Science</a>.</li>
</ul>
</div>
</div>
</div>
Hello world ! It works
\ No newline at end of file