42229b8d9233ebe04805dfd660381627 · HOUNGBO Mahugnon Ezékiel · 42229b8d9233ebe04805dfd660381627 · 42229b8d9233ebe04805dfd660381627 · 42229b8d9233ebe04805dfd660381627 · 42229b8d9233ebe04805dfd660381627
17 changed files
--- a/module1/exo2/firsttest-markdown.md
+++ b/module1/exo2/firsttest-markdown.md
+# Partie 1
+## Sous-partie 1 : texte
+Une phrase sans rien
+*Une phrase en italique*
+__Une phrase en gras__
+Un lien vers [fun-mooc.fr](https://www.fun-mooc.fr/)
+Une ligne de ```code```
+## Sous-partie 2 : listes
+__Liste à puce__
+- item
+     - sous-item
+     - sous-item
+- item
+- item
+__Liste numérotée__
+1. item
+2. item
+3. item
+## Sous-partie 3 : Code
+```
+# Extrait de code
\ No newline at end of file
--- a/module2/exo1/toy_document_fr.Rmd
+++ b/module2/exo1/toy_document_fr.Rmd
 ---
-title: "Votre titre"
+title: "À propos du calcul de pi"
-author: "Votre nom"
+author: _Arnaud Legrand_
-date: "La date du jour"
+date: _25 juin 2018_
 output: html_document
 ---
+# En demandant à la lib maths
-```{r setup, include=FALSE}
+Mon ordianateur m'indique que $\pi$ vaut _approximativement_
-knitr::opts_chunk$set(echo = TRUE)
+```{r}
+pi
 ```
+# En utilisant la méthode des aiguilles de Buffon
-## Quelques explications
+Mais calculé avec la **méthode** des [aiguilles de Buffon](https://fr.wikipedia.org/wiki/Aiguille_de_Buffon), on obtiendrait comme **approximation** :
+```{r}
-Ceci est un document R markdown que vous pouvez aisément exporter au format HTML, PDF, et MS Word. Pour plus de détails sur R Markdown consultez <http://rmarkdown.rstudio.com>.
+set.seed(42)
+N = 100000
-Lorsque vous cliquerez sur le bouton **Knit** ce document sera compilé afin de ré-exécuter le code R et d'inclure les résultats dans un document final. Comme nous vous l'avons montré dans la vidéo, on inclue du code R de la façon suivante:
+x = runif(N)
+theta = pi/2*runif(N)
-```{r cars}
+2/(mean(x+sin(theta)>1))
-summary(cars)
 ```
+# Avec un argument “fréquentiel” de surface
-Et on peut aussi aisément inclure des figures. Par exemple:
+Sinon, une méthode plus simple à comprendre et ne faisant pas intervenir d’appel à la fonction sinus se base sur le fait que si $X \sim U(0,1)$ et $Y \sim U(0,1)$ alors $P[X^2+Y^2≤1]=\pi/4$ (voir [méthode de Monte Carlo sur Wikipedia](https://fr.wikipedia.org/wiki/M%C3%A9thode_de_Monte-Carlo#D%C3%A9termination_de_la_valeur_de_%CF%80)). Le code suivant illustre ce fait:
+```{r}
-```{r pressure, echo=FALSE}
+set.seed(42)
-plot(pressure)
+N = 1000
+df = data.frame(X = runif(N), Y = runif(N))
+df$Accept = (df$X**2 + df$Y**2 <=1)
+library(ggplot2)
+ggplot(df, aes(x=X,y=Y,color=Accept)) + geom_point(alpha=.2) + coord_fixed() + theme_bw()
 ```
+Il est alors aisé d’obtenir une approximation (pas terrible) de $\pi$ en comptant combien de fois, en moyenne, $X^2+Y^2$ est inférieur à 1:
-Vous remarquerez le paramètre `echo = FALSE` qui indique que le code ne doit pas apparaître dans la version finale du document. Nous vous recommandons dans le cadre de ce MOOC de ne pas utiliser ce paramètre car l'objectif est que vos analyses de données soient parfaitement transparentes pour être reproductibles. 
+```{r}
+4*mean(df$Accept)
-Comme les résultats ne sont pas stockés dans les fichiers Rmd, pour faciliter la relecture de vos analyses par d'autres personnes, vous aurez donc intérêt à générer un HTML ou un PDF et à le commiter.
+```
\ No newline at end of file
-Maintenant, à vous de jouer! Vous pouvez effacer toutes ces informations et les remplacer par votre document computationnel.
--- a/module2/exo2/exercice_fr.Rmd
+++ b/module2/exo2/exercice_fr.Rmd
@@ -10,24 +10,13 @@ output: html_document
 knitr::opts_chunk$set(echo = TRUE)
 ```
-## Quelques explications
+```{r}
+data=c(14.0, 7.6, 11.2, 12.8, 12.5, 9.9, 14.9, 9.4, 16.9, 10.2, 14.9, 18.1, 7.3, 9.8, 10.9,12.2, 9.9, 2.9, 2.8, 15.4, 15.7, 9.7, 13.1, 13.2, 12.3, 11.7, 16.0, 12.4, 17.9, 12.2, 16.2, 18.7, 8.9, 11.9, 12.1, 14.6, 12.1, 4.7, 3.9, 16.9, 16.8, 11.3, 14.4, 15.7, 14.0, 13.6, 18.0, 13.6, 19.9, 13.7, 17.0, 20.5, 9.9, 12.5, 13.2, 16.1, 13.5, 6.3, 6.4, 17.6, 19.1, 12.8, 15.5, 16.3, 15.2, 14.6, 19.1, 14.4, 21.4, 15.1, 19.6, 21.7, 11.3, 15.0, 14.3, 16.8, 14.0, 6.8, 8.2, 19.9, 20.4, 14.6, 16.4, 18.7, 16.8, 15.8, 20.4, 15.8, 22.4, 16.2, 20.3, 23.4, 12.1, 15.5, 15.4, 18.4, 15.7, 10.2, 8.9, 21.0)
-Ceci est un document R markdown que vous pouvez aisément exporter au format HTML, PDF, et MS Word. Pour plus de détails sur R Markdown consultez <http://rmarkdown.rstudio.com>.
-Lorsque vous cliquerez sur le bouton **Knit** ce document sera compilé afin de ré-exécuter le code R et d'inclure les résultats dans un document final. Comme nous vous l'avons montré dans la vidéo, on inclue du code R de la façon suivante:
-```{r cars}
-summary(cars)
 ```
+```{r}
-Et on peut aussi aisément inclure des figures. Par exemple:
+data.moyenne= mean(data); data.moyenne
+data.ecarttype= sd(data);data.ecarttype
-```{r pressure, echo=FALSE}
+data.min=min(data);data.min
-plot(pressure)
+data.médiane=median(data); data.médiane
+data.max=max(data); data.max
 ```
-Vous remarquerez le paramètre `echo = FALSE` qui indique que le code ne doit pas apparaître dans la version finale du document. Nous vous recommandons dans le cadre de ce MOOC de ne pas utiliser ce paramètre car l'objectif est que vos analyses de données soient parfaitement transparentes pour être reproductibles. 
-Comme les résultats ne sont pas stockés dans les fichiers Rmd, pour faciliter la relecture de vos analyses par d'autres personnes, vous aurez donc intérêt à générer un HTML ou un PDF et à le commiter.
-Maintenant, à vous de jouer! Vous pouvez effacer toutes ces informations et les remplacer par votre document computationnel.
--- a/module2/exo3/exercice_fr.Rmd
+++ b/module2/exo3/exercice_fr.Rmd
---
+```{r}
-title: "Votre titre"
+data=c(14.0, 7.6, 11.2, 12.8, 12.5, 9.9, 14.9, 9.4, 16.9, 10.2, 14.9, 18.1, 7.3, 9.8, 10.9,12.2, 9.9, 2.9, 2.8, 15.4, 15.7, 9.7, 13.1, 13.2, 12.3, 11.7, 16.0, 12.4, 17.9, 12.2, 16.2, 18.7, 8.9, 11.9, 12.1, 14.6, 12.1, 4.7, 3.9, 16.9, 16.8, 11.3, 14.4, 15.7, 14.0, 13.6, 18.0, 13.6, 19.9, 13.7, 17.0, 20.5, 9.9, 12.5, 13.2, 16.1, 13.5, 6.3, 6.4, 17.6, 19.1, 12.8, 15.5, 16.3, 15.2, 14.6, 19.1, 14.4, 21.4, 15.1, 19.6, 21.7, 11.3, 15.0, 14.3, 16.8, 14.0, 6.8, 8.2, 19.9, 20.4, 14.6, 16.4, 18.7, 16.8, 15.8, 20.4, 15.8, 22.4, 16.2, 20.3, 23.4, 12.1, 15.5, 15.4, 18.4, 15.7, 10.2, 8.9, 21.0)
-author: "Votre nom"
-date: "La date du jour"
-output: html_document
---
-```{r setup, include=FALSE}
-knitr::opts_chunk$set(echo = TRUE)
 ```
+```{r}
-## Quelques explications
+plot(data, type="l", ann = FALSE,col="blue", panel.first = grid())
-Ceci est un document R markdown que vous pouvez aisément exporter au format HTML, PDF, et MS Word. Pour plus de détails sur R Markdown consultez <http://rmarkdown.rstudio.com>.
-Lorsque vous cliquerez sur le bouton **Knit** ce document sera compilé afin de ré-exécuter le code R et d'inclure les résultats dans un document final. Comme nous vous l'avons montré dans la vidéo, on inclue du code R de la façon suivante:
-```{r cars}
-summary(cars)
 ```
+```{r}
-Et on peut aussi aisément inclure des figures. Par exemple:
+hist(data, ann = FALSE,col="blue" )
+```
-```{r pressure, echo=FALSE}
\ No newline at end of file
-plot(pressure)
-```
-Vous remarquerez le paramètre `echo = FALSE` qui indique que le code ne doit pas apparaître dans la version finale du document. Nous vous recommandons dans le cadre de ce MOOC de ne pas utiliser ce paramètre car l'objectif est que vos analyses de données soient parfaitement transparentes pour être reproductibles. 
-Comme les résultats ne sont pas stockés dans les fichiers Rmd, pour faciliter la relecture de vos analyses par d'autres personnes, vous aurez donc intérêt à générer un HTML ou un PDF et à le commiter.
-Maintenant, à vous de jouer! Vous pouvez effacer toutes ces informations et les remplacer par votre document computationnel.
--- a/module2/exo4/DonneModule2.csv
+++ b/module2/exo4/DonneModule2.csv
+Notes
+14
+7.6
+11.2
+12.8
+12.5
+9.9
+14.9
+9.4
+16.9
+10.2
+14.9
+18.1
+7.3
+9.8
+10.9
+12.2
+9.9
+2.9
+2.8
+15.4
+15.7
+9.7
+13.1
+13.2
+12.3
+11.7
+16
+12.4
+17.9
+12.2
+16.2
+18.7
+8.9
+11.9
+12.1
+14.6
+12.1
+4.7
+3.9
+16.9
+16.8
+11.3
+14.4
+15.7
+14
+13.6
+18
+13.6
+19.9
+13.7
+17
+20.5
+9.9
+12.5
+13.2
+16.1
+13.5
+6.3
+6.4
+17.6
+19.1
+12.8
+15.5
+16.3
+15.2
+14.6
+19.1
+14.4
+21.4
+15.1
+19.6
+21.7
+11.3
+15
+14.3
+16.8
+14
+6.8
+8.2
+19.9
+20.4
+14.6
+16.4
+18.7
+16.8
+15.8
+20.4
+15.8
+22.4
+16.2
+20.3
+23.4
+12.1
+15.5
+15.4
+18.4
+15.7
+10.2
+8.9
+21
--- a/module2/exo4/exercice_fr.Rmd
+++ b/module2/exo4/exercice_fr.Rmd
 ---
-title: "Votre titre"
+title: "Recherche reproductible : principes méthodologique"
-author: "Votre nom"
+author: "Mahugnon Ezékiel HOUNGBO"
-date: "La date du jour"
+date: "17/02/2021"
 output: html_document
 ---
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = TRUE)
 ```
-## Quelques explications
+# Mon journal de bord
-Ceci est un document R markdown que vous pouvez aisément exporter au format HTML, PDF, et MS Word. Pour plus de détails sur R Markdown consultez <http://rmarkdown.rstudio.com>.
+# Module2
-Lorsque vous cliquerez sur le bouton **Knit** ce document sera compilé afin de ré-exécuter le code R et d'inclure les résultats dans un document final. Comme nous vous l'avons montré dans la vidéo, on inclue du code R de la façon suivante:
-```{r cars}
+## Importation des packages  
-summary(cars)
+  setwd("C:/Users/Denis CORNET/mooc-rr/module2")
-```
+  À utiliser lors du travail en local
-Et on peut aussi aisément inclure des figures. Par exemple:
-```{r pressure, echo=FALSE}
+```{r}
-plot(pressure)
+library(tidyverse)
 ```
-Vous remarquerez le paramètre `echo = FALSE` qui indique que le code ne doit pas apparaître dans la version finale du document. Nous vous recommandons dans le cadre de ce MOOC de ne pas utiliser ce paramètre car l'objectif est que vos analyses de données soient parfaitement transparentes pour être reproductibles. 
+## Importation de la base de donnée
-Comme les résultats ne sont pas stockés dans les fichiers Rmd, pour faciliter la relecture de vos analyses par d'autres personnes, vous aurez donc intérêt à générer un HTML ou un PDF et à le commiter.
+```{r}
+DonneModule2<- read.csv2("C:/Users/Denis CORNET/mooc-rr/module2/exo4/DonneModule2.csv")
+DonneModule2 <- as.data.frame(DonneModule2)
+head(DonneModule2)
+DonneModule2$Notes <- as.numeric(DonneModule2$Notes)
+head(DonneModule2)
+```
+## Statistique descriptive sur la base de donnée
+### Moyenne, écart-type, minimum, médiane et maximum 
+```{r}
+DonneModule2.moyenne= mean(DonneModule2$Notes); DonneModule2.moyenne
+DonneModule2.ecarttype= sd(DonneModule2$Notes);DonneModule2.ecarttype
+DonneModule2.min=min(DonneModule2$Notes);DonneModule2.min
+DonneModule2.médiane=median(DonneModule2$Notes); DonneModule2.médiane
+DonneModule2.max=max(DonneModule2$Notes); DonneModule2.max
+```
+### Grapiques de la statistique descriptive
+#### Plot
+```{r}
+plot(DonneModule2$Notes, type="l", ann = FALSE,col="blue", panel.first = grid())
+```
+#### Histogramme
+```{r}
+hist(DonneModule2$Notes, ann = FALSE,col="blue" )
+```
-Maintenant, à vous de jouer! Vous pouvez effacer toutes ces informations et les remplacer par votre document computationnel.
--- a/module2/exo4/exercice_fr.html
+++ b/module2/exo4/exercice_fr.html
--- a/module2/exo5/exo5_fr.html
+++ b/module2/exo5/exo5_fr.html
--- a/module3/exo1/analyse-syndrome-grippal.Rmd
+++ b/module3/exo1/analyse-syndrome-grippal.Rmd
@@ -18,6 +18,11 @@ header-includes:
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = TRUE)
 ```
+## Vérification de l'existance des données dans le fichier local
+```{r}
+data = read.csv("incidence-PAY-3.csv")
+```
+On constacte que les données n'existent pas dans le fichier local. Téléchargeons maintenanat les données depuis le site web du Réseau Sentinelles.
 ## Préparation des données
@@ -47,6 +52,15 @@ La première ligne du fichier CSV est un commentaire, que nous ignorons en préc
 data = read.csv(data_url, skip=1)
 ```
+### Depôt des données dans le fichier local
+```{r}
+write.csv(data,"incidence-PAY-3.csv")
+```
+### Lecture des données à partir du fichier local
+```{r}
+data = read.csv("incidence-PAY-3.csv")
+```
 Regardons ce que nous avons obtenu:
 ```{r}
 head(data)

--- a/module3/exo3/exercice_fr.html
+++ b/module3/exo3/exercice_fr.html
--- a/module3/exo3/exercice_fr.pdf
+++ b/module3/exo3/exercice_fr.pdf
--- a/module3/exo3/liglab2.csv
+++ b/module3/exo3/liglab2.csv
--- a/module3/exo3/stackoverflow.csv
+++ b/module3/exo3/stackoverflow.csv
--- a/module4/ChallengerRepro.Rmd
+++ b/module4/ChallengerRepro.Rmd
+---
+title: "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure"
+author: ""
+date: "26 février 2021"
+output: pdf_document
+---
+In this document we reperform some of the analysis provided in
+*Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley* published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069.
+On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The Goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way.
+# Technical information on the computer on which the analysis is run
+We will be using the R language using the ggplot2 library.
+```{r}
+library(ggplot2)
+sessionInfo()
+```
+Here are the available libraries
+```{r}
+devtools::session_info()
+```
+# Loading and inspecting data
+Let's start by reading data:
+```{r}
+data_url <- "https://app-learninglab.inria.fr/moocrr/gitlab/moocrr-session3/moocrr-reproducibility-study/raw/master/data/shuttle.csv?inline=false"
+data = read.csv(data_url, h=T)
+head(data)
+```
+We know from our previous experience on this data set that filtering data is a really bad idea. We will therefore process it as such.
+Let's visually inspect how temperature affects malfunction:
+```{r}
+plot(data=data, Malfunction/Count ~ Temperature, ylim=c(0,1))
+```
+# Logistic regression
+Let's assume O-rings independently fail with the same probability which solely depends on temperature. A logistic regression should allow us to estimate the influence of temperature.
+```{r}
+logistic_reg = glm(data=data, Malfunction/Count ~ Temperature, weights=Count,
+                   family=binomial(link='logit'))
+summary(logistic_reg)
+```
+The maximum likelyhood estimator of the intercept and of Temperature are thus $\hat{\alpha}=5.0849$ and $\hat{\beta}=-0.1156$ and their standard errors are $s_{\hat{\alpha}} = 3.052$ and $s_{\hat{\beta}} = 0.04702$. The Residual deviance corresponds to the Goodness of fit $G^2=18.086$ with 21 degrees of freedom. **I have therefore managed to replicate the results of the Dalal *et al.* article**.
+# Predicting failure probability
+The temperature when launching the shuttle was 31°F. Let's try to
+estimate the failure probability for such temperature using our model.:
+```{r}
+# shuttle=shuttle[shuttle$r!=0,]
+tempv = seq(from=30, to=90, by = .5)
+rmv <- predict(logistic_reg,list(Temperature=tempv),type="response")
+plot(tempv,rmv,type="l",ylim=c(0,1))
+points(data=data, Malfunction/Count ~ Temperature)
+```
+This figure is very similar to the Figure 4 of Dalal et al. **I have managed to replicate the Figure 4 of the Dalal *et al.* article.**
+# Confidence on the prediction
+Let's try to plot confidence intervals with ggplot2.
+```{r, fig.height=3.3}
+ggplot(data, aes(y=Malfunction/Count, x=Temperature)) + geom_point(alpha=.2, size = 2, color="blue") +
+  geom_smooth(method = "glm", method.args = list(family = "binomial"), fullrange=T) +
+  xlim(30,90) + ylim(0,1) + theme_bw()
+```
+Mmmh, I have a warning from ggplot2 indicating *"non-integer #successes in a binomial glm!"*. This seems fishy. Furthermore, this confidence region seems huge... It seems strange to me that the uncertainty grows so large for higher temperatures. And compared to my previous call to glm, I haven't indicated the weight which accounts for the fact that each ratio Malfunction/Count corresponds to Count observations (if someone knows how to do this...). There must be something wrong.
+So let's provide the "raw" data to ggplot2.
+```{r}
+data_flat=data.frame()
+for(i in 1:nrow(data)) {
+  temperature = data[i,"Temperature"];
+  malfunction = data[i,"Malfunction"];
+  d = data.frame(Temperature=temperature,Malfunction=rep(0,times = data[i,"Count"]))
+  if(malfunction>0) {
+      d[1:malfunction, "Malfunction"]=1;
+  }
+  data_flat=rbind(data_flat,d)
+}
+dim(data_flat)
+str(data_flat)
+```
+Let's check whether I obtain the same regression or not:
+```{r}
+logistic_reg_flat = glm(data=data_flat, Malfunction ~ Temperature, family=binomial(link='logit'))
+summary(logistic_reg)
+```
+Perfect. The estimates and the standard errors are the same although the Residual deviance is difference since the distance is now measured with respect to each 0/1 measurement and not to ratios. Let's use plot the regression for *data_flat* along with the ratios (*data*).
+```{r, fig.height=3.3}
+ggplot(data=data_flat, aes(y=Malfunction, x=Temperature)) +
+  geom_smooth(method = "glm", method.args = list(family = "binomial"), fullrange=T) +
+  geom_point(data=data, aes(y=Malfunction/Count, x=Temperature),alpha=.2, size = 2, color="blue") +
+  geom_point(alpha=.5, size = .5) +
+  xlim(30,90) + ylim(0,1) + theme_bw()
+```
+This confidence interval seems much more reasonable (in accordance with the data) than the previous one. Let's check whether it corresponds to the prediction obtained when calling directly predict. Obtaining the prediction can be done directly or through the link function.
+Here is the "direct" (response) version I used in my very first plot:
+```{r}
+pred = predict(logistic_reg_flat,list(Temperature=30),type="response",se.fit = T)
+pred
+```
+The estimated Failure probability for 30° is thus $0.834$. However, the $se.fit$ value seems pretty hard to use as I can obviously not simply add $\pm 2 se.fit$ to $fit$ to compute a confidence interval.
+Here is the "link" version:
+```{r}
+pred_link = predict(logistic_reg_flat,list(Temperature=30),type="link",se.fit = T)
+pred_link
+logistic_reg$family$linkinv(pred_link$fit)
+```
+I recover $0.834$ for the estimated Failure probability at 30°. But now, going through the *linkinv* function, we can use $se.fit$:
+```{r}
+critval = 1.96
+logistic_reg$family$linkinv(c(pred_link$fit-critval*pred_link$se.fit,
+                              pred_link$fit+critval*pred_link$se.fit))
+```
+The 95% confidence interval for our estimation is thus [0.163,0.992]. This is what ggplot2 just plotted me. This seems coherent.
+**I am now rather confident that I have managed to correctly compute and plot the uncertainty of my prediction.** Let's be honnest, it took me a while. My first attempts were plainly wrong (I didn't know how to do this so I trusted ggplot2, which I was misusing) and did not use the correct statistical method. I also feel confident now because this has been somehow validated by other colleagues but it will be interesting that you collect other kind of plots values that you obtained, that differ and that you would probably have kept if you didn't have a reference to compare to. Please provide us with as many versions as you can.
--- a/module4/ChallengerRepro.html
+++ b/module4/ChallengerRepro.html
--- a/module4/ChallengerRepro.pdf
+++ b/module4/ChallengerRepro.pdf
--- a/readme.md
+++ b/readme.md
 # MOOC Recherche Reproductible / Reproducibleesearch Mooc
 FR
@@ -6,4 +6,6 @@ Ce dépôt contient les fichiers utiles au MOOC Recherche Reproductible.
 EN
 This repository contains useful files for the Reproducible Research MOOC.
\ No newline at end of file
+ceci est un test.
\ No newline at end of file