Commit c2717bc8 authored by Pierre LACOSTE's avatar Pierre LACOSTE

Exo module 4

parent c3d66573
This source diff could not be displayed because it is too large. You can view the blob instead.
---
title: "Chalenger"
author: "Pierre Lacoste"
date: "14/12/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Information technique pour lesquelles l'analyse est executé:
Librairie `ggplot2`
```{r}
library(ggplot2)
sessionInfo()
```
## Chargement et inspection des données
```{r}
data = read.csv("https://app-learninglab.inria.fr/moocrr/gitlab/moocrr-session3/moocrr-reproducibility-study/raw/master/data/shuttle.csv?inline=false")
data
```
Trion les données même si nous savons que ce n'est pas une bonne idée.
On commence par inspecter les données visuellement.
```{r}
plot(data=data, Malfunction/Count ~ Temperature, ylim=c(0,1))
```
## Regression logistique
On assume que le seul parmètre qui influence le risque de défaillance est la température.
On peut donc faire une regression logistique pour estimer l'effet de la température sur le risque de défaillance.
```{r}
logistic_reg = glm(data=data, Malfunction/Count ~ Temperature, weights=Count, family=binomial(link='logit'))
summary(logistic_reg)
```
The maximum likelyhood estimator of the intercept and of Temperature are thus $\alpha = 5.0849$ and $\beta = −0.1156$ and their standard errors are $s_\alpha = 3.052$ and $s_\beta = 0.04702$. The Residual deviance corresponds to the
Goodness of fit G2 = 18.086 with 21 degrees of freedom. I have therefore managed to replicate the
results of the Dalal et al. article.
## Prediction de la probabilité de defaillance:
```{r}
# shuttle=shuttle[shuttle$r!=0,]
tempv = seq(from=30, to=90, by = .5)
rmv <- predict(logistic_reg,list(Temperature=tempv),type="response")
plot(tempv,rmv,type="l",ylim=c(0,1))
points(data=data, Malfunction/Count ~ Temperature)
```
Figure similaire à la figure 4 de l'article.
## Intervale de confiance de la prédiction
```{r}
ggplot(data, aes(y=Malfunction/Count, x=Temperature)) + geom_point(alpha=.2, size = 2, color="blue") +
geom_smooth(method = "glm", method.args = list(family = "binomial"), fullrange=T) +
xlim(30,90) + ylim(0,1) + theme_bw()
```
Message d'erreur de ggplot.
On donne les donnée brute à ggplot
```{r}
data_flat=data.frame()
for(i in 1:nrow(data)) {
temperature = data[i,"Temperature"];
malfunction = data[i,"Malfunction"];
d = data.frame(Temperature=temperature,Malfunction=rep(0,times = data[i,"Count"]))
if(malfunction>0) {
d[1:malfunction, "Malfunction"]=1;
}
data_flat=rbind(data_flat,d)
}
dim(data_flat)
```
```{r}
str(data_flat)
```
```{r}
logistic_reg_flat = glm(data=data_flat, Malfunction ~ Temperature, family=binomial(link='logit'))
summary(logistic_reg)
```
```{r}
ggplot(data=data_flat, aes(y=Malfunction, x=Temperature)) +
geom_smooth(method = "glm", method.args = list(family = "binomial"), fullrange=T) +
geom_point(data=data, aes(y=Malfunction/Count, x=Temperature),alpha=.2, size = 2, color="blue") +
geom_point(alpha=.5, size = .5) +
xlim(30,90) + ylim(0,1) + theme_bw()
```
```{r}
pred = predict(logistic_reg_flat,list(Temperature=30),type="response",se.fit = T)
pred
```
```{r}
pred_link = predict(logistic_reg_flat,list(Temperature=30),type="link",se.fit = T)
pred_link
```
```{r}
logistic_reg$family$linkinv(pred_link$fit)
```
```{r}
critval = 1.96
logistic_reg$family$linkinv(c(pred_link$fit-critval*pred_link$se.fit,
pred_link$fit+critval*pred_link$se.fit))
```
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment