title: 'Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure'
author: "Lisa Ftn, based on Arnaud Legrand's work"
date: "12/01/2022"
output:
html_document:
df_print: paged
---
In this document we reperform some of the analysis provided in
*Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley* published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available [here](http://www.jstor.org/stable/2290069).
On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The Goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way.
# Technical information on the computer on which the analysis is run
We will be using the R language using the ggplot2 library.
```{r}
library(ggplot2)
sessionInfo()
```
Here are the available libraries
```{r}
devtools::session_info()
```
# Loading and inspecting data
Let's start by reading data:
```{r}
data = read.csv("https://app-learninglab.inria.fr/moocrr/gitlab/moocrr-session3/moocrr-reproducibility-study/raw/master/data/shuttle.csv?inline=false",header=T)
data
```
Let's visually inspect how temperature affects malfunction:
Let's check whether this confidence interval corresponds to the prediction obtained when calling directly predict. Obtaining the prediction can be done directly or through the link function.
Here is the "direct" (response) version I used in my very first plot:
```{r}
pred = predict(logistic_reg_flat,list(Temperature=30),type="response",se.fit = T)
pred
```
The estimated Failure probability for 30° is thus $0.834$. However, the $se.fit$ value seems pretty hard to use as I can obviously not simply add $\pm 2 se.fit$ to $fit$ to compute a confidence interval.