---
title: "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure"
author: "Arnaud Legrand"
date: "23 September 2018"
output: pdf_document
---

In this document we reperform some of the analysis provided in *Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley*, published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069.

On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$, and that their asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and Figure 4 of this article, possibly in a nicer looking way.

# Technical information on the computer on which the analysis is run

We will be using the R language with the ggplot2 library.

```{r}
library(ggplot2)
sessionInfo()
```

Here are the available libraries:

```{r}
devtools::session_info()
```

# Loading and inspecting data

Let's start by reading the data:

```{r}
data = read.csv("../../data/shuttle.csv", header=T)
data
```

We know from our previous experience with this data set that filtering data is a really bad idea. We will therefore process it as is.

Let's visually inspect how temperature affects malfunction:

```{r}
plot(data=data, Malfunction/Count ~ Temperature, ylim=c(0,1))
```

# Logistic regression

Let's assume O-rings independently fail with the same probability, which depends solely on temperature. A logistic regression should allow us to estimate the influence of temperature.
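
The assumption above can be written out explicitly. This is a sketch in my own notation: $p(t)$, $y_i$, $n_i$ and $t_i$ are symbols I introduce here (standing for the per-O-ring failure probability and for the `Malfunction`, `Count` and `Temperature` columns of launch $i$), while $\alpha$ and $\beta$ are the intercept and temperature coefficient quoted in the introduction.

$$
\log\frac{p(t)}{1-p(t)} = \alpha + \beta t
\quad\Longleftrightarrow\quad
p(t) = \frac{e^{\alpha+\beta t}}{1+e^{\alpha+\beta t}},
\qquad
y_i \sim \mathrm{Binomial}\big(n_i,\, p(t_i)\big).
$$

Under this model each additional degree multiplies the odds of failure by $e^{\beta}$. With the estimate $\hat{\beta}=-0.1156$ reported in the article, that factor is roughly $0.89$.
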

```{r}
logistic_reg = glm(data=data, Malfunction/Count ~ Temperature, weights=Count,
                   family=binomial(link='logit'))
summary(logistic_reg)
```

The maximum likelihood estimates of the intercept and of the Temperature coefficient are thus $\hat{\alpha}=5.0849$ and $\hat{\beta}=-0.1156$ and their standard errors are $s_{\hat{\alpha}} = 3.052$ and $s_{\hat{\beta}} = 0.04702$. The residual deviance corresponds to the goodness of fit $G^2=18.086$ with 21 degrees of freedom. **I have therefore managed to replicate the results of the Dalal et al. article**.

# Predicting failure probability

The temperature when launching the shuttle was 31°F. Let's try to estimate the failure probability for such a temperature using our model:

```{r}
# shuttle=shuttle[shuttle$r!=0,]
tempv = seq(from=30, to=90, by = .5)
rmv <- predict(logistic_reg, list(Temperature=tempv), type="response")
plot(tempv, rmv, type="l", ylim=c(0,1))
points(data=data, Malfunction/Count ~ Temperature)
```

This figure is very similar to Figure 4 of Dalal et al. **I have managed to replicate Figure 4 of the Dalal et al. article.**

Let's try to plot confidence intervals, although I am not sure exactly how they are computed.

```{r}
ggplot(data, aes(y=Malfunction/Count, x=Temperature)) +
    geom_point(alpha=.2, size = 2) +
    geom_smooth(method = "glm", method.args = list(family = "binomial"), fullrange=T) +
    xlim(30,90) + ylim(0,1) + theme_bw()
```

No confidence region was given in the original article. **Let's hope this confidence region estimation is correct.**
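
One way to check this region is to compute a pointwise band directly from a fitted model. The sketch below assumes the usual Wald construction: take the linear predictor and its standard error on the logit scale, add and subtract a normal quantile, and map the result back to probabilities with the inverse link. As far as I understand, this is also what `geom_smooth` does for a `glm`, but I have not verified it here. The name `confidence_band` is a hypothetical helper, not part of the original analysis. Note also that the `geom_smooth` call above refits the model without `weights=Count`, so its standard errors, and hence its band, need not match those of `logistic_reg`.

```{r}
# Hypothetical helper (not from the original analysis): pointwise Wald
# confidence band for a GLM, built on the link scale and then transformed.
confidence_band <- function(model, newdata, level = 0.95) {
  # Linear predictor and its standard error at each new point
  pred <- predict(model, newdata = newdata, type = "link", se.fit = TRUE)
  # Two-sided normal quantile for the requested level (about 1.96 for 95%)
  z <- qnorm(1 - (1 - level) / 2)
  # Inverse link maps the linear predictor back to a probability
  linkinv <- family(model)$linkinv
  data.frame(newdata,
             fit   = linkinv(pred$fit),
             lower = linkinv(pred$fit - z * pred$se.fit),
             upper = linkinv(pred$fit + z * pred$se.fit))
}
```

Calling `confidence_band(logistic_reg, data.frame(Temperature = tempv))` should then give a data frame whose `lower` and `upper` columns can be drawn with `lines()` on the earlier plot, or with `geom_ribbon()`, and compared with the shaded region above. Building the interval on the logit scale keeps both bounds between 0 and 1.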