--- title: "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure" author: "Anders Mårell" date: "2023-06-13" output: html_document editor_options: chunk_output_type: console --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` In this document we reperform some of the analysis provided in *Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley* published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069. On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The Goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way. # Technical information on the computer on which the analysis is run Here are some information about the computational environment used: ```{r} devtools::session_info() ``` # Import and inspect the data We start by reading the data: ```{r} # Get the path to the file ----------------------------------------------------- MyString <- here::here("module2", "exo5", "shuttle.csv") # Import the data -------------------------------------------------------------- MyData <- read.csv(file = MyString, header = TRUE) # Inspect the data ------------------------------------------------------------- head(MyData) tail(MyData) str(MyData) summary(MyData) ``` # Visually inspect the data and relationships between variables We start by looking at the number of malfunctions in relation to temperature: ```{r} with(MyData, plot(Malfunction/Count ~ Temperature)) ``` # Logistic regression We assume independence among observations. A simple logistic regression should allow us to estimate the influence of temperature using the `glm` function. ```{r} fm <- glm(data = MyData, Malfunction/Count ~ Temperature, weights = Count, family = binomial(link = 'logit')) summary(fm) ``` The maximum likelihood estimator of the intercept and of Temperature are thus $\hat{\alpha}=`r round(fm$coefficients['(Intercept)'], 3)`$ and $\hat{\beta}=`r round(fm$coefficients['Temperature'], 4)`$ and their standard errors are $s_{\hat{\alpha}} = `r round(summary(fm)$coefficients["(Intercept)", "Std. Error"], 3)`$ and $s_{\hat{\beta}} = `r round(summary(fm)$coefficients["Temperature", "Std. Error"], 3)`$. The Residual deviance corresponds to the Goodness of fit $G^2=`r round(sum(summary(fm)$deviance.resid^2), 3)`$ with `r summary(fm)$df.residual` degrees of freedom. __These are exactly the results presented in the article.__ # Predict failure probability The temperature when launching the shuttle was 31°F. We can try to estimate the failure probability for that temperature using our model: ```{r} # Set up sequence with temperatures -------------------------------------------- MyTemp <- seq(from = 30, to = 90, by = .5) # Make model predictions with MyTemp ------------------------------------------- MyPred <- predict(fm,list(Temperature = MyTemp), type = "response", se.fit = TRUE) # Calculate prediction intervals ----------------------------------------------- critval <- 1.96 ## approx 95% CI MyPred$upr <- MyPred$fit + (critval * MyPred$se.fit) MyPred$lwr <- MyPred$fit - (critval * MyPred$se.fit) # Plot data, model fit and prediction intervals plot(MyPred$fit ~ MyTemp, type = "l", ylim = c(0, 1)) points(data = MyData, Malfunction/Count ~ Temperature) lines(MyPred$upr ~ MyTemp, type = "l", lty = 2) lines(MyPred$lwr ~ MyTemp, type = "l", lty = 2) ``` The probability of failure using my models at 31°F is `r round(predict(fm,list(Temperature = 31), type = "response"), 3)`.