---
title: "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure"
author: "Anders Mårell"
date: "2023-06-13"
output: html_document
editor_options: 
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

In this document we reperform some of the analysis provided in *Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley* published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069.

On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The Goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way.

# Technical information on the computer on which the analysis is run

Here are some information about the computational environment used:

```{r}
devtools::session_info()
```

# Import and inspect the data
We start by reading the data:
```{r}
# Get the path to the file -----------------------------------------------------
MyString <- here::here("module2", "exo5", "shuttle.csv")

# Import the data --------------------------------------------------------------
MyData <- read.csv(file = MyString, header = TRUE)

# Inspect the data -------------------------------------------------------------
head(MyData)
tail(MyData)
str(MyData)
summary(MyData)
```

# Visually inspect the data and relationships between variables

We start by looking at the number of malfunctions in relation to temperature:

```{r}
with(MyData, plot(Malfunction/Count ~ Temperature))
```

# Logistic regression

We assume independence among observations. A simple logistic regression should allow us to estimate the influence of temperature using the `glm` function.

```{r}
fm <- glm(data = MyData, 
          Malfunction/Count ~ Temperature, 
          weights = Count,
          family = binomial(link = 'logit'))
summary(fm)
```

The maximum likelihood estimator of the intercept and of Temperature are thus $\hat{\alpha}=`r round(fm$coefficients['(Intercept)'], 3)`$ and $\hat{\beta}=`r round(fm$coefficients['Temperature'], 4)`$ and their standard errors are $s_{\hat{\alpha}} = `r round(summary(fm)$coefficients["(Intercept)", "Std. Error"], 3)`$ and $s_{\hat{\beta}} = `r round(summary(fm)$coefficients["Temperature", "Std. Error"], 3)`$. The Residual deviance corresponds to the Goodness of fit $G^2=`r round(sum(summary(fm)$deviance.resid^2), 3)`$ with `r summary(fm)$df.residual` degrees of freedom.

__These are exactly the results presented in the article.__

# Predict failure probability

The temperature when launching the shuttle was 31°F. We can try to estimate the failure probability for that temperature using our model:

```{r}
# Set up sequence with temperatures --------------------------------------------
MyTemp <- seq(from = 30, to = 90, by = .5)

# Make model predictions with MyTemp -------------------------------------------
MyPred <- predict(fm,list(Temperature = MyTemp), 
                  type = "response", se.fit = TRUE)

# Calculate prediction intervals -----------------------------------------------
critval <- 1.96 ## approx 95% CI
MyPred$upr <- MyPred$fit + (critval * MyPred$se.fit)
MyPred$lwr <- MyPred$fit - (critval * MyPred$se.fit)

# Plot data, model fit and prediction intervals
plot(MyPred$fit ~ MyTemp, type = "l", ylim = c(0, 1))
points(data = MyData, Malfunction/Count ~ Temperature)
lines(MyPred$upr ~ MyTemp, type = "l", lty = 2)
lines(MyPred$lwr ~ MyTemp, type = "l", lty = 2)

```

The probability of failure using my models at 31°F is `r round(predict(fm,list(Temperature = 31), type = "response"), 3)`.