In this document we reperform some of the analysis provided in Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure by Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley published in Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069.

On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: \(\hat{\alpha}=5.085\) and \(\hat{\beta}=-0.1156\) and their asymptotic standard errors are \(s_{\hat{\alpha}}=3.052\) and \(s_{\hat{\beta}}=0.047\). The Goodness of fit indicated for this model was \(G^2=18.086\) with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way.

Technical information on the computer on which the analysis is run

Here are some information about the computational environment used:

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.0 (2023-04-21 ucrt)
##  os       Windows 10 x64 (build 19045)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  French_France.utf8
##  ctype    French_France.utf8
##  tz       Europe/Paris
##  date     2023-06-13
##  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  bslib         0.4.2   2022-12-16 [1] CRAN (R 4.3.0)
##  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
##  callr         3.7.3   2022-11-02 [1] CRAN (R 4.3.0)
##  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
##  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.0)
##  digest        0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
##  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
##  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
##  fs            1.6.2   2023-04-25 [1] CRAN (R 4.3.0)
##  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
##  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
##  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
##  httpuv        1.6.9   2023-02-14 [1] CRAN (R 4.3.0)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.3.0)
##  jsonlite      1.8.4   2022-12-06 [1] CRAN (R 4.3.0)
##  knitr         1.42    2023-01-25 [1] CRAN (R 4.3.0)
##  later         1.3.1   2023-05-02 [1] CRAN (R 4.3.0)
##  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
##  mime          0.12    2021-09-28 [1] CRAN (R 4.3.0)
##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
##  pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.3.0)
##  pkgload       1.3.2   2022-11-16 [1] CRAN (R 4.3.0)
##  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.3.0)
##  processx      3.8.1   2023-04-18 [1] CRAN (R 4.3.0)
##  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.0)
##  promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)
##  ps            1.7.5   2023-04-18 [1] CRAN (R 4.3.0)
##  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
##  Rcpp          1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
##  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.3.0)
##  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
##  rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.3.0)
##  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.3.0)
##  sass          0.4.6   2023-05-03 [1] CRAN (R 4.3.0)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
##  shiny         1.7.4   2022-12-15 [1] CRAN (R 4.3.0)
##  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
##  stringr       1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.0)
##  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.3.0)
##  vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.3.0)
##  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.0)
##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.0)
##  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
## 
##  [1] C:/Users/anders.marell/AppData/Local/R/win-library/4.3
##  [2] C:/Program Files/R/R-4.3.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────

Import and inspect the data

We start by reading the data:

# Get the path to the file -----------------------------------------------------
MyString <- here::here("module2", "exo5", "shuttle.csv")

# Import the data --------------------------------------------------------------
MyData <- read.csv(file = MyString, header = TRUE)

# Inspect the data -------------------------------------------------------------
head(MyData)
##       Date Count Temperature Pressure Malfunction
## 1  4/12/81     6          66       50           0
## 2 11/12/81     6          70       50           1
## 3  3/22/82     6          69       50           0
## 4 11/11/82     6          68       50           0
## 5  4/04/83     6          67       50           0
## 6  6/18/82     6          72       50           0
tail(MyData)
##        Date Count Temperature Pressure Malfunction
## 18  7/29/85     6          81      200           0
## 19  8/27/85     6          76      200           0
## 20 10/03/85     6          79      200           0
## 21 10/30/85     6          75      200           2
## 22 11/26/85     6          76      200           0
## 23  1/12/86     6          58      200           1
str(MyData)
## 'data.frame':    23 obs. of  5 variables:
##  $ Date       : chr  "4/12/81" "11/12/81" "3/22/82" "11/11/82" ...
##  $ Count      : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ Temperature: int  66 70 69 68 67 72 73 70 57 63 ...
##  $ Pressure   : int  50 50 50 50 50 50 100 100 200 200 ...
##  $ Malfunction: int  0 1 0 0 0 0 0 0 1 1 ...
summary(MyData)
##      Date               Count    Temperature       Pressure    
##  Length:23          Min.   :6   Min.   :53.00   Min.   : 50.0  
##  Class :character   1st Qu.:6   1st Qu.:67.00   1st Qu.: 75.0  
##  Mode  :character   Median :6   Median :70.00   Median :200.0  
##                     Mean   :6   Mean   :69.57   Mean   :152.2  
##                     3rd Qu.:6   3rd Qu.:75.00   3rd Qu.:200.0  
##                     Max.   :6   Max.   :81.00   Max.   :200.0  
##   Malfunction    
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.3913  
##  3rd Qu.:1.0000  
##  Max.   :2.0000

Visually inspect the data and relationships between variables

We start by looking at the number of malfunctions in relation to temperature:

with(MyData, plot(Malfunction/Count ~ Temperature))

Logistic regression

We assume independence among observations. A simple logistic regression should allow us to estimate the influence of temperature using the glm function.

fm <- glm(data = MyData, 
          Malfunction/Count ~ Temperature, 
          weights = Count,
          family = binomial(link = 'logit'))
summary(fm)
## 
## Call:
## glm(formula = Malfunction/Count ~ Temperature, family = binomial(link = "logit"), 
##     data = MyData, weights = Count)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  5.08498    3.05247   1.666   0.0957 .
## Temperature -0.11560    0.04702  -2.458   0.0140 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 24.230  on 22  degrees of freedom
## Residual deviance: 18.086  on 21  degrees of freedom
## AIC: 35.647
## 
## Number of Fisher Scoring iterations: 5

The maximum likelihood estimator of the intercept and of Temperature are thus \(\hat{\alpha}=5.085\) and \(\hat{\beta}=-0.1156\) and their standard errors are \(s_{\hat{\alpha}} = 3.052\) and \(s_{\hat{\beta}} = 0.047\). The Residual deviance corresponds to the Goodness of fit \(G^2=18.086\) with 21 degrees of freedom.

These are exactly the results presented in the article.

Predict failure probability

The temperature when launching the shuttle was 31°F. We can try to estimate the failure probability for that temperature using our model:

# Set up sequence with temperatures --------------------------------------------
MyTemp <- seq(from = 30, to = 90, by = .5)

# Make model predictions with MyTemp -------------------------------------------
MyPred <- predict(fm,list(Temperature = MyTemp), 
                  type = "response", se.fit = TRUE)

# Calculate prediction intervals -----------------------------------------------
critval <- 1.96 ## approx 95% CI
MyPred$upr <- MyPred$fit + (critval * MyPred$se.fit)
MyPred$lwr <- MyPred$fit - (critval * MyPred$se.fit)

# Plot data, model fit and prediction intervals
plot(MyPred$fit ~ MyTemp, type = "l", ylim = c(0, 1))
points(data = MyData, Malfunction/Count ~ Temperature)
lines(MyPred$upr ~ MyTemp, type = "l", lty = 2)
lines(MyPred$lwr ~ MyTemp, type = "l", lty = 2)

The probability of failure using my models at 31°F is 0.818.