In this document we reperform some of the analysis provided in Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure by Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley published in Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069.
On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: \(\hat{\alpha}=5.085\) and \(\hat{\beta}=-0.1156\) and their asymptotic standard errors are \(s_{\hat{\alpha}}=3.052\) and \(s_{\hat{\beta}}=0.047\). The Goodness of fit indicated for this model was \(G^2=18.086\) with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way.
Here are some information about the computational environment used:
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.3.0 (2023-04-21 ucrt)
## os Windows 10 x64 (build 19045)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate French_France.utf8
## ctype French_France.utf8
## tz Europe/Paris
## date 2023-06-13
## pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.3.0)
## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0)
## callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.0)
## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.0)
## digest 0.6.31 2022-12-11 [1] CRAN (R 4.3.0)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)
## evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
## fs 1.6.2 2023-04-25 [1] CRAN (R 4.3.0)
## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)
## httpuv 1.6.9 2023-02-14 [1] CRAN (R 4.3.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.3.0)
## jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.3.0)
## knitr 1.42 2023-01-25 [1] CRAN (R 4.3.0)
## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.0)
## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
## mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
## pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.3.0)
## pkgload 1.3.2 2022-11-16 [1] CRAN (R 4.3.0)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.3.0)
## processx 3.8.1 2023-04-18 [1] CRAN (R 4.3.0)
## profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.0)
## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)
## ps 1.7.5 2023-04-18 [1] CRAN (R 4.3.0)
## purrr 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
## Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.3.0)
## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.3.0)
## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
## rmarkdown 2.21 2023-03-26 [1] CRAN (R 4.3.0)
## rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)
## sass 0.4.6 2023-05-03 [1] CRAN (R 4.3.0)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
## shiny 1.7.4 2022-12-15 [1] CRAN (R 4.3.0)
## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.0)
## usethis 2.1.6 2022-05-25 [1] CRAN (R 4.3.0)
## vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.3.0)
## xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0)
## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
##
## [1] C:/Users/anders.marell/AppData/Local/R/win-library/4.3
## [2] C:/Program Files/R/R-4.3.0/library
##
## ──────────────────────────────────────────────────────────────────────────────
We start by reading the data:
# Get the path to the file -----------------------------------------------------
MyString <- here::here("module2", "exo5", "shuttle.csv")
# Import the data --------------------------------------------------------------
MyData <- read.csv(file = MyString, header = TRUE)
# Inspect the data -------------------------------------------------------------
head(MyData)
## Date Count Temperature Pressure Malfunction
## 1 4/12/81 6 66 50 0
## 2 11/12/81 6 70 50 1
## 3 3/22/82 6 69 50 0
## 4 11/11/82 6 68 50 0
## 5 4/04/83 6 67 50 0
## 6 6/18/82 6 72 50 0
tail(MyData)
## Date Count Temperature Pressure Malfunction
## 18 7/29/85 6 81 200 0
## 19 8/27/85 6 76 200 0
## 20 10/03/85 6 79 200 0
## 21 10/30/85 6 75 200 2
## 22 11/26/85 6 76 200 0
## 23 1/12/86 6 58 200 1
str(MyData)
## 'data.frame': 23 obs. of 5 variables:
## $ Date : chr "4/12/81" "11/12/81" "3/22/82" "11/11/82" ...
## $ Count : int 6 6 6 6 6 6 6 6 6 6 ...
## $ Temperature: int 66 70 69 68 67 72 73 70 57 63 ...
## $ Pressure : int 50 50 50 50 50 50 100 100 200 200 ...
## $ Malfunction: int 0 1 0 0 0 0 0 0 1 1 ...
summary(MyData)
## Date Count Temperature Pressure
## Length:23 Min. :6 Min. :53.00 Min. : 50.0
## Class :character 1st Qu.:6 1st Qu.:67.00 1st Qu.: 75.0
## Mode :character Median :6 Median :70.00 Median :200.0
## Mean :6 Mean :69.57 Mean :152.2
## 3rd Qu.:6 3rd Qu.:75.00 3rd Qu.:200.0
## Max. :6 Max. :81.00 Max. :200.0
## Malfunction
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.3913
## 3rd Qu.:1.0000
## Max. :2.0000
We start by looking at the number of malfunctions in relation to temperature:
with(MyData, plot(Malfunction/Count ~ Temperature))
We assume independence among observations. A simple logistic
regression should allow us to estimate the influence of temperature
using the glm function.
fm <- glm(data = MyData,
Malfunction/Count ~ Temperature,
weights = Count,
family = binomial(link = 'logit'))
summary(fm)
##
## Call:
## glm(formula = Malfunction/Count ~ Temperature, family = binomial(link = "logit"),
## data = MyData, weights = Count)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.08498 3.05247 1.666 0.0957 .
## Temperature -0.11560 0.04702 -2.458 0.0140 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 24.230 on 22 degrees of freedom
## Residual deviance: 18.086 on 21 degrees of freedom
## AIC: 35.647
##
## Number of Fisher Scoring iterations: 5
The maximum likelihood estimator of the intercept and of Temperature are thus \(\hat{\alpha}=5.085\) and \(\hat{\beta}=-0.1156\) and their standard errors are \(s_{\hat{\alpha}} = 3.052\) and \(s_{\hat{\beta}} = 0.047\). The Residual deviance corresponds to the Goodness of fit \(G^2=18.086\) with 21 degrees of freedom.
These are exactly the results presented in the article.
The temperature when launching the shuttle was 31°F. We can try to estimate the failure probability for that temperature using our model:
# Set up sequence with temperatures --------------------------------------------
MyTemp <- seq(from = 30, to = 90, by = .5)
# Make model predictions with MyTemp -------------------------------------------
MyPred <- predict(fm,list(Temperature = MyTemp),
type = "response", se.fit = TRUE)
# Calculate prediction intervals -----------------------------------------------
critval <- 1.96 ## approx 95% CI
MyPred$upr <- MyPred$fit + (critval * MyPred$se.fit)
MyPred$lwr <- MyPred$fit - (critval * MyPred$se.fit)
# Plot data, model fit and prediction intervals
plot(MyPred$fit ~ MyTemp, type = "l", ylim = c(0, 1))
points(data = MyData, Malfunction/Count ~ Temperature)
lines(MyPred$upr ~ MyTemp, type = "l", lty = 2)
lines(MyPred$lwr ~ MyTemp, type = "l", lty = 2)
The probability of failure using my models at 31°F is 0.818.