diff --git a/module4/challenger.Rmd b/module4/challenger.Rmd new file mode 100644 index 0000000000000000000000000000000000000000..315d86c80b2d77d8b0f923b20b21f09b76456958 --- /dev/null +++ b/module4/challenger.Rmd @@ -0,0 +1,91 @@ +--- +title: "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure" +author: "Anders Mårell" +date: "2023-06-13" +output: html_document +editor_options: + chunk_output_type: console +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` + +In this document we reperform some of the analysis provided in *Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley* published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069. + +On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The Goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way. + +# Technical information on the computer on which the analysis is run + +Here are some information about the computational environment used: + +```{r} +devtools::session_info() +``` + +# Import and inspect the data +We start by reading the data: +```{r} +# Get the path to the file ----------------------------------------------------- +MyString <- here::here("module2", "exo5", "shuttle.csv") + +# Import the data -------------------------------------------------------------- +MyData <- read.csv(file = MyString, header = TRUE) + +# Inspect the data ------------------------------------------------------------- +head(MyData) +tail(MyData) +str(MyData) +summary(MyData) +``` + +# Visually inspect the data and relationships between variables + +We start by looking at the number of malfunctions in relation to temperature: + +```{r} +with(MyData, plot(Malfunction/Count ~ Temperature)) +``` + +# Logistic regression + +We assume independence among observations. A simple logistic regression should allow us to estimate the influence of temperature using the `glm` function. + +```{r} +fm <- glm(data = MyData, + Malfunction/Count ~ Temperature, + weights = Count, + family = binomial(link = 'logit')) +summary(fm) +``` + +The maximum likelihood estimator of the intercept and of Temperature are thus $\hat{\alpha}=`r round(fm$coefficients['(Intercept)'], 3)`$ and $\hat{\beta}=`r round(fm$coefficients['Temperature'], 4)`$ and their standard errors are $s_{\hat{\alpha}} = `r round(summary(fm)$coefficients["(Intercept)", "Std. Error"], 3)`$ and $s_{\hat{\beta}} = `r round(summary(fm)$coefficients["Temperature", "Std. Error"], 3)`$. The Residual deviance corresponds to the Goodness of fit $G^2=`r round(sum(summary(fm)$deviance.resid^2), 3)`$ with `r summary(fm)$df.residual` degrees of freedom. + +__These are exactly the results presented in the article.__ + +# Predict failure probability + +The temperature when launching the shuttle was 31°F. We can try to estimate the failure probability for that temperature using our model: + +```{r} +# Set up sequence with temperatures -------------------------------------------- +MyTemp <- seq(from = 30, to = 90, by = .5) + +# Make model predictions with MyTemp ------------------------------------------- +MyPred <- predict(fm,list(Temperature = MyTemp), + type = "response", se.fit = TRUE) + +# Calculate prediction intervals ----------------------------------------------- +critval <- 1.96 ## approx 95% CI +MyPred$upr <- MyPred$fit + (critval * MyPred$se.fit) +MyPred$lwr <- MyPred$fit - (critval * MyPred$se.fit) + +# Plot data, model fit and prediction intervals +plot(MyPred$fit ~ MyTemp, type = "l", ylim = c(0, 1)) +points(data = MyData, Malfunction/Count ~ Temperature) +lines(MyPred$upr ~ MyTemp, type = "l", lty = 2) +lines(MyPred$lwr ~ MyTemp, type = "l", lty = 2) + +``` + +The probability of failure using my models at 31°F is `r round(predict(fm,list(Temperature = 31), type = "response"), 3)`. diff --git a/module4/challenger.html b/module4/challenger.html new file mode 100644 index 0000000000000000000000000000000000000000..c92c78d4ebf098cef76aea2324c3b9aa2243664f --- /dev/null +++ b/module4/challenger.html @@ -0,0 +1,618 @@ + + + + + + + + + + + + + + + +Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + +

In this document we reperform some of the analysis provided in +Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of +Failure by Siddhartha R. Dalal, Edward B. Fowlkes, Bruce +Hoadley published in Journal of the American Statistical +Association, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and +available at http://www.jstor.org/stable/2290069.

+

On the fourth page of this article, they indicate that the maximum +likelihood estimates of the logistic regression using only temperature +are: \(\hat{\alpha}=5.085\) and \(\hat{\beta}=-0.1156\) and their asymptotic +standard errors are \(s_{\hat{\alpha}}=3.052\) and \(s_{\hat{\beta}}=0.047\). The Goodness of +fit indicated for this model was \(G^2=18.086\) with 21 degrees of freedom. +Our goal is to reproduce the computation behind these values and the +Figure 4 of this article, possibly in a nicer looking way.

+
+

Technical information on the computer on which the analysis is +run

+

Here are some information about the computational environment +used:

+
devtools::session_info()
+
## ─ Session info ───────────────────────────────────────────────────────────────
+##  setting  value
+##  version  R version 4.3.0 (2023-04-21 ucrt)
+##  os       Windows 10 x64 (build 19045)
+##  system   x86_64, mingw32
+##  ui       RTerm
+##  language (EN)
+##  collate  French_France.utf8
+##  ctype    French_France.utf8
+##  tz       Europe/Paris
+##  date     2023-06-13
+##  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
+## 
+## ─ Packages ───────────────────────────────────────────────────────────────────
+##  package     * version date (UTC) lib source
+##  bslib         0.4.2   2022-12-16 [1] CRAN (R 4.3.0)
+##  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
+##  callr         3.7.3   2022-11-02 [1] CRAN (R 4.3.0)
+##  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
+##  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
+##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.0)
+##  digest        0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
+##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
+##  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
+##  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
+##  fs            1.6.2   2023-04-25 [1] CRAN (R 4.3.0)
+##  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
+##  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
+##  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
+##  httpuv        1.6.9   2023-02-14 [1] CRAN (R 4.3.0)
+##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.3.0)
+##  jsonlite      1.8.4   2022-12-06 [1] CRAN (R 4.3.0)
+##  knitr         1.42    2023-01-25 [1] CRAN (R 4.3.0)
+##  later         1.3.1   2023-05-02 [1] CRAN (R 4.3.0)
+##  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
+##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
+##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
+##  mime          0.12    2021-09-28 [1] CRAN (R 4.3.0)
+##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
+##  pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.3.0)
+##  pkgload       1.3.2   2022-11-16 [1] CRAN (R 4.3.0)
+##  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.3.0)
+##  processx      3.8.1   2023-04-18 [1] CRAN (R 4.3.0)
+##  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.0)
+##  promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)
+##  ps            1.7.5   2023-04-18 [1] CRAN (R 4.3.0)
+##  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
+##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
+##  Rcpp          1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
+##  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.3.0)
+##  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
+##  rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.3.0)
+##  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.3.0)
+##  sass          0.4.6   2023-05-03 [1] CRAN (R 4.3.0)
+##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
+##  shiny         1.7.4   2022-12-15 [1] CRAN (R 4.3.0)
+##  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
+##  stringr       1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
+##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.0)
+##  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.3.0)
+##  vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.3.0)
+##  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.0)
+##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.0)
+##  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
+## 
+##  [1] C:/Users/anders.marell/AppData/Local/R/win-library/4.3
+##  [2] C:/Program Files/R/R-4.3.0/library
+## 
+## ──────────────────────────────────────────────────────────────────────────────
+
+
+

Import and inspect the data

+

We start by reading the data:

+
# Get the path to the file -----------------------------------------------------
+MyString <- here::here("module2", "exo5", "shuttle.csv")
+
+# Import the data --------------------------------------------------------------
+MyData <- read.csv(file = MyString, header = TRUE)
+
+# Inspect the data -------------------------------------------------------------
+head(MyData)
+
##       Date Count Temperature Pressure Malfunction
+## 1  4/12/81     6          66       50           0
+## 2 11/12/81     6          70       50           1
+## 3  3/22/82     6          69       50           0
+## 4 11/11/82     6          68       50           0
+## 5  4/04/83     6          67       50           0
+## 6  6/18/82     6          72       50           0
+
tail(MyData)
+
##        Date Count Temperature Pressure Malfunction
+## 18  7/29/85     6          81      200           0
+## 19  8/27/85     6          76      200           0
+## 20 10/03/85     6          79      200           0
+## 21 10/30/85     6          75      200           2
+## 22 11/26/85     6          76      200           0
+## 23  1/12/86     6          58      200           1
+
str(MyData)
+
## 'data.frame':    23 obs. of  5 variables:
+##  $ Date       : chr  "4/12/81" "11/12/81" "3/22/82" "11/11/82" ...
+##  $ Count      : int  6 6 6 6 6 6 6 6 6 6 ...
+##  $ Temperature: int  66 70 69 68 67 72 73 70 57 63 ...
+##  $ Pressure   : int  50 50 50 50 50 50 100 100 200 200 ...
+##  $ Malfunction: int  0 1 0 0 0 0 0 0 1 1 ...
+
summary(MyData)
+
##      Date               Count    Temperature       Pressure    
+##  Length:23          Min.   :6   Min.   :53.00   Min.   : 50.0  
+##  Class :character   1st Qu.:6   1st Qu.:67.00   1st Qu.: 75.0  
+##  Mode  :character   Median :6   Median :70.00   Median :200.0  
+##                     Mean   :6   Mean   :69.57   Mean   :152.2  
+##                     3rd Qu.:6   3rd Qu.:75.00   3rd Qu.:200.0  
+##                     Max.   :6   Max.   :81.00   Max.   :200.0  
+##   Malfunction    
+##  Min.   :0.0000  
+##  1st Qu.:0.0000  
+##  Median :0.0000  
+##  Mean   :0.3913  
+##  3rd Qu.:1.0000  
+##  Max.   :2.0000
+
+
+

Visually inspect the data and relationships between variables

+

We start by looking at the number of malfunctions in relation to +temperature:

+
with(MyData, plot(Malfunction/Count ~ Temperature))
+

+
+
+

Logistic regression

+

We assume independence among observations. A simple logistic +regression should allow us to estimate the influence of temperature +using the glm function.

+
fm <- glm(data = MyData, 
+          Malfunction/Count ~ Temperature, 
+          weights = Count,
+          family = binomial(link = 'logit'))
+summary(fm)
+
## 
+## Call:
+## glm(formula = Malfunction/Count ~ Temperature, family = binomial(link = "logit"), 
+##     data = MyData, weights = Count)
+## 
+## Coefficients:
+##             Estimate Std. Error z value Pr(>|z|)  
+## (Intercept)  5.08498    3.05247   1.666   0.0957 .
+## Temperature -0.11560    0.04702  -2.458   0.0140 *
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## (Dispersion parameter for binomial family taken to be 1)
+## 
+##     Null deviance: 24.230  on 22  degrees of freedom
+## Residual deviance: 18.086  on 21  degrees of freedom
+## AIC: 35.647
+## 
+## Number of Fisher Scoring iterations: 5
+

The maximum likelihood estimator of the intercept and of Temperature +are thus \(\hat{\alpha}=5.085\) and +\(\hat{\beta}=-0.1156\) and their +standard errors are \(s_{\hat{\alpha}} = +3.052\) and \(s_{\hat{\beta}} = +0.047\). The Residual deviance corresponds to the Goodness of fit +\(G^2=18.086\) with 21 degrees of +freedom.

+

These are exactly the results presented in the +article.

+
+
+

Predict failure probability

+

The temperature when launching the shuttle was 31°F. We can try to +estimate the failure probability for that temperature using our +model:

+
# Set up sequence with temperatures --------------------------------------------
+MyTemp <- seq(from = 30, to = 90, by = .5)
+
+# Make model predictions with MyTemp -------------------------------------------
+MyPred <- predict(fm,list(Temperature = MyTemp), 
+                  type = "response", se.fit = TRUE)
+
+# Calculate prediction intervals -----------------------------------------------
+critval <- 1.96 ## approx 95% CI
+MyPred$upr <- MyPred$fit + (critval * MyPred$se.fit)
+MyPred$lwr <- MyPred$fit - (critval * MyPred$se.fit)
+
+# Plot data, model fit and prediction intervals
+plot(MyPred$fit ~ MyTemp, type = "l", ylim = c(0, 1))
+points(data = MyData, Malfunction/Count ~ Temperature)
+lines(MyPred$upr ~ MyTemp, type = "l", lty = 2)
+lines(MyPred$lwr ~ MyTemp, type = "l", lty = 2)
+

+

The probability of failure using my models at 31°F is 0.818.

+
+ + + + +
+ + + + + + + + + + + + + + +