diff --git a/module4/challenger.Rmd b/module4/challenger.Rmd new file mode 100644 index 0000000000000000000000000000000000000000..315d86c80b2d77d8b0f923b20b21f09b76456958 --- /dev/null +++ b/module4/challenger.Rmd @@ -0,0 +1,91 @@ +--- +title: "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure" +author: "Anders Mårell" +date: "2023-06-13" +output: html_document +editor_options: + chunk_output_type: console +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` + +In this document we reperform some of the analysis provided in *Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley* published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069. + +On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The Goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way. + +# Technical information on the computer on which the analysis is run + +Here are some information about the computational environment used: + +```{r} +devtools::session_info() +``` + +# Import and inspect the data +We start by reading the data: +```{r} +# Get the path to the file ----------------------------------------------------- +MyString <- here::here("module2", "exo5", "shuttle.csv") + +# Import the data -------------------------------------------------------------- +MyData <- read.csv(file = MyString, header = TRUE) + +# Inspect the data ------------------------------------------------------------- +head(MyData) +tail(MyData) +str(MyData) +summary(MyData) +``` + +# Visually inspect the data and relationships between variables + +We start by looking at the number of malfunctions in relation to temperature: + +```{r} +with(MyData, plot(Malfunction/Count ~ Temperature)) +``` + +# Logistic regression + +We assume independence among observations. A simple logistic regression should allow us to estimate the influence of temperature using the `glm` function. + +```{r} +fm <- glm(data = MyData, + Malfunction/Count ~ Temperature, + weights = Count, + family = binomial(link = 'logit')) +summary(fm) +``` + +The maximum likelihood estimator of the intercept and of Temperature are thus $\hat{\alpha}=`r round(fm$coefficients['(Intercept)'], 3)`$ and $\hat{\beta}=`r round(fm$coefficients['Temperature'], 4)`$ and their standard errors are $s_{\hat{\alpha}} = `r round(summary(fm)$coefficients["(Intercept)", "Std. Error"], 3)`$ and $s_{\hat{\beta}} = `r round(summary(fm)$coefficients["Temperature", "Std. Error"], 3)`$. The Residual deviance corresponds to the Goodness of fit $G^2=`r round(sum(summary(fm)$deviance.resid^2), 3)`$ with `r summary(fm)$df.residual` degrees of freedom. + +__These are exactly the results presented in the article.__ + +# Predict failure probability + +The temperature when launching the shuttle was 31°F. We can try to estimate the failure probability for that temperature using our model: + +```{r} +# Set up sequence with temperatures -------------------------------------------- +MyTemp <- seq(from = 30, to = 90, by = .5) + +# Make model predictions with MyTemp ------------------------------------------- +MyPred <- predict(fm,list(Temperature = MyTemp), + type = "response", se.fit = TRUE) + +# Calculate prediction intervals ----------------------------------------------- +critval <- 1.96 ## approx 95% CI +MyPred$upr <- MyPred$fit + (critval * MyPred$se.fit) +MyPred$lwr <- MyPred$fit - (critval * MyPred$se.fit) + +# Plot data, model fit and prediction intervals +plot(MyPred$fit ~ MyTemp, type = "l", ylim = c(0, 1)) +points(data = MyData, Malfunction/Count ~ Temperature) +lines(MyPred$upr ~ MyTemp, type = "l", lty = 2) +lines(MyPred$lwr ~ MyTemp, type = "l", lty = 2) + +``` + +The probability of failure using my models at 31°F is `r round(predict(fm,list(Temperature = 31), type = "response"), 3)`. diff --git a/module4/challenger.html b/module4/challenger.html new file mode 100644 index 0000000000000000000000000000000000000000..c92c78d4ebf098cef76aea2324c3b9aa2243664f --- /dev/null +++ b/module4/challenger.html @@ -0,0 +1,618 @@ + + + + +
+ + + + + + + + + + +In this document we reperform some of the analysis provided in +Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of +Failure by Siddhartha R. Dalal, Edward B. Fowlkes, Bruce +Hoadley published in Journal of the American Statistical +Association, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and +available at http://www.jstor.org/stable/2290069.
+On the fourth page of this article, they indicate that the maximum +likelihood estimates of the logistic regression using only temperature +are: \(\hat{\alpha}=5.085\) and \(\hat{\beta}=-0.1156\) and their asymptotic +standard errors are \(s_{\hat{\alpha}}=3.052\) and \(s_{\hat{\beta}}=0.047\). The Goodness of +fit indicated for this model was \(G^2=18.086\) with 21 degrees of freedom. +Our goal is to reproduce the computation behind these values and the +Figure 4 of this article, possibly in a nicer looking way.
+Here are some information about the computational environment +used:
+devtools::session_info()
+## ─ Session info ───────────────────────────────────────────────────────────────
+## setting value
+## version R version 4.3.0 (2023-04-21 ucrt)
+## os Windows 10 x64 (build 19045)
+## system x86_64, mingw32
+## ui RTerm
+## language (EN)
+## collate French_France.utf8
+## ctype French_France.utf8
+## tz Europe/Paris
+## date 2023-06-13
+## pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
+##
+## ─ Packages ───────────────────────────────────────────────────────────────────
+## package * version date (UTC) lib source
+## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.3.0)
+## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0)
+## callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.0)
+## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
+## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
+## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.0)
+## digest 0.6.31 2022-12-11 [1] CRAN (R 4.3.0)
+## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)
+## evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
+## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
+## fs 1.6.2 2023-04-25 [1] CRAN (R 4.3.0)
+## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
+## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
+## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)
+## httpuv 1.6.9 2023-02-14 [1] CRAN (R 4.3.0)
+## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.3.0)
+## jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.3.0)
+## knitr 1.42 2023-01-25 [1] CRAN (R 4.3.0)
+## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.0)
+## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
+## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
+## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
+## mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
+## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
+## pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.3.0)
+## pkgload 1.3.2 2022-11-16 [1] CRAN (R 4.3.0)
+## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.3.0)
+## processx 3.8.1 2023-04-18 [1] CRAN (R 4.3.0)
+## profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.0)
+## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)
+## ps 1.7.5 2023-04-18 [1] CRAN (R 4.3.0)
+## purrr 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
+## R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
+## Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.3.0)
+## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.3.0)
+## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
+## rmarkdown 2.21 2023-03-26 [1] CRAN (R 4.3.0)
+## rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)
+## sass 0.4.6 2023-05-03 [1] CRAN (R 4.3.0)
+## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
+## shiny 1.7.4 2022-12-15 [1] CRAN (R 4.3.0)
+## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
+## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
+## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.0)
+## usethis 2.1.6 2022-05-25 [1] CRAN (R 4.3.0)
+## vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.3.0)
+## xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
+## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0)
+## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
+##
+## [1] C:/Users/anders.marell/AppData/Local/R/win-library/4.3
+## [2] C:/Program Files/R/R-4.3.0/library
+##
+## ──────────────────────────────────────────────────────────────────────────────
+We start by reading the data:
+# Get the path to the file -----------------------------------------------------
+MyString <- here::here("module2", "exo5", "shuttle.csv")
+
+# Import the data --------------------------------------------------------------
+MyData <- read.csv(file = MyString, header = TRUE)
+
+# Inspect the data -------------------------------------------------------------
+head(MyData)
+## Date Count Temperature Pressure Malfunction
+## 1 4/12/81 6 66 50 0
+## 2 11/12/81 6 70 50 1
+## 3 3/22/82 6 69 50 0
+## 4 11/11/82 6 68 50 0
+## 5 4/04/83 6 67 50 0
+## 6 6/18/82 6 72 50 0
+tail(MyData)
+## Date Count Temperature Pressure Malfunction
+## 18 7/29/85 6 81 200 0
+## 19 8/27/85 6 76 200 0
+## 20 10/03/85 6 79 200 0
+## 21 10/30/85 6 75 200 2
+## 22 11/26/85 6 76 200 0
+## 23 1/12/86 6 58 200 1
+str(MyData)
+## 'data.frame': 23 obs. of 5 variables:
+## $ Date : chr "4/12/81" "11/12/81" "3/22/82" "11/11/82" ...
+## $ Count : int 6 6 6 6 6 6 6 6 6 6 ...
+## $ Temperature: int 66 70 69 68 67 72 73 70 57 63 ...
+## $ Pressure : int 50 50 50 50 50 50 100 100 200 200 ...
+## $ Malfunction: int 0 1 0 0 0 0 0 0 1 1 ...
+summary(MyData)
+## Date Count Temperature Pressure
+## Length:23 Min. :6 Min. :53.00 Min. : 50.0
+## Class :character 1st Qu.:6 1st Qu.:67.00 1st Qu.: 75.0
+## Mode :character Median :6 Median :70.00 Median :200.0
+## Mean :6 Mean :69.57 Mean :152.2
+## 3rd Qu.:6 3rd Qu.:75.00 3rd Qu.:200.0
+## Max. :6 Max. :81.00 Max. :200.0
+## Malfunction
+## Min. :0.0000
+## 1st Qu.:0.0000
+## Median :0.0000
+## Mean :0.3913
+## 3rd Qu.:1.0000
+## Max. :2.0000
+We start by looking at the number of malfunctions in relation to +temperature:
+with(MyData, plot(Malfunction/Count ~ Temperature))
+We assume independence among observations. A simple logistic
+regression should allow us to estimate the influence of temperature
+using the glm
function.
fm <- glm(data = MyData,
+ Malfunction/Count ~ Temperature,
+ weights = Count,
+ family = binomial(link = 'logit'))
+summary(fm)
+##
+## Call:
+## glm(formula = Malfunction/Count ~ Temperature, family = binomial(link = "logit"),
+## data = MyData, weights = Count)
+##
+## Coefficients:
+## Estimate Std. Error z value Pr(>|z|)
+## (Intercept) 5.08498 3.05247 1.666 0.0957 .
+## Temperature -0.11560 0.04702 -2.458 0.0140 *
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## (Dispersion parameter for binomial family taken to be 1)
+##
+## Null deviance: 24.230 on 22 degrees of freedom
+## Residual deviance: 18.086 on 21 degrees of freedom
+## AIC: 35.647
+##
+## Number of Fisher Scoring iterations: 5
+The maximum likelihood estimator of the intercept and of Temperature +are thus \(\hat{\alpha}=5.085\) and +\(\hat{\beta}=-0.1156\) and their +standard errors are \(s_{\hat{\alpha}} = +3.052\) and \(s_{\hat{\beta}} = +0.047\). The Residual deviance corresponds to the Goodness of fit +\(G^2=18.086\) with 21 degrees of +freedom.
+These are exactly the results presented in the +article.
+The temperature when launching the shuttle was 31°F. We can try to +estimate the failure probability for that temperature using our +model:
+# Set up sequence with temperatures --------------------------------------------
+MyTemp <- seq(from = 30, to = 90, by = .5)
+
+# Make model predictions with MyTemp -------------------------------------------
+MyPred <- predict(fm,list(Temperature = MyTemp),
+ type = "response", se.fit = TRUE)
+
+# Calculate prediction intervals -----------------------------------------------
+critval <- 1.96 ## approx 95% CI
+MyPred$upr <- MyPred$fit + (critval * MyPred$se.fit)
+MyPred$lwr <- MyPred$fit - (critval * MyPred$se.fit)
+
+# Plot data, model fit and prediction intervals
+plot(MyPred$fit ~ MyTemp, type = "l", ylim = c(0, 1))
+points(data = MyData, Malfunction/Count ~ Temperature)
+lines(MyPred$upr ~ MyTemp, type = "l", lty = 2)
+lines(MyPred$lwr ~ MyTemp, type = "l", lty = 2)
+The probability of failure using my models at 31°F is 0.818.
+