Commit 27db58c7 authored by Arnaud Legrand

Initial import with a tentative replication of Dalal et al.

In this project, we gather attempts to reproduce the Challenger
study. In particular, we try to redo some of the analysis
provided in *Risk Analysis of the Space Shuttle: Pre-Challenger
Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes,
Bruce Hoadley* published in *Journal of the American Statistical
Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available
at
[here](https://studies2.hec.fr/jahia/webdav/site/hec/shared/sites/czellarv/acces_anonyme/OringJASA_1989.pdf)
(here is [the official JASA
webpage](http://www.jstor.org/stable/2290069)).
On the fourth page of this article, they indicate that the maximum
likelihood estimates of the logistic regression using only temperature
are: $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their
asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and
$s_{\hat{\beta}}=0.047$. The Goodness of fit indicated for this model
was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce
the computation behind these values and the Figure 4 of this article,
possibly in a nicer looking way.
[**Here is our successful replication of Dalal et al. results using
R**](file:challenger.pdf).
In case it helps, we provide you with two implementations of this case
study, but we encourage you to **reimplement them by yourself** using both
your favourite language and another language you do not know yet.
- A [Jupyter Python3 notebook](file:src/Python3/challenger.ipynb)
- An [Rmarkdown document](file:src/R/challenger.Rmd)
Then **update the [meta-study result table available
here](file:results.org) with your own results**.
Update the following table with your own results by indicating in each
column:
- Language: R, Python3, Julia, Perl, C, ...
- Language version:
- Main libraries: please indicate the versions of all the loaded libraries
- Operating System: Linux, Mac OS X, Windows, Android, ... along with its version
- $\hat{\alpha}$ and $\hat{\beta}$: Identical, Similar, Different, Non
  functional (expected values are $5.085$ and $-0.1156$)
- $s_{\hat{\alpha}}$ and $s_{\hat{\beta}}$: Identical, Similar, Different, Non
  functional (expected values are $3.052$ and $0.047$)
- $G^2$ and degrees of freedom: Identical, Similar, Different, Non
  functional (expected values are $18.086$ and $21$)
- Figure: Similar, Different, Non functional
- Confidence region: Similar, Different, Non functional
| Language | Language version | Main libraries | Operating System | $\hat{\alpha}$ and $\hat{\beta}$ | $s_{\hat{\alpha}}$ and $s_{\hat{\beta}}$ | $G^{2}$ | Figure | Confidence Region | Link to the document |
|----------+------------------+---------------------------------------------------------------+-----------------------------+--------------------------+-------------------------------+----------------+-----------+-------------------+-------------------------------------|
| R | 3.5.1 | ggplot2 3.0.0 | Debian GNU/Linux buster/sid | Identical | Identical | Identical | Identical | Identical | [file:src/R/challenger.Rmd] |
| Python | 3.6.5rc1 | statsmodels 0.9.0 numpy 1.14.5 pandas 0.22.0 matplotlib 2.1.1 | Linux Debian 4.15.11-1 | Identical | *Different* | *Non Functional* | Identical | *Non Functional* | [file:src/Python3/challenger.ipynb] |
---
title: "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure"
author: "Arnaud Legrand"
date: "23 September 2018"
output: pdf_document
---
In this document we reperform some of the analysis provided in
*Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley* published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069.
On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The Goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way.
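
For reference, the model and the goodness-of-fit statistic behind these numbers can be written out as follows. This is a sketch under the usual binomial logistic-regression assumptions; the symbols $p(t)$, $y_i$, $n_i$, $t_i$ and $N$ are notation introduced here and are not taken from the article.

$$\log\frac{p(t)}{1-p(t)} = \alpha + \beta t
\quad\Longleftrightarrow\quad
p(t) = \frac{e^{\alpha+\beta t}}{1+e^{\alpha+\beta t}}$$

$$G^2 = 2\sum_{i=1}^{N}\left[\, y_i \log\frac{y_i}{n_i\,\hat{p}_i}
 + (n_i-y_i)\log\frac{n_i-y_i}{n_i\,(1-\hat{p}_i)} \right],
\qquad \hat{p}_i = \frac{e^{\hat{\alpha}+\hat{\beta} t_i}}{1+e^{\hat{\alpha}+\hat{\beta} t_i}}$$

Here $y_i$ is the number of malfunctions among the $n_i$ O-rings observed at temperature $t_i$, with the convention $0 \log 0 = 0$. With two estimated parameters the residual degrees of freedom are $N-2$, so the 21 degrees of freedom quoted above would correspond to $N=23$ observations.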
# Technical information on the computer on which the analysis is run
We will use the R language together with the ggplot2 library.
```{r}
library(ggplot2)
sessionInfo()
```
Here are the available libraries:
```{r}
devtools::session_info()
```
# Loading and inspecting data
Let's start by reading data:
```{r}
data = read.csv("../../data/shuttle.csv", header=TRUE)
data
```
We know from our previous experience with this data set that filtering data is a really bad idea. We will therefore process it as is, without removing any rows.
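
To see why filtering is harmful, here is a minimal sketch on a small, made-up data set (the `toy` values and the `fit_slope` helper are hypothetical and are not part of the shuttle data or of the original analysis). It fits the same kind of model twice, once on all rows and once after dropping the rows without any malfunction, so that the two temperature slopes can be compared.

```r
# Hypothetical toy data, NOT the shuttle data set: 6 O-rings per row.
toy <- data.frame(
  Temperature = c(50, 55, 60, 65, 70, 75, 80, 85),
  Count       = rep(6, 8),
  Malfunction = c(3, 2, 1, 1, 0, 1, 0, 0)
)

# Hypothetical helper: temperature slope of the weighted logistic regression.
fit_slope <- function(d) {
  fit <- glm(Malfunction / Count ~ Temperature, data = d,
             weights = Count, family = binomial(link = "logit"))
  unname(coef(fit)["Temperature"])
}

beta_all      <- fit_slope(toy)                         # all rows kept
beta_filtered <- fit_slope(toy[toy$Malfunction > 0, ])  # zero rows dropped

print(c(all = beta_all, filtered = beta_filtered))
```

Dropping the rows with no malfunction discards exactly the observations in which every O-ring survived, so the filtered slope no longer reflects the full effect of temperature.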
Let's visually inspect how temperature affects malfunction:
```{r}
plot(data=data, Malfunction/Count ~ Temperature, ylim=c(0,1))
```
# Logistic regression
Let's assume O-rings fail independently with the same probability, which depends solely on temperature. A logistic regression should allow us to estimate the influence of temperature.
```{r}
logistic_reg = glm(data=data, Malfunction/Count ~ Temperature, weights=Count,
family=binomial(link='logit'))
summary(logistic_reg)
```
The maximum likelihood estimates of the intercept and of the Temperature coefficient are thus $\hat{\alpha}=5.0849$ and $\hat{\beta}=-0.1156$ and their standard errors are $s_{\hat{\alpha}} = 3.052$ and $s_{\hat{\beta}} = 0.04702$. The Residual deviance corresponds to the Goodness of fit $G^2=18.086$ with 21 degrees of freedom. **I have therefore managed to replicate the results of the Dalal et al. article**.
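
Rather than copying numbers from the printed summary, the same quantities can be extracted programmatically. The sketch below uses a made-up `toy` data frame (hypothetical values, not the shuttle data) so that it runs on its own; in the real analysis one would apply the same accessor functions to `logistic_reg`.

```r
# Hypothetical toy data, NOT the shuttle data set: 6 O-rings per row.
toy <- data.frame(
  Temperature = c(50, 55, 60, 65, 70, 75, 80, 85),
  Count       = rep(6, 8),
  Malfunction = c(3, 2, 1, 1, 0, 1, 0, 0)
)
toy_fit <- glm(Malfunction / Count ~ Temperature, data = toy,
               weights = Count, family = binomial(link = "logit"))

# Coefficient table: columns Estimate, Std. Error, z value, Pr(>|z|)
est <- coef(summary(toy_fit))
alpha_hat <- est["(Intercept)", "Estimate"]
beta_hat  <- est["Temperature", "Estimate"]
s_alpha   <- est["(Intercept)", "Std. Error"]
s_beta    <- est["Temperature", "Std. Error"]

g2  <- deviance(toy_fit)     # residual deviance, i.e. G^2
dof <- df.residual(toy_fit)  # number of rows minus number of parameters

print(round(c(alpha = alpha_hat, beta = beta_hat,
              s_alpha = s_alpha, s_beta = s_beta, G2 = g2, df = dof), 4))
```

Extracting the values this way makes it easier to fill in the result table without transcription errors.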
# Predicting failure probability
The temperature when launching the shuttle was 31°F. Let's try to
estimate the failure probability for such a temperature using our model:
```{r}
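# shuttle=shuttle[shuttle$r!=0,]
# Temperature grid extending down to the launch temperature
tempv = seq(from=30, to=90, by = .5)
# Predicted failure probability on the response (probability) scale
rmv <- predict(logistic_reg,list(Temperature=tempv),type="response")
plot(tempv,rmv,type="l",ylim=c(0,1))
# Overlay the observed malfunction proportions
points(data=data, Malfunction/Count ~ Temperature)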
```
This figure is very similar to the Figure 4 of Dalal et al. **I have managed to replicate the Figure 4 of the Dalal et al. article.**
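
As a quick cross-check, the predicted probability at 31°F can also be computed by hand from the estimates reported in the article, by plugging them into the inverse logit. This is only a sketch using the rounded published values; the variable names are chosen here for illustration, and `plogis` is the logistic distribution function from base R.

```r
alpha_hat <- 5.085    # intercept reported by Dalal et al.
beta_hat  <- -0.1156  # temperature coefficient reported by Dalal et al.

eta_31 <- alpha_hat + beta_hat * 31  # linear predictor at 31°F
p_31   <- plogis(eta_31)             # inverse logit: 1 / (1 + exp(-eta))

print(round(p_31, 3))
```

Under the model assumptions, this gives a failure probability of roughly 0.82 per O-ring at 31°F.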
Let's try to plot confidence intervals although I am not sure exactly how they are computed.
```{r}
ggplot(data, aes(y=Malfunction/Count, x=Temperature)) + geom_point(alpha=.2, size = 2) +
geom_smooth(method = "glm", method.args = list(family = "binomial"), fullrange=T) +
xlim(30,90) + ylim(0,1) + theme_bw()
```
No confidence region was given in the original article. **Let's hope this confidence region estimation is correct.**
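
One common way to obtain such a region, sketched here as an assumption rather than as a description of what `geom_smooth` does internally, is a pointwise Wald interval: compute the prediction and its standard error on the logit scale, add and subtract about 1.96 standard errors, then map the bounds back to probabilities. The `toy` data below is made up (it is not the shuttle data) so that the block is self-contained. Note that the `geom_smooth` call above does not receive `weights=Count`, so its band is not guaranteed to match the weighted fit stored in `logistic_reg`.

```r
# Hypothetical toy data, NOT the shuttle data set: 6 O-rings per row.
toy <- data.frame(
  Temperature = c(50, 55, 60, 65, 70, 75, 80, 85),
  Count       = rep(6, 8),
  Malfunction = c(3, 2, 1, 1, 0, 1, 0, 0)
)
toy_fit <- glm(Malfunction / Count ~ Temperature, data = toy,
               weights = Count, family = binomial(link = "logit"))

# Predictions and standard errors on the link (logit) scale
grid <- data.frame(Temperature = seq(30, 90, by = 0.5))
pred <- predict(toy_fit, newdata = grid, type = "link", se.fit = TRUE)

z <- qnorm(0.975)  # about 1.96 for a 95% pointwise interval
band <- data.frame(
  Temperature = grid$Temperature,
  fit   = plogis(pred$fit),
  lower = plogis(pred$fit - z * pred$se.fit),
  upper = plogis(pred$fit + z * pred$se.fit)
)
print(head(band))
```

Building the interval on the logit scale and transforming afterwards keeps the band inside $[0,1]$, which a symmetric interval computed directly on the probability scale would not guarantee.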