---
title : "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure"
options:
  css: skeleton_css.css
  template: julia_html.tpl
---

In this document we redo part of the analysis presented in *Risk Analysis of
the Space Shuttle: Pre-Challenger Prediction of Failure* by Siddhartha R. Dalal,
Edward B. Fowlkes, and Bruce Hoadley, published in the Journal of the American
Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 945-957, and
available at [http://www.jstor.org/stable/2290069](http://www.jstor.org/stable/2290069).

On the fourth page of this article, the authors indicate that the maximum
likelihood estimates of the logistic regression using only temperature are
$\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$, and that their asymptotic
standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The
goodness of fit indicated for this model is $G^2=18.086$ with 21 degrees of
freedom. Our goal is to reproduce the computation behind these values and
Figure 4 of the article, possibly in a nicer-looking way.


# Technical information on the computer on which the analysis is run

We will be using the [Julia](http://julialang.org) language:
```julia
using InteractiveUtils
versioninfo()
```

The computations rely on a number of packages in the Julia ecosystem. The direct
dependencies are summarized hereafter; the complete environment is described in
the [`Manifest.toml`](Manifest.toml) file.

```julia
# Setup environment
using Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()

# Load dependencies
using HTTP, CSV
using Plots; plotly()
using GLM
using DataFrames
using Printf
include("utils.jl")

# Summary
Pkg.status()
```


# Loading and inspecting data

Let's start by reading the data.

```julia
res = HTTP.request(:GET, "https://app-learninglab.inria.fr/moocrr/gitlab/moocrr-session3/moocrr-reproducibility-study/raw/master/data/shuttle.csv?inline=false")
data = CSV.read(res.body)
```

We know from our previous experience with this data set that filtering the data
is a really bad idea. We will therefore process it as is, without removing any
observation.

```julia; results="raw"
data.Frequency = data.Malfunction ./ data.Count

plot(xlabel="Temperature [F]", ylabel="Frequency")
plot!(data.Temperature, data.Frequency, seriestype=:scatter, label=nothing)
disp()
```


# Logistic regression

Let's assume that O-rings fail independently, with the same probability, and
that this probability depends solely on temperature. A logistic regression
should allow us to estimate the influence of temperature.
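
For reference, the model being fitted can be written out explicitly. This is the standard logistic regression formulation, using the same symbols $\alpha$ and $\beta$ as in the introduction; the notation $p(t)$ for the per-O-ring failure probability at temperature $t$ and $Y_i$ for the number of malfunctions at launch $i$ is introduced here only for illustration:

$$
Y_i \sim \mathrm{Binomial}\bigl(6,\, p(t_i)\bigr), \qquad
\log\frac{p(t)}{1-p(t)} = \alpha + \beta\, t
\quad\Longleftrightarrow\quad
p(t) = \frac{1}{1 + e^{-(\alpha + \beta t)}}
$$

A negative $\beta$ therefore means that the failure probability increases as the temperature decreases.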

```julia; wrap=false; hold=true
model = glm(@formula(Frequency ~ Temperature), data,
            Binomial(), LogitLink())

α, β   = coef(model)
σα, σβ = stderror(model)

G²     = deviance(model)
nDOF   = Int(dof_residual(model))

model
```

The maximum likelihood estimates of the intercept and of the Temperature
coefficient are thus
$\hat{\alpha}=`j @printf "%.3f" α`$ and
$\hat{\beta}=`j @printf "%.3f" β`$.

These correspond to the values from the article of Dalal *et al.* The standard
errors are
$s_{\hat{\alpha}} = `j @printf "%.3f" σα`$ and
$s_{\hat{\beta}} = `j @printf "%.3f" σβ`$,
which are different from the $3.052$ and $0.047$ reported by Dalal *et al.*

The deviance is $G^2 = `j @printf "%.3f" G²`$ with `j nDOF` degrees of freedom.
I cannot find any value similar to the goodness of fit reported by Dalal *et al.*
($G^2=18.086$). The number of degrees of freedom is also different, although it
is at least close to theirs (21).

There seems to be something wrong. Oh, I know: I haven't indicated that each of
my observations is actually the result of 6 observations for each rocket
launch. Let's indicate these weights (since the weights are the same for all
launches, this does not change the estimates of the fit, but it does influence
the variance estimate).
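
The effect of constant weights can be seen from the binomial log-likelihood. The derivation below is a sketch under the model assumed in this section; the symbols $n$ (number of O-rings per launch), $f_i$ (observed frequency at launch $i$), $p_i$ (modelled probability at launch $i$) and $\ell_1$ (log-likelihood of the unweighted fit) are notation introduced here for illustration:

$$
\ell_n(\alpha, \beta)
= \sum_i n \,\bigl[ f_i \log p_i + (1 - f_i) \log (1 - p_i) \bigr] + \mathrm{const}
= n \,\ell_1(\alpha, \beta) + \mathrm{const}
$$

Multiplying the log-likelihood by a constant $n$ leaves its maximizer unchanged, so $\hat{\alpha}$ and $\hat{\beta}$ are the same with or without weights. The curvature at the maximum (the observed information) is multiplied by $n$, so the asymptotic standard errors are divided by $\sqrt{n}$, and the deviance is multiplied by $n$:

$$
s^{(n)}_{\hat{\alpha}} = \frac{s^{(1)}_{\hat{\alpha}}}{\sqrt{n}}, \qquad
s^{(n)}_{\hat{\beta}} = \frac{s^{(1)}_{\hat{\beta}}}{\sqrt{n}}, \qquad
G^2_{(n)} = n \, G^2_{(1)}, \qquad n = 6
$$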

```julia; wrap=false; hold=true
model = glm(@formula(Frequency ~ Temperature), data,
            Binomial(), LogitLink();
            wts=data.Count)

α, β   = coef(model)
σα, σβ = stderror(model)

G²     = deviance(model)
nDOF   = Int(dof_residual(model))

model
```

Good, now I have recovered the asymptotic standard errors
$s_{\hat{\alpha}} = `j @printf "%.3f" σα`$ and
$s_{\hat{\beta}} = `j @printf "%.3f" σβ`$.

The goodness of fit (deviance) indicated for this model is
$G^2 = `j @printf "%.3f" G²`$ with `j nDOF` degrees of freedom. Now $G^2$ is in
good agreement with the results of the Dalal *et al.* article, but the number of
degrees of freedom is approximately 6 times larger than that of Dalal *et
al*. Note that, even after removing this factor (which is probably due to the
way the number of residual degrees of freedom is defined in both libraries in
the presence of weights), the values are similar but still differ by
`j @printf "%2.0f" 100 * (nDOF/6/21 - 1)`%.
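
One possible reading of this remaining gap is sketched below. It is only an illustration of two counting conventions, not a statement about what GLM.jl or the software used by Dalal *et al.* actually computes. The helper `residual_dof` is hypothetical, and the number of launches (23) is an assumption chosen to be consistent with the 21 degrees of freedom of the article for a 2-parameter model.

```julia
using Printf

# Hypothetical helper: residual degrees of freedom = observations - fitted parameters
residual_dof(n_obs, n_params) = n_obs - n_params

n_launches = 23   # assumption: 21 + 2, consistent with the article
n_orings   = 6    # O-rings observed per launch (the weights used above)
n_params   = 2    # intercept and temperature coefficient

# Convention 1: one observation per launch
dof_launches = residual_dof(n_launches, n_params)

# Convention 2: one observation per O-ring (weights summed before subtracting)
dof_orings = residual_dof(n_launches * n_orings, n_params)

@printf("per launch: %d, per O-ring: %d\n", dof_launches, dof_orings)
@printf("relative gap after dividing by %d: %.1f%%\n",
        n_orings, 100 * (dof_orings / n_orings / dof_launches - 1))
```

Because the parameters are subtracted after the observations have been multiplied by 6, dividing the second count by 6 cannot give back exactly 21. Under these assumptions the gap is roughly 8%; a library that counts parameters slightly differently would give a slightly different figure.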

# Predicting failure probability

The temperature at the time of the Challenger launch was 31°F. Let's try to
estimate the failure probability at that temperature using our model:

```julia; results="raw"
prediction = DataFrame(Temperature=30:0.25:90)
prediction.Frequency = predict(model, prediction)

plot(xlabel="Temperature [F]", ylabel="Frequency")
plot!(data.Temperature, data.Frequency, seriestype=:scatter, label="data")
plot!(prediction.Temperature, prediction.Frequency, label="prediction")
disp()
```

This figure is very similar to Figure 4 of Dalal *et al*.
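
As a rough numerical cross-check of the curve, the sketch below plugs the estimates published by Dalal *et al.* (quoted in the introduction) into the logistic model and evaluates it at 31°F. It deliberately does not use the fitted `model` object, and the helpers `inv_logit` and `failure_probability` are hypothetical names introduced for this illustration.

```julia
using Printf

# Estimates published by Dalal et al. (quoted at the top of this document)
α_paper, β_paper = 5.085, -0.1156

# Inverse of the logit link: maps a linear predictor to a probability
inv_logit(x) = 1 / (1 + exp(-x))

# Hypothetical helper: per-O-ring failure probability at temperature t (in °F)
failure_probability(t; α=α_paper, β=β_paper) = inv_logit(α + β * t)

p31 = failure_probability(31)
@printf("Failure probability of one O-ring at 31°F: %.3f\n", p31)
@printf("Expected number of failing O-rings out of 6: %.1f\n", 6 * p31)
```

With these published estimates the per-O-ring probability at 31°F comes out at about 0.82. This is an extrapolation of the fitted curve down to the launch temperature, so it should be read as an order of magnitude rather than a precise figure.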

<!-- Local Variables: -->
<!-- mode: markdown -->
<!-- ispell-local-dictionary: "french" -->
<!-- End: -->