Commit b1342638 authored by François Févotte's avatar François Févotte

Ex 04-1 : analyse Challenger en Julia

parent 7ad30490
This diff is collapsed.
name = "Challenger"
uuid = "457123a2-77f0-4d56-a035-49d1b3c810fd"
authors = ["François Févotte"]
version = "0.1.0"
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DataInterpolations = "82cc6244-b520-54b8-b5a6-8a565e85f1d0"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
PackageCompiler = "9b87118b-4619-50d2-8e1e-99f35a4d4d9d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Weave = "44d3d7a6-8a23-5bf8-98c5-b353f8df5ec9"
This diff is collapsed.
---
title : "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure"
options:
css: skeleton_css.css
template: julia_html.tpl
---
In this document we reperform some of the analysis provided in *Risk Analysis of
the Space Shuttle: Pre-Challenger Prediction of Failure* by Siddhartha R. Dalal,
Edward B. Fowlkes, Bruce Hoadley published in Journal of the American
Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and
available at [http://www.jstor.org/stable/2290069](http://www.jstor.org/stable/2290069).
On the fourth page of this article, they indicate that the maximum likelihood
estimates of the logistic regression using only temperature are:
$\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their asymptotic standard
errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The Goodness of
fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our
goal is to reproduce the computation behind these values and the Figure 4 of
this article, possibly in a nicer looking way.
# Technical information on the computer on which the analysis is run
We will be using the [Julia](http://julialang.org) language:
```julia
using InteractiveUtils
versioninfo()
```
The computations rely on a number of packages in the Julia ecosystem. The direct
dependencies are summarized hereafter; the complete environment is described in
the [`Manifest.toml`](Manifest.toml) file.
```julia
# Setup environment
using Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()
# Load dependencies
using HTTP, CSV
using Plots; plotly()
using GLM
using DataFrames
using Printf
include("utils.jl")
# Summary
Pkg.status()
```
# Loading and inspecting data
Let's start by reading data.
```julia
res = HTTP.request(:GET, "https://app-learninglab.inria.fr/moocrr/gitlab/moocrr-session3/moocrr-reproducibility-study/raw/master/data/shuttle.csv?inline=false")
data = CSV.read(res.body)
```
We know from our previous experience on this data set that filtering data is a
really bad idea. We will therefore process it as such.
```julia; results="raw"
data.Frequency = data.Malfunction ./ data.Count
plot(xlabel="Temperature [F]", ylabel="Frequency")
plot!(data.Temperature, data.Frequency, seriestype=:scatter, label=nothing)
disp()
```
# Logistic regression
Let's assume O-rings independently fail with the same probability which solely
depends on temperature. A logistic regression should allow us to estimate the
influence of temperature.
```julia; wrap=false; hold=true
model = glm(@formula(Frequency ~ Temperature), data,
Binomial(), LogitLink())
α, β = coef(model)
σα, σβ = stderror(model)
G² = deviance(model)
nDOF = Int(dof_residual(model))
model
```
The maximum likelyhood estimator of the intercept and of Temperature are thus
$\hat{\alpha}=`j @printf "%.3f" α`$ and
$\hat{\beta}=`j @printf "%.3f" β`$.
This corresponds to the values from the article of Dalal et al. The standard
errors are
$s_{\hat{\alpha}} = `j @printf "%.3f" σα`$ and
$s_{\hat{\beta}} = `j @printf "%.3f" σβ`$,
which is different from the $3.052$ and $0.047$ reported by Dallal et al. The
deviance is
$G^2 = `j @printf "%.3f" G²`$ with `j nDOF` degrees of freedom.
I cannot find any value similar to the Goodness of fit ($G^2=18.086$) reported
by Dalal et al. However, the number of degrees of freedom is similar to theirs
(21).
There seems to be something wrong. Oh I know, I haven't indicated that my
observations are actually the result of 6 observations for each rocket
launch. The correct way to do this would be to weight the data using the `Count`
column. Since I don't know how to do that with the
[GLM](https://github.com/JuliaStats/GLM.jl) package I'm using, I will simply
duplicate the data:
```julia; wrap=false; hold=true
weighted_data = DataFrame(Temperature=Int[], Frequency=Float64[])
for row in eachrow(data)
for _ in 1:row.Count
push!(weighted_data, (Temperature=row.Temperature,
Frequency=row.Frequency))
end
end
model = glm(@formula(Frequency ~ Temperature), weighted_data,
Binomial(), LogitLink())
α, β = coef(model)
σα, σβ = stderror(model)
G² = deviance(model)
nDOF = Int(dof_residual(model))
model
```
Good, now I have recovered the asymptotic standard errors
$s_{\hat{\alpha}} = `j @printf "%.3f" σα`$ and
$s_{\hat{\beta}} = `j @printf "%.3f" σβ`$,
The Goodness of fit (Deviance) indicated for this model is
$G^2 = `j @printf "%.3f" G²`$ with `j nDOF` degrees of freedom. Now $G^2$ is in
good accordance to the results of the Dalal *et al.* article, but the number of
degrees of freedom is 6 times larger than i should, due to my tampering of the
data to duplicate them instead of weighting them.
# Predicting failure probability
The temperature when launching the shuttle was 31°F. Let's try to estimate the
failure probability for such temperature using our model:
```julia; results="raw"
prediction = DataFrame(Temperature=30:0.25:90)
prediction.Frequency = predict(model, prediction)
plot(xlabel="Temperature [F]", ylabel="Frequency")
plot!(data.Temperature, data.Frequency, seriestype=:scatter, label="data")
plot!(prediction.Temperature, prediction.Frequency, label="prediction")
disp()
```
This figure is very similar to the Figure 4 of Dalal *et al*.
<!-- Local Variables: -->
<!-- mode: markdown -->
<!-- ispell-local-dictionary: "french" -->
<!-- End: -->
<!DOCTYPE html>
<HTML lang = "en">
<HEAD>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
{{#:title}}<title>{{:title}}</title>{{/:title}}
{{{ :header_script }}}
<script src="plotly-latest.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]},
TeX: { equationNumbers: { autoNumber: "AMS" } }
});
</script>
<script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
{{{ :highlightcss }}}
<style type="text/css">
{{{ :themecss }}}
</style>
</HEAD>
<BODY>
<div class ="container">
<div class = "row">
<div class = "col-md-12 twelve columns">
<div class="title">
{{#:title}}<h1 class="title">{{:title}}</h1>{{/:title}}
{{#:author}}<h5>{{{:author}}}</h5>{{/:author}}
{{#:date}}<h5>{{{:date}}}</h5>{{/:date}}
</div>
{{{ :body }}}
<HR/>
<div class="footer"><p>
Published from <a href="{{{:source}}}">{{{:source}}}</a> using
<a href="http://github.com/mpastell/Weave.jl">Weave.jl</a>
{{:wversion}} on {{:wtime}}.
<p></div>
</div>
</div>
</div>
</BODY>
</HTML>
This diff is collapsed.
This diff is collapsed.
# Affichage des DataFrames dans la sortie HTML
using DataFrames
using Printf
function info(df::DataFrame)
try
WEAVE_ARGS
catch
display(df)
return
end
pretty(x) = x
pretty(x::Float64) = @sprintf "%.6f" x
pretty(::Type{Union{T,Missing}}) where {T} = "$T?"
function printrow(i)
print("<tr>")
print("<th>$i</th>")
for j in 1:ncol(df)
print("<td>$(pretty(df[i,j]))</td>")
end
print("</tr>")
end
function printrows()
i = 1
while i<=nrow(df) && i<=3
printrow(i)
i += 1
end
i>nrow(df) && return
if i < nrow(df)-2
print("<tr><td>...</td></tr>")
end
i = max(i, nrow(df)-2)
while i<=nrow(df)
printrow(i)
i += 1
end
end
println("$(nrow(df)) rows × $(ncol(df)) columns")
print("<table>")
print("<thead>")
print("<tr><th></th>")
for col in names(df)
print("<th>$col</th>")
end
print("</tr>")
print("<tr><th></th>")
for col in names(df)
print("<th>$(pretty(eltype(df[!,col])))</th>")
end
print("</tr>")
print("</thead>")
print("<tbody>")
printrows()
print("</tbody>")
print("</table>\n")
end
# Meilleure intégration entre Plotly et Weave
function disp()
try
WEAVE_ARGS
catch
display(plot!())
return
end
buf = IOBuffer()
show(buf, MIME("text/html"), plot!())
seekstart(buf)
str = String(read(buf))
m = match(r"<body>(.*)</body>"s, str)
return if m === nothing
str
else
m.captures[1]
end |> print
end
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment