Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure¶
+In this document we reperform some of the analysis provided in +Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure by Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley published in Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069.
+On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: $\hat{\alpha}=5.085$ and $\hat{\beta}=-0.1156$ and their asymptotic standard errors are $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. The Goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way.
+Technical information on the computer on which the analysis is run¶
+We will be using the python3 language using the pandas, statsmodels, numpy, matplotlib and seaborn libraries.
+def print_imported_modules():
+ import sys
+ for name, val in sorted(sys.modules.items()):
+ if(hasattr(val, '__version__')):
+ print(val.__name__, val.__version__)
+# else:
+# print(val.__name__, "(unknown version)")
+def print_sys_info():
+ import sys
+ import platform
+ print(sys.version)
+ print(platform.uname())
+
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+import statsmodels.api as sm
+import seaborn as sns
+
+print_sys_info()
+print_imported_modules()
+3.10.4 (main, Apr 2 2022, 09:04:19) [GCC 11.2.0] +uname_result(system='Linux', node='dev004', release='5.15.0-33-generic', version='#34-Ubuntu SMP Wed May 18 13:34:26 UTC 2022', machine='x86_64') +IPython 8.14.0 +IPython.core.release 8.14.0 +PIL 10.3.0 +PIL.Image 10.3.0 +PIL._deprecate 10.3.0 +PIL._version 10.3.0 +_csv 1.0 +_ctypes 1.1.0 +_curses b'2.2' +decimal 1.70 +_pydev_bundle.fsnotify 0.1.5 +_pydevd_frame_eval.vendored.bytecode 0.13.0.dev +argparse 1.1 +backcall 0.2.0 +cffi 1.15.1 +colorama 0.4.4 +comm 0.1.4 +csv 1.0 +ctypes 1.1.0 +cycler 0.10.0 +dateutil 2.9.0 +dateutil._version 2.9.0 +debugpy 1.6.7.post1 +debugpy.public_api 1.6.7.post1 +decimal 1.70 +decorator 5.1.1 +defusedxml 0.7.1 +distutils 3.10.4 +django 3.2.25 +entrypoints 0.4 +executing 1.2.0 +executing.version 1.2.0 +http.server 0.6 +ipykernel 6.25.1 +ipykernel._version 6.25.1 +ipywidgets 8.1.0 +ipywidgets._version 8.1.0 +jedi 0.19.0 +joblib 1.2.0 +joblib.externals.cloudpickle 2.2.0 +joblib.externals.loky 3.3.0 +json 2.0.9 +jupyter_client 7.4.9 +jupyter_client._version 7.4.9 +jupyter_core 5.3.1 +jupyter_core.version 5.3.1 +kiwisolver 1.4.5 +kiwisolver._cext 1.4.5 +logging 0.5.1.2 +matplotlib 3.7.2 +matplotlib._version 3.7.2 +numpy 1.22.4 +numpy.core 1.22.4 +numpy.core._multiarray_umath 3.1 +numpy.lib 1.22.4 +numpy.linalg._umath_linalg 0.1.5 +numpy.version 1.22.4 +packaging 23.2 +pandas 1.4.2 +parso 0.8.3 +patsy 0.5.3 +patsy.version 0.5.3 +pexpect 4.8.0 +pickleshare 0.7.5 +pkg_resources._vendor.more_itertools 9.1.0 +pkg_resources._vendor.packaging 23.1 +pkg_resources._vendor.platformdirs 2.6.2 +pkg_resources._vendor.platformdirs.version 2.6.2 +pkg_resources._vendor.more_itertools 9.1.0 +pkg_resources._vendor.packaging 23.1 +pkg_resources._vendor.platformdirs 2.6.2 +platform 1.0.8 +platformdirs 3.10.0 +platformdirs.version 3.10.0 +prompt_toolkit 3.0.39 +psutil 5.9.5 +ptyprocess 0.7.0 +pure_eval 0.2.2 +pure_eval.version 0.2.2 +pyarrow 9.0.0 +pyarrow._generated_version 9.0.0 +pydevd 2.9.5 +pygments 2.16.1 +pyparsing 2.4.7 +pytz 2023.3.post1 +re 2.2.1 +scipy 1.7.3 +scipy._lib._uarray 0.5.1+49.g4c3f1d7.scipy +scipy._lib.decorator 4.0.5 +scipy.integrate._dop 1.21.4 +scipy.integrate._ode $Id$ +scipy.integrate._odepack 1.9 +scipy.integrate._quadpack 1.13 +scipy.integrate.lsoda 1.21.4 +scipy.integrate.vode 1.21.4 +scipy.interpolate._fitpack 1.7 +scipy.interpolate.dfitpack 1.21.4 +scipy.linalg._fblas 1.21.4 +scipy.linalg._flapack 1.21.4 +scipy.linalg._flinalg 1.21.4 +scipy.linalg._interpolative 1.21.4 +scipy.ndimage 2.0 +scipy.optimize.__nnls 1.21.4 +scipy.optimize._cobyla 1.21.4 +scipy.optimize._lbfgsb 1.21.4 +scipy.optimize._minpack 1.10 +scipy.optimize._slsqp 1.21.4 +scipy.optimize.minpack2 1.21.4 +scipy.signal.spline 0.2 +scipy.sparse.linalg.eigen.arpack._arpack 1.21.4 +scipy.sparse.linalg.isolve._iterative 1.21.4 +scipy.special.specfun 1.21.4 +scipy.stats.mvn 1.21.4 +scipy.stats.statlib 1.21.4 +seaborn 0.13.2 +seaborn.external.appdirs 1.4.4 +seaborn.external.husl 2.1.0 +setuptools 69.1.1 +distutils 3.10.4 +setuptools._vendor.more_itertools 8.8.0 +setuptools._vendor.ordered_set 3.1 +setuptools._vendor.packaging 23.1 +setuptools._vendor.more_itertools 8.8.0 +setuptools._vendor.ordered_set 3.1 +setuptools._vendor.packaging 23.1 +setuptools.version 69.1.1 +six 1.16.0 +socketserver 0.4 +stack_data 0.6.2 +stack_data.version 0.6.2 +statsmodels 0.14.0 +statsmodels.__init__ 0.14.0 +statsmodels._version 0.14.0 +statsmodels.api 0.14.0 +statsmodels.tools.web 0.14.0 +traitlets 5.9.0 +traitlets._version 5.9.0 +urllib.request 3.10 +wcwidth 0.2.6 +xmlrpc.client 3.10 +zlib 1.0 +zmq 24.0.1 +zmq.sugar 24.0.1 +zmq.sugar.version 24.0.1 ++
Loading and inspecting data¶
Let's start by reading data.
+data = pd.read_csv("./shuttle.csv")
+data
+| + | Date | +Count | +Temperature | +Pressure | +Malfunction | +
|---|---|---|---|---|---|
| 0 | +4/12/81 | +6 | +66 | +50 | +0 | +
| 1 | +11/12/81 | +6 | +70 | +50 | +1 | +
| 2 | +3/22/82 | +6 | +69 | +50 | +0 | +
| 3 | +11/11/82 | +6 | +68 | +50 | +0 | +
| 4 | +4/04/83 | +6 | +67 | +50 | +0 | +
| 5 | +6/18/82 | +6 | +72 | +50 | +0 | +
| 6 | +8/30/83 | +6 | +73 | +100 | +0 | +
| 7 | +11/28/83 | +6 | +70 | +100 | +0 | +
| 8 | +2/03/84 | +6 | +57 | +200 | +1 | +
| 9 | +4/06/84 | +6 | +63 | +200 | +1 | +
| 10 | +8/30/84 | +6 | +70 | +200 | +1 | +
| 11 | +10/05/84 | +6 | +78 | +200 | +0 | +
| 12 | +11/08/84 | +6 | +67 | +200 | +0 | +
| 13 | +1/24/85 | +6 | +53 | +200 | +2 | +
| 14 | +4/12/85 | +6 | +67 | +200 | +0 | +
| 15 | +4/29/85 | +6 | +75 | +200 | +0 | +
| 16 | +6/17/85 | +6 | +70 | +200 | +0 | +
| 17 | +7/2903/85 | +6 | +81 | +200 | +0 | +
| 18 | +8/27/85 | +6 | +76 | +200 | +0 | +
| 19 | +10/03/85 | +6 | +79 | +200 | +0 | +
| 20 | +10/30/85 | +6 | +75 | +200 | +2 | +
| 21 | +11/26/85 | +6 | +76 | +200 | +0 | +
| 22 | +1/12/86 | +6 | +58 | +200 | +1 | +
We know from our previous experience on this data set that filtering data is a really bad idea. We will therefore process it as such.
+%matplotlib inline
+pd.set_option('mode.chained_assignment',None) # this removes a useless warning from pandas
+import matplotlib.pyplot as plt
+
+data["Frequency"]=data.Malfunction/data.Count
+data.plot(x="Temperature",y="Frequency",kind="scatter",ylim=[0,1])
+plt.grid(True)
+/home/mlinger@drakai.local/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py:1114: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored + scatter = ax.scatter( ++
data["Intercept"] = 1
+Logistic regression¶
Let's assume O-rings independently fail with the same probability which solely depends on temperature. A logistic regression should allow us to estimate the influence of temperature.
+import statsmodels.api as sm
+
+# Create an instance of the link class
+logit_link = sm.families.links.logit()
+
+# Use the instance of the link class in the Binomial family
+logmodel = sm.GLM(data['Frequency'], data[['Intercept','Temperature']],
+ family=sm.families.Binomial(link=logit_link),
+ var_weights=data['Count']).fit()
+
+logmodel.summary()
+/home/mlinger@drakai.local/.local/lib/python3.10/site-packages/statsmodels/genmod/families/links.py:13: FutureWarning: The logit link alias is deprecated. Use Logit instead. The logit link alias will be removed after the 0.15.0 release. + warnings.warn( ++
| Dep. Variable: | Frequency | No. Observations: | 23 | +
|---|---|---|---|
| Model: | GLM | Df Residuals: | 21 | +
| Model Family: | Binomial | Df Model: | 1 | +
| Link Function: | logit | Scale: | 1.0000 | +
| Method: | IRLS | Log-Likelihood: | -23.526 | +
| Date: | Tue, 09 Jul 2024 | Deviance: | 18.086 | +
| Time: | 15:31:00 | Pearson chi2: | 30.0 | +
| No. Iterations: | 6 | Pseudo R-squ. (CS): | 0.2344 | +
| Covariance Type: | nonrobust | + |
| coef | std err | z | P>|z| | [0.025 | 0.975] | +|
|---|---|---|---|---|---|---|
| Intercept | 5.0850 | 3.052 | 1.666 | 0.096 | -0.898 | 11.068 | +
| Temperature | -0.1156 | 0.047 | -2.458 | 0.014 | -0.208 | -0.023 | +
Good, now I have recovered the asymptotic standard errors $s_{\hat{\alpha}}=3.052$ and $s_{\hat{\beta}}=0.047$. +The Goodness of fit (Deviance) indicated for this model is $G^2=18.086$ with 21 degrees of freedom (Df Residuals).
+I have therefore managed to fully replicate the results of the Dalal et al. article.
+Predicting failure probability¶
The temperature when launching the shuttle was 31°F. Let's try to estimate the failure probability for such temperature using our model.:
+%matplotlib inline
+data_pred = pd.DataFrame({'Temperature': np.linspace(start=30, stop=90, num=121), 'Intercept': 1})
+data_pred['Frequency'] = logmodel.predict(data_pred)
+data_pred.plot(x="Temperature",y="Frequency",kind="line",ylim=[0,1])
+plt.scatter(x=data["Temperature"],y=data["Frequency"])
+plt.grid(True)
+This figure is very similar to the Figure 4 of Dalal et al. I have managed to replicate the Figure 4 of the Dalal et al. article.
+Computing and plotting uncertainty¶
+Following the documentation of Seaborn, I use regplot.
+sns.set(color_codes=True)
+plt.xlim(30,90)
+plt.ylim(0,1)
+sns.regplot(x='Temperature', y='Frequency', data=data, logistic=True)
+plt.show()
+I think I have managed to correctly compute and plot the uncertainty of my prediction. Although the shaded area seems very similar to the one obtained by with R, I can spot a few differences (e.g., the blue point for temperature 63 is outside)... Could this be a numerical error ? Or a difference in the statistical method ? It is not clear which one is "right".
+
+