"# Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this document we reperform some of the analysis provided in \n",
"*Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley* published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069. \n",
"\n",
"On the fourth page of this article, they indicate that the maximum likelihood estimates of the logistic regression using only temperature are: $\\hat{\\alpha}=5.085$ and $\\hat{\\beta}=-0.1156$ and their asymptotic standard errors are $s_{\\hat{\\alpha}}=3.052$ and $s_{\\hat{\\beta}}=0.047$. The Goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and the Figure 4 of this article, possibly in a nicer looking way."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Technical information on the computer on which the analysis is run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will be using the python3 language using the pandas, statsmodels, numpy, matplotlib and seaborn libraries."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:03:24) [GCC 12.3.0]\n",
"uname_result(system='Linux', node='felic-ThinkPad-E14-Gen-2', release='5.15.0-91-generic', version='#101~20.04.1-Ubuntu SMP Thu Nov 16 14:22:28 UTC 2023', machine='x86_64')\n",
"Let's assume O-rings independently fail with the same probability which solely depends on temperature. A logistic regression should allow us to estimate the influence of temperature."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "Calling Family(..) with a link class is not allowed. Use an instance of a link class instead.",
"File \u001b[0;32m~/Programs/miniconda3/envs/rech_repro/lib/python3.12/site-packages/statsmodels/genmod/families/family.py:932\u001b[0m, in \u001b[0;36mBinomial.__init__\u001b[0;34m(self, link, check_link)\u001b[0m\n\u001b[1;32m 929\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mn \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 930\u001b[0m \u001b[38;5;66;03m# overwritten by initialize if needed but always used to initialize\u001b[39;00m\n\u001b[1;32m 931\u001b[0m \u001b[38;5;66;03m# variance since endog is assumed/forced to be (0,1)\u001b[39;00m\n\u001b[0;32m--> 932\u001b[0m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mBinomial\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__init__\u001b[39;49m\u001b[43m(\u001b[49m\n\u001b[1;32m 933\u001b[0m \u001b[43m \u001b[49m\u001b[43mlink\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mlink\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 934\u001b[0m \u001b[43m \u001b[49m\u001b[43mvariance\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mV\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mBinomial\u001b[49m\u001b[43m(\u001b[49m\u001b[43mn\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mn\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 935\u001b[0m \u001b[43m \u001b[49m\u001b[43mcheck_link\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcheck_link\u001b[49m\n\u001b[1;32m 936\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/Programs/miniconda3/envs/rech_repro/lib/python3.12/site-packages/statsmodels/genmod/families/family.py:94\u001b[0m, in \u001b[0;36mFamily.__init__\u001b[0;34m(self, link, variance, check_link)\u001b[0m\n\u001b[1;32m 89\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m inspect\u001b[38;5;241m.\u001b[39misclass(link):\n\u001b[1;32m 90\u001b[0m warnmssg \u001b[38;5;241m=\u001b[39m (\n\u001b[1;32m 91\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCalling Family(..) with a link class is not allowed. Use an \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 92\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minstance of a link class instead.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 93\u001b[0m )\n\u001b[0;32m---> 94\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m(warnmssg)\n\u001b[1;32m 96\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlink \u001b[38;5;241m=\u001b[39m link\n\u001b[1;32m 97\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mvariance \u001b[38;5;241m=\u001b[39m variance\n",
"\u001b[0;31mTypeError\u001b[0m: Calling Family(..) with a link class is not allowed. Use an instance of a link class instead."
"The maximum likelyhood estimator of the intercept and of Temperature are thus $\\hat{\\alpha}=5.0849$ and $\\hat{\\beta}=-0.1156$. This **corresponds** to the values from the article of Dalal *et al.* The standard errors are $s_{\\hat{\\alpha}} = 7.477$ and $s_{\\hat{\\beta}} = 0.115$, which is **different** from the $3.052$ and $0.04702$ reported by Dallal *et al.* The deviance is $3.01444$ with 21 degrees of freedom. I cannot find any value similar to the Goodness of fit ($G^2=18.086$) reported by Dalal *et al.* There seems to be something wrong. Oh I know, I haven't indicated that my observations are actually the result of 6 observations for each rocket launch. Let's indicate these weights (since the weights are always the same throughout all experiments, it does not change the estimates of the fit but it does influence the variance estimates)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "Calling Family(..) with a link class is not allowed. Use an instance of a link class instead.",
"File \u001b[0;32m~/Programs/miniconda3/envs/rech_repro/lib/python3.12/site-packages/statsmodels/genmod/families/family.py:932\u001b[0m, in \u001b[0;36mBinomial.__init__\u001b[0;34m(self, link, check_link)\u001b[0m\n\u001b[1;32m 929\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mn \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 930\u001b[0m \u001b[38;5;66;03m# overwritten by initialize if needed but always used to initialize\u001b[39;00m\n\u001b[1;32m 931\u001b[0m \u001b[38;5;66;03m# variance since endog is assumed/forced to be (0,1)\u001b[39;00m\n\u001b[0;32m--> 932\u001b[0m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mBinomial\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__init__\u001b[39;49m\u001b[43m(\u001b[49m\n\u001b[1;32m 933\u001b[0m \u001b[43m \u001b[49m\u001b[43mlink\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mlink\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 934\u001b[0m \u001b[43m \u001b[49m\u001b[43mvariance\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mV\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mBinomial\u001b[49m\u001b[43m(\u001b[49m\u001b[43mn\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mn\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 935\u001b[0m \u001b[43m \u001b[49m\u001b[43mcheck_link\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcheck_link\u001b[49m\n\u001b[1;32m 936\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/Programs/miniconda3/envs/rech_repro/lib/python3.12/site-packages/statsmodels/genmod/families/family.py:94\u001b[0m, in \u001b[0;36mFamily.__init__\u001b[0;34m(self, link, variance, check_link)\u001b[0m\n\u001b[1;32m 89\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m inspect\u001b[38;5;241m.\u001b[39misclass(link):\n\u001b[1;32m 90\u001b[0m warnmssg \u001b[38;5;241m=\u001b[39m (\n\u001b[1;32m 91\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCalling Family(..) with a link class is not allowed. Use an \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 92\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minstance of a link class instead.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 93\u001b[0m )\n\u001b[0;32m---> 94\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m(warnmssg)\n\u001b[1;32m 96\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlink \u001b[38;5;241m=\u001b[39m link\n\u001b[1;32m 97\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mvariance \u001b[38;5;241m=\u001b[39m variance\n",
"\u001b[0;31mTypeError\u001b[0m: Calling Family(..) with a link class is not allowed. Use an instance of a link class instead."
"**I think I have managed to correctly compute and plot the uncertainty of my prediction.** Although the shaded area seems very similar to [the one obtained by with R](https://app-learninglab.inria.fr/moocrr/gitlab/moocrr-session3/moocrr-reproducibility-study/tree/master/challenger.pdf), I can spot a few differences (e.g., the blue point for temperature 63 is outside)... Could this be a numerical error ? Or a difference in the statistical method ? It is not clear which one is \"right\"."