{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyse du risque de défaillance des joints toriques de la navette Challenger" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Le 27 Janvier 1986, veille du décollage de la navette *Challenger*, eu\n", "lieu une télé-conférence de trois heures entre les ingénieurs de la\n", "Morton Thiokol (constructeur d'un des moteurs) et de la NASA. La\n", "discussion portait principalement sur les conséquences de la\n", "température prévue au moment du décollage de 31°F (juste en dessous de\n", "0°C) sur le succès du vol et en particulier sur la performance des\n", "joints toriques utilisés dans les moteurs. En effet, aucun test\n", "n'avait été effectué à cette température.\n", "\n", "L'étude qui suit reprend donc une partie des analyses effectuées cette\n", "nuit là et dont l'objectif était d'évaluer l'influence potentielle de\n", "la température et de la pression à laquelle sont soumis les joints\n", "toriques sur leur probabilité de dysfonctionnement. Pour cela, nous\n", "disposons des résultats des expériences réalisées par les ingénieurs\n", "de la NASA durant les 6 années précédant le lancement de la navette\n", "Challenger.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chargement des données\n", "Nous commençons donc par charger ces données:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateCountTemperaturePressureMalfunction
04/12/81666500
111/12/81670501
23/22/82669500
311/11/82668500
44/04/83667500
56/18/82672500
68/30/836731000
711/28/836701000
82/03/846572001
94/06/846632001
108/30/846702001
1110/05/846782000
1211/08/846672000
131/24/856532002
144/12/856672000
154/29/856752000
166/17/856702000
177/29/856812000
188/27/856762000
1910/03/856792000
2010/30/856752002
2111/26/856762000
221/12/866582001
\n", "
" ], "text/plain": [ " Date Count Temperature Pressure Malfunction\n", "0 4/12/81 6 66 50 0\n", "1 11/12/81 6 70 50 1\n", "2 3/22/82 6 69 50 0\n", "3 11/11/82 6 68 50 0\n", "4 4/04/83 6 67 50 0\n", "5 6/18/82 6 72 50 0\n", "6 8/30/83 6 73 100 0\n", "7 11/28/83 6 70 100 0\n", "8 2/03/84 6 57 200 1\n", "9 4/06/84 6 63 200 1\n", "10 8/30/84 6 70 200 1\n", "11 10/05/84 6 78 200 0\n", "12 11/08/84 6 67 200 0\n", "13 1/24/85 6 53 200 2\n", "14 4/12/85 6 67 200 0\n", "15 4/29/85 6 75 200 0\n", "16 6/17/85 6 70 200 0\n", "17 7/29/85 6 81 200 0\n", "18 8/27/85 6 76 200 0\n", "19 10/03/85 6 79 200 0\n", "20 10/30/85 6 75 200 2\n", "21 11/26/85 6 76 200 0\n", "22 1/12/86 6 58 200 1" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "data = pd.read_csv(\"shuttle.csv\")\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Le jeu de données nous indique la date de l'essai, le nombre de joints\n", "toriques mesurés (il y en a 6 sur le lançeur principal), la\n", "température (en Farenheit) et la pression (en psi), et enfin le\n", "nombre de dysfonctionnements relevés. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inspection graphique des données\n", "On pourrait penser que les vols où aucun incident n'est relevé n'apportent aucun information\n", "sur l'influence de la température ou de la pression sur les\n", "dysfonctionnements, et se concentrer sur les expériences où au\n", "moins un joint a été défectueux. C'est ce qui est montré dans le tableau ci-dessous.\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateCountTemperaturePressureMalfunction
111/12/81670501
82/03/846572001
94/06/846632001
108/30/846702001
131/24/856532002
2010/30/856752002
221/12/866582001
\n", "
" ], "text/plain": [ " Date Count Temperature Pressure Malfunction\n", "1 11/12/81 6 70 50 1\n", "8 2/03/84 6 57 200 1\n", "9 4/06/84 6 63 200 1\n", "10 8/30/84 6 70 200 1\n", "13 1/24/85 6 53 200 2\n", "20 10/30/85 6 75 200 2\n", "22 1/12/86 6 58 200 1" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data1 = data[data.Malfunction>0]\n", "data1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Le tableau indique une variabilité de température importante et\n", "une pression quasiment toujours égale à 200, ce qui devrait\n", "simplifier l'analyse. La température serait-elle plus impactante \n", "sur les dysfonctionnements ? \n", "\n", "En réalité, il est nécessaire, si l'on veut obtenir la dépendance\n", "réelle des dysfonctionnements sur les paramètres environementaux,\n", "de prendre en compte tous les cas expérimentaux, notamment ceux\n", "pour lesquels aucun dysfonctionnement n'apparaît. \n", "C'est d'ailleurs absolument nécessaire si l'on veut pouvoir dire \n", "quelque chose sur la probabilité qu'un dysfonctionnement \n", "apparaisse : sans prendre en compte ces cas expérimentaux, on\n", "quantifie simplement l'influence de la température sur la\n", "probabilité conditionnelle qu'un dysfonctionnement arrive, sachant\n", "qu'au moins un dysfonctionnement est survenu.\n", "\n", "Conservons cependant l'approche en fonction de la température,\n", "mais sur l'ensemble des données expérimentales. \n", "Comment la fréquence d'échecs varie-t-elle avec la température ?\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "pd.set_option('mode.chained_assignment',None) # this removes a useless warning from pandas\n", "import matplotlib.pyplot as plt\n", "\n", "data[\"Frequency\"]=data.Malfunction/data.Count\n", "data.plot(x=\"Temperature\",y=\"Frequency\",kind=\"scatter\",ylim=[0,1])\n", "plt.grid(True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "À première vue, ce n'est pas flagrant mais bon, essayons quand même\n", "d'estimer l'impact de la température $t$ sur la probabilité de\n", "dysfonctionnements d'un joint. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimation de l'influence de la température\n", "\n", "Supposons que chacun des 6 joints toriques est endommagé avec la même\n", "probabilité et indépendamment des autres et que cette probabilité ne\n", "dépend que de la température. Si on note $p(t)$ cette probabilité, le\n", "nombre de joints $D$ dysfonctionnant lorsque l'on effectue le vol à\n", "température $t$ suit une loi binomiale de paramètre $n=6$ et\n", "$p=p(t)$. Pour relier $p(t)$ à $t$, on va donc effectuer une\n", "régression logistique." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Generalized Linear Model Regression Results \n", "==============================================================================\n", "Dep. Variable: Frequency No. Observations: 23\n", "Model: GLM Df Residuals: 21\n", "Model Family: Binomial Df Model: 1\n", "Link Function: logit Scale: 1.0000\n", "Method: IRLS Log-Likelihood: -3.9210\n", "Date: Sun, 28 Jan 2024 Deviance: 3.0144\n", "Time: 17:39:06 Pearson chi2: 5.00\n", "No. Iterations: 6 Covariance Type: nonrobust\n", "===============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "-------------------------------------------------------------------------------\n", "Intercept 5.0850 7.477 0.680 0.496 -9.570 19.740\n", "Temperature -0.1156 0.115 -1.004 0.316 -0.341 0.110\n", "===============================================================================\n", "\"\"\"" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import statsmodels.api as sm\n", "\n", "data[\"Success\"]=data.Count-data.Malfunction\n", "data[\"Intercept\"]=1\n", "\n", "logmodelT=sm.GLM(data['Frequency'], data[['Intercept','Temperature']], family=sm.families.Binomial(sm.families.links.logit)).fit()\n", "\n", "logmodelT.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "L'estimateur le plus probable du paramètre de température est -0.1156\n", "et l'erreur standard de cet estimateur est de 0.115. La température a\n", "un impact conséquent sur l'apparition de dysfonctionnements.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimation de l'influence de la pression\n", "\n", "\n", "Vérifions maintenant l'influence de la pression. \n", "\n", "Supposons que chacun des 6 joints toriques est endommagé avec la même\n", "probabilité et indépendamment des autres et que cette probabilité ne\n", "dépend que de la pression. 
Si on note $p(p)$ cette probabilité, le\n", "nombre de joints $D$ dysfonctionnant lorsque l'on effectue le vol à\n", "pression $p$ suit une loi binomiale de paramètre $n=6$ et\n", "$p=p(p)$. Pour relier $p(p)$ à $p$, on va donc effectuer une\n", "régression logistique." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Generalized Linear Model Regression Results \n", "==============================================================================\n", "Dep. Variable: Frequency No. Observations: 23\n", "Model: GLM Df Residuals: 21\n", "Model Family: Binomial Df Model: 1\n", "Link Function: logit Scale: 1.0000\n", "Method: IRLS Log-Likelihood: -4.2246\n", "Date: Sun, 28 Jan 2024 Deviance: 3.6216\n", "Time: 17:39:09 Pearson chi2: 3.94\n", "No. Iterations: 6 Covariance Type: nonrobust\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Intercept -4.3835 3.487 -1.257 0.209 -11.219 2.452\n", "Pressure 0.0102 0.019 0.549 0.583 -0.026 0.047\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logmodelP=sm.GLM(data['Frequency'], data[['Intercept','Pressure']], family=sm.families.Binomial(sm.families.links.logit)).fit()\n", "\n", "logmodelP.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "L'estimateur le plus probable du paramètre de pression est 0.0102, ce qui est très faible, et l'erreur standard de cet estimateur est de 0.019. La pression semble avoir peu d'impact sur l'apparition de dysfonctionnements." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Influences conjointes de la temperature et la pression\n", "\n", "Vérifions les faits qui semblent ressortir de nos premières analyses, dans lesquelles nous n'avons fait varier qu'un seul paramètre à la fois. Nous effectuons maintenant une regression logistique sur les deux paramètres à la fois : température et pression, en utilisant toujours les hypothèses de loi binomiale identique pour tous les joints à une pression et un température donnée." 
] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Generalized Linear Model Regression Results
Dep. Variable: Frequency No. Observations: 23
Model: GLM Df Residuals: 20
Model Family: Binomial Df Model: 2
Link Function: logit Scale: 1.0000
Method: IRLS Log-Likelihood: -3.7926
Date: Sun, 28 Jan 2024 Deviance: 2.7576
Time: 17:46:06 Pearson chi2: 4.19
No. Iterations: 6 Covariance Type: nonrobust
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [0.025 0.975]
Intercept 2.5202 8.541 0.295 0.768 -14.220 19.260
Pressure 0.0085 0.019 0.451 0.652 -0.028 0.045
Temperature -0.0983 0.110 -0.894 0.371 -0.314 0.117
" ], "text/plain": [ "\n", "\"\"\"\n", " Generalized Linear Model Regression Results \n", "==============================================================================\n", "Dep. Variable: Frequency No. Observations: 23\n", "Model: GLM Df Residuals: 20\n", "Model Family: Binomial Df Model: 2\n", "Link Function: logit Scale: 1.0000\n", "Method: IRLS Log-Likelihood: -3.7926\n", "Date: Sun, 28 Jan 2024 Deviance: 2.7576\n", "Time: 17:46:06 Pearson chi2: 4.19\n", "No. Iterations: 6 Covariance Type: nonrobust\n", "===============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "-------------------------------------------------------------------------------\n", "Intercept 2.5202 8.541 0.295 0.768 -14.220 19.260\n", "Pressure 0.0085 0.019 0.451 0.652 -0.028 0.045\n", "Temperature -0.0983 0.110 -0.894 0.371 -0.314 0.117\n", "===============================================================================\n", "\"\"\"" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logmodelPT=sm.GLM(data['Frequency'], data[['Intercept','Pressure', 'Temperature']], family=sm.families.Binomial(sm.families.links.logit)).fit()\n", "\n", "logmodelPT.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "L'estimateur le plus probable pour la température est -0.0983 et celui pour la pression est 0.0085, avec des erreurs standart respectives de 0.110 et 0.019. La température semble donc avoir un impact substanciellement plus important que la pression sur l'apparition de dysfonctionnements.\n", "\n", "Il est donc raisonnable de considérer que seule la température influence le fonctionnement des joints, ce que nous considérerons pour l'estimation de la probabilité de défaillance durant le vol (ne connaissant pas la pression ce jour-là)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimation de la probabilité de dysfonctionnant des joints toriques\n", "La température prévue le jour du décollage est de 31°F. Essayons\n", "d'estimer la probabilité de dysfonctionnement des joints toriques à\n", "cette température à partir du modèle que nous venons de construire:\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "data_pred = pd.DataFrame({'Temperature': np.linspace(start=30, stop=90, num=121), 'Intercept': 1})\n", "data_pred['Frequency'] = logmodelT.predict(data_pred[['Intercept','Temperature']])\n", "data_pred.plot(x=\"Temperature\",y=\"Frequency\",kind=\"line\",ylim=[0,1])\n", "plt.scatter(x=data[\"Temperature\"],y=data[\"Frequency\"])\n", "plt.grid(True)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "scrolled": true }, "source": [ " Elle sera d'environ 0.8, ce qui est très important. Même \n", " en prenant en compte le fait que chaque joint est pairé \n", " avec un autre, la probabilité de défaillance d'une paire \n", " est $p^2 = 0.64$. La probabilité de défaillance d'un des\n", "lançeur est donc de $1-(1-p^2)^3 \\approx 0.95%$. La navette \n", "a toutes les chances d'exploser !!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Vérifions maintenant la dépendance du résultat sur la température à partir de notre loi jointe en température et pression : " ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "for pressure in [50, 100, 200]:\n", " data_pred = pd.DataFrame({'Temperature': np.linspace(start=30, stop=90, num=121), 'Pressure':pressure, 'Intercept': 1})\n", " data_pred['Frequency'] = logmodelPT.predict(data_pred[['Intercept', 'Pressure', 'Temperature']])\n", " data_pred.plot(x=\"Temperature\",y=\"Frequency\",kind=\"line\",ylim=[0,1])\n", " plt.scatter(x=data[\"Temperature\"],y=data[\"Frequency\"], label=\"pressure={:.0f}\".format(pressure))\n", " plt.grid(True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "La pression influence fortement le résultat. Cependant, pour toutes les pressions prises en compte, la probabilité de dysfonctionnememt d'un joint est supérieur à 0.5, donc la probabilité de défaillance de la navette est supérieure à 0.58. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Le lendemain, la navette Challenger explosera et emportera\n", "avec elle ses sept membres d'équipages. L'opinion publique est\n", "fortement touchée et lors de l'enquête qui suivra, la fiabilité des\n", "joints toriques sera directement mise en cause. Les problèmes\n", "de communication interne à la NASA sont pour beaucoup dans ce\n", "fiasco. Le calcul effectué ci-dessous montre que tout indiquait déjà \n", "une sombre fin à cette célèbre histoire. Cependant, est-il tout à \n", "fait exacte. Je laisse le lecteur attentif s'en assurer, car mes\n", "souvenirs de probabilité ne sont plus tous jeunes..." ] } ], "metadata": { "celltoolbar": "Hide code", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }