Commit daa4fe3b authored by Tommy Rushton's avatar Tommy Rushton

Exercise 5.

parent b985c45a
module2/exo1/cosxsx.png

83.5 KB | W: | H:

module2/exo1/cosxsx.png

21.9 KB | W: | H:

module2/exo1/cosxsx.png
module2/exo1/cosxsx.png
module2/exo1/cosxsx.png
module2/exo1/cosxsx.png
  • 2-up
  • Swipe
  • Onion skin
...@@ -33,7 +33,7 @@ Challenger. ...@@ -33,7 +33,7 @@ Challenger.
* Loading the data * Loading the data
We start by loading this data: We start by loading this data:
#+begin_src python :results value :session *python* :exports both #+begin_src python :results value :session :exports both
import numpy as np import numpy as np
import pandas as pd import pandas as pd
data = pd.read_csv("shuttle.csv") data = pd.read_csv("shuttle.csv")
...@@ -78,8 +78,13 @@ Flights without incidents do not provide any information ...@@ -78,8 +78,13 @@ Flights without incidents do not provide any information
on the influence of temperature or pressure on malfunction. on the influence of temperature or pressure on malfunction.
We thus focus on the experiments in which at least one O-ring was defective. We thus focus on the experiments in which at least one O-ring was defective.
#+begin_src python :results value :session *python* :exports both #+begin_quote
data = data[data.Malfunction>0] This is suspect. What if launches without defects were predominately
at higher temperatures?
#+end_quote
#+begin_src python :results value :session :exports both
data = data[data.Malfunction > 0]
data data
#+end_src #+end_src
...@@ -97,8 +102,12 @@ We have a high temperature variability but ...@@ -97,8 +102,12 @@ We have a high temperature variability but
the pressure is almost always 200, which should the pressure is almost always 200, which should
simplify the analysis. simplify the analysis.
#+begin_quote
"Almost always" is alarming.
#+end_quote
How does the frequency of failure vary with temperature? How does the frequency of failure vary with temperature?
#+begin_src python :results output file :var matplot_lib_filename="freq_temp_python.png" :exports both :session *python* #+begin_src python :results output file :var matplot_lib_filename="freq_temp_python.png" :exports both :session
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
plt.clf() plt.clf()
...@@ -120,13 +129,17 @@ estimate the impact of temperature $t$ on the probability of O-ring malfunction. ...@@ -120,13 +129,17 @@ estimate the impact of temperature $t$ on the probability of O-ring malfunction.
Suppose that each of the six O-rings is damaged with the same Suppose that each of the six O-rings is damaged with the same
probability and independently of the others and that this probability probability and independently of the others and that this probability
depends only on the temperature. If $p(t)$ is this probability, the depends only on the temperature.
#+begin_quote
OK, yes, let's suppose that.
#+end_quote
If $p(t)$ is this probability, the
number $D$ of malfunctioning O-rings during a flight at number $D$ of malfunctioning O-rings during a flight at
temperature $t$ follows a binomial law with parameters $n=6$ and temperature $t$ follows a binomial law with parameters $n=6$ and
$p=p(t)$. To link $p(t)$ to $t$, we will therefore perform a $p=p(t)$. To link $p(t)$ to $t$, we will therefore perform a
logistic regression. logistic regression.
#+begin_src python :results value :session *python* :exports both #+begin_src python :results value :session :exports both
import statsmodels.api as sm import statsmodels.api as sm
data["Success"]=data.Count-data.Malfunction data["Success"]=data.Count-data.Malfunction
...@@ -134,7 +147,8 @@ data["Intercept"]=1 ...@@ -134,7 +147,8 @@ data["Intercept"]=1
# logit_model=sm.Logit(data["Frequency"],data[["Intercept","Temperature"]]).fit() # logit_model=sm.Logit(data["Frequency"],data[["Intercept","Temperature"]]).fit()
logmodel=sm.GLM(data['Frequency'], data[['Intercept','Temperature']], family=sm.families.Binomial(sm.families.links.logit)).fit() link = sm.families.links.Logit()
logmodel=sm.GLM(data['Frequency'], data[['Intercept','Temperature']], family=sm.families.Binomial(link)).fit()
logmodel.summary() logmodel.summary()
#+end_src #+end_src
...@@ -142,15 +156,16 @@ logmodel.summary() ...@@ -142,15 +156,16 @@ logmodel.summary()
#+RESULTS: #+RESULTS:
#+begin_example #+begin_example
Generalized Linear Model Regression Results Generalized Linear Model Regression Results
============================================================================== ===============================================================================
Dep. Variable: Frequency No. Observations: 7 Dep. Variable: Frequency No. Observations: 7
Model: GLM Df Residuals: 5 Model: GLM Df Residuals: 5
Model Family: Binomial Df Model: 1 Model Family: Binomial Df Model: 1
Link Function: logit Scale: 1.0 Link Function: Logit Scale: 1.0000
Method: IRLS Log-Likelihood: -3.6370 Method: IRLS Log-Likelihood: -2.5250
Date: Fri, 20 Jul 2018 Deviance: 3.3763 Date: jeu., 11 avril 2024 Deviance: 0.22231
Time: 16:56:08 Pearson chi2: 0.236 Time: 15:05:37 Pearson chi2: 0.236
No. Iterations: 5 No. Iterations: 4 Pseudo R-squ. (CS): 1.926e-05
Covariance Type: nonrobust
=============================================================================== ===============================================================================
coef std err z P>|z| [0.025 0.975] coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------- -------------------------------------------------------------------------------
...@@ -163,13 +178,23 @@ The most likely estimator of the temperature parameter is 0.0014 ...@@ -163,13 +178,23 @@ The most likely estimator of the temperature parameter is 0.0014
and the standard error of this estimator is 0.122, in other words we and the standard error of this estimator is 0.122, in other words we
cannot distinguish any particular impact and we must take our cannot distinguish any particular impact and we must take our
estimates with caution. estimates with caution.
#+begin_quote
Indeed... and look at that /p value/ (0.991) which more-or-less says,
for the subset of data, that temperature likely has /no effect/ on
likelihood of malfunction.
#+end_quote
* Estimation of the probability of O-ring malfunction * Estimation of the probability of O-ring malfunction
The expected temperature on the take-off day is 31°F. Let's try to The expected temperature on the take-off day is 31°F.
#+begin_quote
A temperature at/around which we have /no data/. Extrapolating from
higher temperatures — bad idea.
#+end_quote
Let's try to
estimate the probability of O-ring malfunction at estimate the probability of O-ring malfunction at
this temperature from the model we just built: this temperature from the model we just built:
#+begin_src python :results output file :var matplot_lib_filename="proba_estimate_python.png" :exports both :session *python* #+begin_src python :results output file :var matplot_lib_filename="proba_estimate_python.png" :exports both :session
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
data_pred = pd.DataFrame({'Temperature': np.linspace(start=30, stop=90, num=121), 'Intercept': 1}) data_pred = pd.DataFrame({'Temperature': np.linspace(start=30, stop=90, num=121), 'Intercept': 1})
...@@ -188,9 +213,14 @@ print(matplot_lib_filename) ...@@ -188,9 +213,14 @@ print(matplot_lib_filename)
As expected from the initial data, the As expected from the initial data, the
temperature has no significant impact on the probability of failure of the temperature has no significant impact on the probability of failure of the
O-rings. It will be about 0.2, as in the tests O-rings. It will be about 0.2, as in the tests
where we had a failure of at least one joint. Let's get back to the initial dataset to estimate the probability of failure: where we had a failure of at least one joint.
#+begin_quote
#+begin_src python :results output :session *python* :exports both Opting not to exclude the entries where no malfunction occurred
reveals a strikingly different picture.
#+end_quote
Let's get back to the initial dataset to estimate the probability of failure:
#+begin_src python :results output :session :exports both
data = pd.read_csv("shuttle.csv") data = pd.read_csv("shuttle.csv")
print(np.sum(data.Malfunction)/np.sum(data.Count)) print(np.sum(data.Malfunction)/np.sum(data.Count))
#+end_src #+end_src
...@@ -198,7 +228,13 @@ print(np.sum(data.Malfunction)/np.sum(data.Count)) ...@@ -198,7 +228,13 @@ print(np.sum(data.Malfunction)/np.sum(data.Count))
#+RESULTS: #+RESULTS:
: 0.06521739130434782 : 0.06521739130434782
This probability is thus about $p=0.065$. Knowing that there is This probability is thus about $p=0.065$.
#+begin_quote
This has an air of desperation about it. So now, we're just taking the
proportion of failures to total O-rings... across just the flights
where malfunctions were recorded?
#+end_quote
Knowing that there is
a primary and a secondary O-ring on each of the three parts of the a primary and a secondary O-ring on each of the three parts of the
launcher, the probability of failure of both joints of a launcher launcher, the probability of failure of both joints of a launcher
is $p^2 \approx 0.00425$. The probability of failure of any one of the is $p^2 \approx 0.00425$. The probability of failure of any one of the
......
module2/exo5/freq_temp_python.png

12.3 KB | W: | H:

module2/exo5/freq_temp_python.png

11.9 KB | W: | H:

module2/exo5/freq_temp_python.png
module2/exo5/freq_temp_python.png
module2/exo5/freq_temp_python.png
module2/exo5/freq_temp_python.png
  • 2-up
  • Swipe
  • Onion skin
module2/exo5/proba_estimate_python.png

14.3 KB | W: | H:

module2/exo5/proba_estimate_python.png

14.1 KB | W: | H:

module2/exo5/proba_estimate_python.png
module2/exo5/proba_estimate_python.png
module2/exo5/proba_estimate_python.png
module2/exo5/proba_estimate_python.png
  • 2-up
  • Swipe
  • Onion skin
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment