Exercise 5.

Commit daa4fe3b authored Apr 11, 2024 by Tommy Rushton
4 changed files
--- a/module2/exo1/cosxsx.png
+++ b/module2/exo1/cosxsx.png
--- a/module2/exo5/exo5_python_en.org
+++ b/module2/exo5/exo5_python_en.org
@@ -33,7 +33,7 @@ Challenger.
 * Loading the data
 We start by loading this data:
-#+begin_src python :results value :session *python* :exports both
+#+begin_src python :results value :session :exports both
 import numpy as np
 import pandas as pd
 data = pd.read_csv("shuttle.csv")
@@ -78,8 +78,13 @@ Flights without incidents do not provide any information
 on the influence of temperature or pressure on malfunction.
 We thus focus on the experiments in which at least one O-ring was defective.
-#+begin_src python :results value :session *python* :exports both
+#+begin_quote
-data = data[data.Malfunction>0]
+This is suspect. What if launches without defects were predominantly
+at higher temperatures?
+#+end_quote
+#+begin_src python :results value :session :exports both
+data = data[data.Malfunction > 0]
 data
 #+end_src
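The concern raised in the quoted comment above can be checked by comparing temperatures in the rows the filter keeps with those it drops. A minimal sketch follows, using made-up (temperature, malfunction) pairs rather than the contents of shuttle.csv; `mean_temperature_by_outcome` is a hypothetical helper, not part of the notebook.

```python
from statistics import mean

# Toy (temperature in °F, number of malfunctioning O-rings) pairs.
# These values are invented for illustration; they are NOT shuttle.csv.
toy_flights = [(53, 2), (57, 1), (63, 1), (70, 1), (70, 0),
               (72, 0), (75, 0), (76, 0), (79, 0), (81, 0)]


def mean_temperature_by_outcome(flights):
    """Return (mean temp of flights with incidents, mean temp of flights without)."""
    with_incident = [t for t, m in flights if m > 0]
    without_incident = [t for t, m in flights if m == 0]
    return mean(with_incident), mean(without_incident)


kept, dropped = mean_temperature_by_outcome(toy_flights)
print(f"mean temperature, flights kept by the filter:    {kept:.2f}")
print(f"mean temperature, flights dropped by the filter: {dropped:.2f}")
```

If the real data showed a gap like the one in this toy set, filtering on Malfunction > 0 would discard exactly the flights that carry the temperature signal.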
@@ -97,8 +102,12 @@ We have a high temperature variability but
 the pressure is almost always 200, which should
 simplify the analysis.
+#+begin_quote
+"Almost always" is alarming.
+#+end_quote
 How does the frequency of failure vary with temperature?
-#+begin_src python :results output file :var matplot_lib_filename="freq_temp_python.png" :exports both :session *python*
+#+begin_src python :results output file :var matplot_lib_filename="freq_temp_python.png" :exports both :session
 import matplotlib.pyplot as plt
 plt.clf()
@@ -120,13 +129,17 @@ estimate the impact of temperature $t$ on the probability of O-ring malfunction.
 Suppose that each of the six O-rings is damaged with the same
 probability and independently of the others and that this probability
-depends only on the temperature. If $p(t)$ is this probability, the
+depends only on the temperature.
+#+begin_quote
+OK, yes, let's suppose that.
+#+end_quote
+If $p(t)$ is this probability, the
 number $D$ of malfunctioning O-rings during a flight at
 temperature $t$ follows a binomial law with parameters $n=6$ and
 $p=p(t)$. To link $p(t)$ to $t$, we will therefore perform a
 logistic regression.
-#+begin_src python :results value :session *python* :exports both
+#+begin_src python :results value :session :exports both
 import statsmodels.api as sm
 data["Success"]=data.Count-data.Malfunction
@@ -134,7 +147,8 @@ data["Intercept"]=1
 # logit_model=sm.Logit(data["Frequency"],data[["Intercept","Temperature"]]).fit()
-logmodel=sm.GLM(data['Frequency'], data[['Intercept','Temperature']], family=sm.families.Binomial(sm.families.links.logit)).fit()
+link = sm.families.links.Logit()
+logmodel=sm.GLM(data['Frequency'], data[['Intercept','Temperature']], family=sm.families.Binomial(link)).fit()
 logmodel.summary()
 #+end_src
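For reference, the model described in the text above can be written out as follows. The symbols $\alpha$ and $\beta$ are names assumed here for the Intercept and Temperature coefficients; they do not appear in the notebook.

```latex
\begin{align*}
  D \mid t &\sim \mathrm{Binomial}\bigl(n = 6,\; p(t)\bigr), \\
  \operatorname{logit} p(t) &= \log\frac{p(t)}{1 - p(t)} = \alpha + \beta t, \\
  p(t) &= \frac{1}{1 + e^{-(\alpha + \beta t)}}.
\end{align*}
```

Under this model, a value of $\beta$ close to zero means the fitted $p(t)$ is nearly flat in $t$.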
@@ -142,15 +156,16 @@ logmodel.summary()
 #+RESULTS:
 #+begin_example
                  Generalized Linear Model Regression Results                  
-==============================================================================
+===============================================================================
 Dep. Variable:               Frequency   No. Observations:                    7
 Model:                             GLM   Df Residuals:                        5
 Model Family:                 Binomial   Df Model:                            1
-Link Function:                  logit   Scale:                             1.0
+Link Function:                   Logit   Scale:                          1.0000
-Method:                          IRLS   Log-Likelihood:                -3.6370
+Method:                           IRLS   Log-Likelihood:                -2.5250
-Date:                Fri, 20 Jul 2018   Deviance:                       3.3763
+Date:              jeu., 11 avril 2024   Deviance:                      0.22231
-Time:                        16:56:08   Pearson chi2:                    0.236
+Time:                         15:05:37   Pearson chi2:                    0.236
-No. Iterations:                     5                                         
+No. Iterations:                      4   Pseudo R-squ. (CS):          1.926e-05
+Covariance Type:             nonrobust                                         
 ===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
 -------------------------------------------------------------------------------
@@ -163,13 +178,23 @@ The most likely estimator of the temperature parameter is 0.0014
 and the standard error of this estimator is 0.122, in other words we
 cannot distinguish any particular impact and we must take our
 estimates with caution.
+#+begin_quote
+Indeed... and look at that /p/-value (0.991): for this subset of the
+data, it gives no evidence that temperature has any effect on the
+likelihood of malfunction.
+#+end_quote
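The p-value quoted above can be recovered, to rounding, from the coefficient and standard error given in the text. This is a sketch assuming the summary reports a two-sided Wald z-test against a standard normal; `two_sided_p_value` is a hypothetical helper.

```python
import math


def two_sided_p_value(coef, std_err):
    """Two-sided p-value of a Wald z-test under a standard normal reference."""
    z = coef / std_err
    # P(|Z| > |z|) for Z ~ N(0, 1), written with the complementary error function.
    return z, math.erfc(abs(z) / math.sqrt(2))


# Rounded values quoted in the text: coefficient 0.0014, standard error 0.122.
z, p = two_sided_p_value(0.0014, 0.122)
print(f"z = {z:.4f}, p = {p:.3f}")
```

A z-statistic this close to zero is what drives the p-value so near to 1.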
 * Estimation of the probability of O-ring malfunction
-The expected temperature on the take-off day is 31°F. Let's try to
+The expected temperature on the take-off day is 31°F.
+#+begin_quote
+A temperature at or around which we have /no data/. Extrapolating from
+higher temperatures is a bad idea.
+#+end_quote
+Let's try to
 estimate the probability of O-ring malfunction at
 this temperature from the model we just built:
-#+begin_src python :results output file :var matplot_lib_filename="proba_estimate_python.png" :exports both :session *python* 
+#+begin_src python :results output file :var matplot_lib_filename="proba_estimate_python.png" :exports both :session
 import matplotlib.pyplot as plt
 data_pred = pd.DataFrame({'Temperature': np.linspace(start=30, stop=90, num=121), 'Intercept': 1})
@@ -188,9 +213,14 @@ print(matplot_lib_filename)
 As expected from the initial data, the
 temperature has no significant impact on the probability of failure of the
 O-rings. It will be about 0.2, as in the tests
-where we had a failure of at least one joint. Let's get back to the initial dataset to estimate the probability of failure:
+where we had a failure of at least one joint.
+#+begin_quote
-#+begin_src python :results output :session *python* :exports both
+Opting not to exclude the entries where no malfunction occurred
+reveals a strikingly different picture.
+#+end_quote
+Let's get back to the initial dataset to estimate the probability of failure:
+#+begin_src python :results output :session :exports both
 data = pd.read_csv("shuttle.csv")
 print(np.sum(data.Malfunction)/np.sum(data.Count))
 #+end_src
@@ -198,7 +228,13 @@ print(np.sum(data.Malfunction)/np.sum(data.Count))
 #+RESULTS:
 : 0.06521739130434782
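The arithmetic behind this pooled estimate, and its square as used in the text that follows, can be sketched as below. The totals are assumptions, not values read from shuttle.csv: 9 malfunctions out of 138 O-rings were chosen only because their ratio reproduces the printed result. `pooled_failure_probability` is a hypothetical helper.

```python
def pooled_failure_probability(total_malfunctions, total_o_rings):
    """Pooled estimate: malfunctioning O-rings divided by O-rings flown."""
    return total_malfunctions / total_o_rings


# Assumed totals (NOT read from shuttle.csv); 9 / 138 matches the printed value.
assumed_malfunctions = 9
assumed_o_rings = 138

p = pooled_failure_probability(assumed_malfunctions, assumed_o_rings)
print(f"p   = {p:.4f}")
print(f"p^2 = {p ** 2:.5f}")
```

Squaring $p$ treats the primary and secondary O-ring of a joint as failing independently, which is the same independence assumption made earlier for the six O-rings.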
-This probability is thus about $p=0.065$. Knowing that there is
+This probability is thus about $p=0.065$.
+#+begin_quote
+This has an air of desperation about it. So now, we're just taking the
+proportion of failures to total O-rings across all flights in the
+reloaded dataset, regardless of temperature?
+#+end_quote
+Knowing that there is
 a primary and a secondary O-ring on each of the three parts of the
 launcher, the probability of failure of both joints of a launcher
 is $p^2 \approx 0.00425$. The probability of failure of any one of the

--- a/module2/exo5/freq_temp_python.png
+++ b/module2/exo5/freq_temp_python.png
--- a/module2/exo5/proba_estimate_python.png
+++ b/module2/exo5/proba_estimate_python.png