ex5

e73b08b7 · Wojciech Łoboda · d21d5e81 · e73b08b7
Commit e73b08b7 authored Oct 15, 2025 by Wojciech Łoboda

Showing with 103 additions and 57 deletions

exo5_en.Rmd module2/exo5/exo5_en.Rmd +103 -57

--- a/module2/exo5/exo5_en.Rmd
+++ b/module2/exo5/exo5_en.Rmd
@@ -3,26 +3,28 @@ title: "Analysis of the risk of failure of the O-rings on the Challenger shuttle
 author: "Arnaud Legrand"
 date: "28 juin 2018"
 output: html_document
+editor_options: 
+  markdown: 
+    wrap: 72
 ---
-On January 27, 1986, the day before the takeoff of the shuttle _Challenger_, had
+On January 27, 1986, the day before the takeoff of the shuttle
-a three-hour teleconference was held between 
+*Challenger*, a three-hour teleconference was held between Morton
-Morton Thiokol (the manufacturer of one of the engines) and NASA. The
+Thiokol (the manufacturer of one of the engines) and NASA. The
-discussion focused on the consequences of the
+discussion focused on the consequences of the temperature at take-off of
-temperature at take-off of 31°F (just below
+31°F (just below 0°C) for the success of the flight and in particular on
-0°C) for the success of the flight and in particular on the performance of the
+the performance of the O-rings used in the engines. Indeed, no test had
-O-rings used in the engines. Indeed, no test
+been performed at this temperature.
-had been performed at this temperature.
+The following study takes up some of the analyses carried out that night
-The following study takes up some of the analyses carried out that
+with the objective of assessing the potential influence of the
-night with the objective of assessing the potential influence of
+temperature and pressure to which the O-rings are subjected on their
-the temperature and pressure to which the O-rings are subjected
+probability of malfunction. Our starting point is the results of the
-on their probability of malfunction. Our starting point is 
+experiments carried out by NASA engineers during the six years preceding
-the results of the experiments carried out by NASA engineers
+the launch of the shuttle Challenger.
-during the six years preceding the launch of the shuttle
-Challenger.
 # Loading the data
 We start by loading this data:
 ```{r}
@@ -31,41 +33,41 @@ data
 ```
 The data set shows us the date of each test, the number of O-rings
-(there are 6 on the main launcher), the
+(there are 6 on the main launcher), the temperature (in Fahrenheit) and
-temperature (in Fahrenheit) and pressure (in psi), and finally the
+pressure (in psi), and finally the number of identified malfunctions.
-number of identified malfunctions.
 # Graphical inspection
-Flights without incidents do not provide any information
-on the influence of temperature or pressure on malfunction.
+Flights without incidents do not provide any information on the
-We thus focus on the experiments in which at least one O-ring was defective.
+influence of temperature or pressure on malfunction. We thus focus on
+the experiments in which at least one O-ring was defective.
 ```{r}
 data = data[data$Malfunction>0,]
 data
 ```
-We have a high temperature variability but
+We have a high temperature variability but the pressure is almost always
-the pressure is almost always 200, which should
+200, which should simplify the analysis.
-simplify the analysis.
 How does the frequency of failure vary with temperature?
 ```{r}
 plot(data=data, Malfunction/Count ~ Temperature, ylim=c(0,1))
 ```
-At first glance, the dependence does not look very important, but let's try to
+At first glance, the dependence does not look very important, but let's
-estimate the impact of temperature $t$ on the probability of O-ring malfunction.
+try to estimate the impact of temperature $t$ on the probability of
+O-ring malfunction.
 # Estimation of the temperature influence
 Suppose that each of the six O-rings is damaged with the same
 probability and independently of the others and that this probability
 depends only on the temperature. If $p(t)$ is this probability, the
-number $D$ of malfunctioning O-rings during a flight at
+number $D$ of malfunctioning O-rings during a flight at temperature $t$
-temperature $t$ follows a binomial law with parameters $n=6$ and
+follows a binomial law with parameters $n=6$ and $p=p(t)$. To link
-$p=p(t)$. To link $p(t)$ to $t$, we will therefore perform a
+$p(t)$ to $t$, we will therefore perform a logistic regression.
-logistic regression.
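+As a reminder of what is being fitted, and assuming the standard logit
+link, the model can be written as
+$$D \sim \mathcal{B}\big(6, p(t)\big), \qquad \log\frac{p(t)}{1-p(t)} = \alpha + \beta t,$$
+where $\alpha$ (intercept) and $\beta$ (temperature coefficient) are
+names chosen here for illustration. Equivalently,
+$p(t) = 1/\big(1+e^{-(\alpha+\beta t)}\big)$, so $\beta = 0$ means that
+temperature has no influence on the probability of malfunction.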
 ```{r}
 logistic_reg = glm(data=data, Malfunction/Count ~ Temperature, weights=Count, 
@@ -73,15 +75,16 @@ logistic_reg = glm(data=data, Malfunction/Count ~ Temperature, weights=Count,
 summary(logistic_reg)
 ```
-The most likely estimator of the temperature parameter is 0.001416
+The maximum likelihood estimate of the temperature parameter is 0.001416 and
-and the standard error of this estimator is 0.049, in other words we
+the standard error of this estimator is 0.049, in other words we cannot
-cannot distinguish any particular impact and we must take our
+distinguish any particular impact and we must take our estimates with
-estimates with caution.
+caution.
 # Estimation of the probability of O-ring malfunction
 The expected temperature on the take-off day is 31°F. Let's try to
-estimate the probability of O-ring malfunction at
+estimate the probability of O-ring malfunction at this temperature from
-this temperature from the model we just built:
+the model we just built:
 ```{r}
 # shuttle=shuttle[shuttle$r!=0,] 
@@ -91,29 +94,72 @@ plot(tempv,rmv,type="l",ylim=c(0,1))
 points(data=data, Malfunction/Count ~ Temperature)
 ```
-As expected from the initial data, the
+As expected from the initial data, the temperature has no significant
-temperature has no significant impact on the probability of failure of the
+impact on the probability of failure of the O-rings. It will be about
-O-rings. It will be about 0.2, as in the tests
+0.2, as in the tests where we had a failure of at least one joint. Let's
-where we had a failure of at least one joint. Let's get back to the initial dataset to estimate the probability of failure:
+get back to the initial dataset to estimate the probability of failure:
 ```{r}
 data_full = read.csv("shuttle.csv",header=T)
 sum(data_full$Malfunction)/sum(data_full$Count)
 ```
-This probability is thus about $p=0.065$. Knowing that there is
+This probability is thus about $p=0.065$. Knowing that there is a
-a primary and a secondary O-ring on each of the three parts of the
+primary and a secondary O-ring on each of the three parts of the
-launcher, the probability of failure of both joints of a launcher
+launcher, the probability of failure of both joints of a launcher is
-is $p^2 \approx 0.00425$. The probability of failure of any one of the
+$p^2 \approx 0.00425$. The probability of failure of any one of the
-launchers is $1-(1-p^2)^3 \approx 1.2%$.  That would really be
+launchers is $1-(1-p^2)^3 \approx 1.2\%$. That would really be bad
-bad luck.... Everything is under control, so the takeoff can happen
+luck.... Everything is under control, so the takeoff can happen tomorrow
-tomorrow as planned.
+as planned.
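+The arithmetic above can be reproduced directly. This is only a small
+sketch: it takes the rounded value $p=0.065$ from the text, keeps the
+same independence assumption, and uses illustrative variable names.
+```{r}
+# Per-O-ring failure probability (rounded value quoted in the text)
+p <- 0.065
+# Both the primary and the secondary O-ring of one part fail
+p_joint <- p^2
+# At least one of the three parts loses both of its O-rings
+p_any <- 1 - (1 - p_joint)^3
+c(p_joint = p_joint, p_any = p_any)
+```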
-But the next day, the Challenger shuttle exploded and took away
+But the next day, the Challenger shuttle exploded and took away with her
-with her the seven crew members. The public was shocked and in
+the seven crew members. The public was shocked and in the subsequent
-the subsequent investigation, the reliability of the
+investigation, the reliability of the O-rings was questioned. Beyond the
-O-rings was questioned. Beyond the internal communication problems
+internal communication problems of NASA, which have a lot to do with
-of NASA, which have a lot to do with this fiasco, the previous analysis
+this fiasco, the previous analysis includes (at least) a small
-includes (at least) a small problem.... Can you find it?
+problem.... Can you find it? You are free to modify this analysis and to
-You are free to modify this analysis and to look at this dataset
+look at this dataset from all angles in order to explain what's
-from all angles in order to to explain what's wrong.
+wrong.
+## Finding the error
+In the test data used for the fit, the range of temperatures is narrow:
+all of them lie roughly between 60 and 70°F. Based on this data we
+cannot reason about what will happen at temperatures around 30°F. To
+show this, we can visualize the confidence intervals of our logistic
+regression:
+```{r}
+# Create a sequence of Temperature values for plotting
+newdata <- data.frame(Temperature = seq(0,
+                                        max(data$Temperature),
+                                        length.out = 100))
+# Predict on the link (logit) scale with standard errors
+pred <- predict(logistic_reg, newdata, type = "link", se.fit = TRUE)
+# Compute 95% CI on the link scale
+newdata$fit <- pred$fit
+newdata$lower <- pred$fit - 1.96 * pred$se.fit
+newdata$upper <- pred$fit + 1.96 * pred$se.fit
+# Transform back to probability scale
+newdata$fit_prob <- plogis(newdata$fit)
+newdata$lower_prob <- plogis(newdata$lower)
+newdata$upper_prob <- plogis(newdata$upper)
+```
+```{r}
+library(ggplot2)
+ggplot(newdata, aes(x = Temperature, y = fit_prob)) +
+  geom_line(color = "blue") +                        # Predicted probability line
+  geom_ribbon(aes(ymin = lower_prob, ymax = upper_prob), alpha = 0.2) + # 95% CI
+  geom_point(data = data, aes(x = Temperature, y = Malfunction/Count), color = "red") + # observed proportions
+  labs(y = "Probability of Malfunction",
+       x = "Temperature") +
+  theme_minimal()
+```
+The model was too simple: it took into account only temperature, not
+pressure or other factors. We should repeat the analysis using pressure,
+temperature, and malfunction types.
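+As a starting point for such an extension, here is a minimal sketch that
+adds pressure as a second predictor. It assumes the column names used
+above (`Malfunction`, `Count`, `Temperature`, `Pressure`). The helper
+names `fit_temp_pressure` and `predict_malfunction` are hypothetical,
+and the pressure of 200 psi is an assumption based on the typical value
+in the data, not a documented launch condition.
+```{r}
+# Hypothetical helper: logistic regression on temperature and pressure
+fit_temp_pressure <- function(df) {
+  glm(Malfunction / Count ~ Temperature + Pressure,
+      data = df, weights = Count,
+      family = binomial(link = "logit"))
+}
+# Hypothetical helper: predicted malfunction probability for given conditions
+predict_malfunction <- function(model, temperature, pressure) {
+  newdata <- data.frame(Temperature = temperature, Pressure = pressure)
+  predict(model, newdata, type = "response")
+}
+# Only run the fit if the full dataset has been loaded earlier
+if (exists("data_full")) {
+  model_tp <- fit_temp_pressure(data_full)
+  print(summary(model_tp))
+  # Assumed launch conditions: 31°F and 200 psi
+  print(predict_malfunction(model_tp, temperature = 31, pressure = 200))
+}
+```
+Fitting on `data_full` rather than on the filtered `data` keeps the
+flights without incidents in the model. A prediction at 31°F is still
+an extrapolation, so it should be read together with its confidence
+interval.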