From e73b08b76f59fa329ec2a0805b2b8d7e2bed8d4a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Wojciech=20=C5=81oboda?= Date: Wed, 15 Oct 2025 22:12:42 +0200 Subject: [PATCH] ex5 --- module2/exo5/exo5_en.Rmd | 160 +++++++++++++++++++++++++-------------- 1 file changed, 103 insertions(+), 57 deletions(-) diff --git a/module2/exo5/exo5_en.Rmd b/module2/exo5/exo5_en.Rmd index f9003e3..49b3a32 100644 --- a/module2/exo5/exo5_en.Rmd +++ b/module2/exo5/exo5_en.Rmd @@ -3,26 +3,28 @@ title: "Analysis of the risk of failure of the O-rings on the Challenger shuttle author: "Arnaud Legrand" date: "28 juin 2018" output: html_document +editor_options: + markdown: + wrap: 72 --- -On January 27, 1986, the day before the takeoff of the shuttle _Challenger_, had -a three-hour teleconference was held between -Morton Thiokol (the manufacturer of one of the engines) and NASA. The -discussion focused on the consequences of the -temperature at take-off of 31°F (just below -0°C) for the success of the flight and in particular on the performance of the -O-rings used in the engines. Indeed, no test -had been performed at this temperature. - -The following study takes up some of the analyses carried out that -night with the objective of assessing the potential influence of -the temperature and pressure to which the O-rings are subjected -on their probability of malfunction. Our starting point is -the results of the experiments carried out by NASA engineers -during the six years preceding the launch of the shuttle -Challenger. +On January 27, 1986, the day before the takeoff of the shuttle +*Challenger*, had a three-hour teleconference was held between Morton +Thiokol (the manufacturer of one of the engines) and NASA. The +discussion focused on the consequences of the temperature at take-off of +31°F (just below 0°C) for the success of the flight and in particular on +the performance of the O-rings used in the engines. Indeed, no test had +been performed at this temperature. + +The following study takes up some of the analyses carried out that night +with the objective of assessing the potential influence of the +temperature and pressure to which the O-rings are subjected on their +probability of malfunction. Our starting point is the results of the +experiments carried out by NASA engineers during the six years preceding +the launch of the shuttle Challenger. # Loading the data + We start by loading this data: ```{r} @@ -31,41 +33,41 @@ data ``` The data set shows us the date of each test, the number of O-rings -(there are 6 on the main launcher), the -temperature (in Fahrenheit) and pressure (in psi), and finally the -number of identified malfunctions. +(there are 6 on the main launcher), the temperature (in Fahrenheit) and +pressure (in psi), and finally the number of identified malfunctions. # Graphical inspection -Flights without incidents do not provide any information -on the influence of temperature or pressure on malfunction. -We thus focus on the experiments in which at least one O-ring was defective. + +Flights without incidents do not provide any information on the +influence of temperature or pressure on malfunction. We thus focus on +the experiments in which at least one O-ring was defective. ```{r} data = data[data$Malfunction>0,] data ``` -We have a high temperature variability but -the pressure is almost always 200, which should -simplify the analysis. +We have a high temperature variability but the pressure is almost always +200, which should simplify the analysis. How does the frequency of failure vary with temperature? + ```{r} plot(data=data, Malfunction/Count ~ Temperature, ylim=c(0,1)) ``` -At first glance, the dependence does not look very important, but let's try to -estimate the impact of temperature $t$ on the probability of O-ring malfunction. +At first glance, the dependence does not look very important, but let's +try to estimate the impact of temperature $t$ on the probability of +O-ring malfunction. # Estimation of the temperature influence Suppose that each of the six O-rings is damaged with the same probability and independently of the others and that this probability depends only on the temperature. If $p(t)$ is this probability, the -number $D$ of malfunctioning O-rings during a flight at -temperature $t$ follows a binomial law with parameters $n=6$ and -$p=p(t)$. To link $p(t)$ to $t$, we will therefore perform a -logistic regression. +number $D$ of malfunctioning O-rings during a flight at temperature $t$ +follows a binomial law with parameters $n=6$ and $p=p(t)$. To link +$p(t)$ to $t$, we will therefore perform a logistic regression. ```{r} logistic_reg = glm(data=data, Malfunction/Count ~ Temperature, weights=Count, @@ -73,15 +75,16 @@ logistic_reg = glm(data=data, Malfunction/Count ~ Temperature, weights=Count, summary(logistic_reg) ``` -The most likely estimator of the temperature parameter is 0.001416 -and the standard error of this estimator is 0.049, in other words we -cannot distinguish any particular impact and we must take our -estimates with caution. +The most likely estimator of the temperature parameter is 0.001416 and +the standard error of this estimator is 0.049, in other words we cannot +distinguish any particular impact and we must take our estimates with +caution. # Estimation of the probability of O-ring malfunction + The expected temperature on the take-off day is 31°F. Let's try to -estimate the probability of O-ring malfunction at -this temperature from the model we just built: +estimate the probability of O-ring malfunction at this temperature from +the model we just built: ```{r} # shuttle=shuttle[shuttle$r!=0,] @@ -91,29 +94,72 @@ plot(tempv,rmv,type="l",ylim=c(0,1)) points(data=data, Malfunction/Count ~ Temperature) ``` -As expected from the initial data, the -temperature has no significant impact on the probability of failure of the -O-rings. It will be about 0.2, as in the tests -where we had a failure of at least one joint. Let's get back to the initial dataset to estimate the probability of failure: +As expected from the initial data, the temperature has no significant +impact on the probability of failure of the O-rings. It will be about +0.2, as in the tests where we had a failure of at least one joint. Let's +get back to the initial dataset to estimate the probability of failure: ```{r} data_full = read.csv("shuttle.csv",header=T) sum(data_full$Malfunction)/sum(data_full$Count) ``` -This probability is thus about $p=0.065$. Knowing that there is -a primary and a secondary O-ring on each of the three parts of the -launcher, the probability of failure of both joints of a launcher -is $p^2 \approx 0.00425$. The probability of failure of any one of the -launchers is $1-(1-p^2)^3 \approx 1.2%$. That would really be -bad luck.... Everything is under control, so the takeoff can happen -tomorrow as planned. - -But the next day, the Challenger shuttle exploded and took away -with her the seven crew members. The public was shocked and in -the subsequent investigation, the reliability of the -O-rings was questioned. Beyond the internal communication problems -of NASA, which have a lot to do with this fiasco, the previous analysis -includes (at least) a small problem.... Can you find it? -You are free to modify this analysis and to look at this dataset -from all angles in order to to explain what's wrong. +This probability is thus about $p=0.065$. Knowing that there is a +primary and a secondary O-ring on each of the three parts of the +launcher, the probability of failure of both joints of a launcher is +$p^2 \approx 0.00425$. The probability of failure of any one of the +launchers is $1-(1-p^2)^3 \approx 1.2%$. That would really be bad +luck.... Everything is under control, so the takeoff can happen tomorrow +as planned. + +But the next day, the Challenger shuttle exploded and took away with her +the seven crew members. The public was shocked and in the subsequent +investigation, the reliability of the O-rings was questioned. Beyond the +internal communication problems of NASA, which have a lot to do with +this fiasco, the previous analysis includes (at least) a small +problem.... Can you find it? You are free to modify this analysis and to +look at this dataset from all angles in order to to explain what's +wrong. + +## Finding error + +in the provided data from tests, the range of temperatures is small, all +of them vary between 60-70, based on this data we cannot reason about +what will happen in the temperatures around 30. To show this we can +visualize confidence intervals for out logistic regression + +```{r} +# Create a sequence of Temperature values for plotting +newdata <- data.frame(Temperature = seq(0, + max(data$Temperature), + length.out = 100)) + +# Predict on the link (logit) scale with standard errors +pred <- predict(logistic_reg, newdata, type = "link", se.fit = TRUE) + +# Compute 95% CI on the link scale +newdata$fit <- pred$fit +newdata$lower <- pred$fit - 1.96 * pred$se.fit +newdata$upper <- pred$fit + 1.96 * pred$se.fit + +# Transform back to probability scale +newdata$fit_prob <- plogis(newdata$fit) +newdata$lower_prob <- plogis(newdata$lower) +newdata$upper_prob <- plogis(newdata$upper) +``` + +```{r} +library(ggplot2) + +ggplot(newdata, aes(x = Temperature, y = fit_prob)) + + geom_line(color = "blue") + # Predicted probability line + geom_ribbon(aes(ymin = lower_prob, ymax = upper_prob), alpha = 0.2) + # 95% CI + geom_point(data = data, aes(x = Temperature, y = Malfunction/Count), color = "red") + # observed proportions + labs(y = "Probability of Malfunction", + x = "Temperature") + + theme_minimal() +``` + +Model was to simple and took into account only, temperature not pressure +etc, we should do the same based on pressure, temperature, malfucntion +types -- 2.18.1