diff --git a/module2/exo5/exo5_en.Rmd b/module2/exo5/exo5_en.Rmd
index f9003e3a9de8b87c66b620b3fb5157fc127a4e17..819ba9f263ee410da787d67710fe27e860ab929c 100644
--- a/module2/exo5/exo5_en.Rmd
+++ b/module2/exo5/exo5_en.Rmd
@@ -1,7 +1,7 @@
 ---
 title: "Analysis of the risk of failure of the O-rings on the Challenger shuttle"
-author: "Arnaud Legrand"
-date: "28 juin 2018"
+author: "Gkiouzepi Eleni"
+date: "24/7/2021"
 output: html_document
 ---
 
@@ -36,13 +36,12 @@ temperature (in Fahrenheit) and pressure (in psi), and finally the
 number of identified malfunctions.
 
 # Graphical inspection
-Flights without incidents do not provide any information
+~~Flights without incidents do not provide any information
 on the influence of temperature or pressure on malfunction.
-We thus focus on the experiments in which at least one O-ring was defective.
+We thus focus on the experiments in which at least one O-ring was defective.~~ **Wrong assumption**
 
 ```{r}
-data = data[data$Malfunction>0,]
-data
+# mal = data[data$Malfunction>0,]
 ```
 
 We have a high temperature variability but
@@ -71,49 +70,64 @@ logistic regression.
 logistic_reg = glm(data=data, Malfunction/Count ~ Temperature, weights=Count,
                    family=binomial(link='logit'))
 summary(logistic_reg)
+
+# mal_logistic_reg = glm(data=mal, Malfunction/Count ~ Temperature, weights=Count,
+#                        family=binomial(link='logit'))
+# summary(mal_logistic_reg)
 ```
 
-The most likely estimator of the temperature parameter is 0.001416
-and the standard error of this estimator is 0.049, in other words we
-cannot distinguish any particular impact and we must take our
+The most likely estimator of the temperature parameter is ~~0.001416~~ __-0.11560__
+and the standard error of this estimator is 0.047, in other words
+**WRONG** ~~we
+cannot distinguish any particular impact~~
+_the risk is inversely related to temperature: if temperature decreases by 1 degree, the log-odds of O-ring malfunction increase by 0.1156,_ and we must take our
 estimates with caution.
+
 
 # Estimation of the probability of O-ring malfunction
 The expected temperature on the take-off day is 31°F. Let's try to
 estimate the probability of O-ring malfunction at
 this temperature from the model we just built:
 
 ```{r}
-# shuttle=shuttle[shuttle$r!=0,]
+# shuttle=shuttle[shuttle$r!=0,]
 tempv = seq(from=30, to=90, by = .5)
-rmv <- predict(logistic_reg,list(Temperature=tempv),type="response")
-plot(tempv,rmv,type="l",ylim=c(0,1))
+# rmv_mal <- predict(mal_logistic_reg,list(Temperature=tempv),type="response")
+# plot(tempv,rmv_mal,type="l",ylim=c(0,1))
+# points(data=mal, Malfunction/Count ~ Temperature)
+
+
+rmv <- predict(logistic_reg,list(Temperature=tempv),se.fit=T,type="response")
+plot(tempv,rmv$fit,type="l",ylim=c(0,1))
+lines(tempv,rmv$fit+rmv$se.fit,col="red")
+lines(tempv,rmv$fit-rmv$se.fit,col="red")
 points(data=data, Malfunction/Count ~ Temperature)
 ```
 
-As expected from the initial data, the
-temperature has no significant impact on the probability of failure of the
-O-rings. It will be about 0.2, as in the tests
-where we had a failure of at least one joint. Let's get back to the initial dataset to estimate the probability of failure:
+~~As expected from the initial data~~, the
+temperature has **VERY** ~~no~~ significant impact on the probability of failure of the
+O-rings. It will be ~~about 0.2~~ **on average over 0.8, with the upper error band reaching 1.0 (certain)**, ~~as in the tests
+where we had a failure of at least one joint~~ **so we are expecting a failure of at least 4 joints**. Let's ~~get back to the initial dataset to~~ estimate the probability of failure:
 
 ```{r}
-data_full = read.csv("shuttle.csv",header=T)
-sum(data_full$Malfunction)/sum(data_full$Count)
+# data_full = read.csv("shuttle.csv",header=T)
+# sum(data_full$Malfunction)/sum(data_full$Count)
+
+estim = predict(logistic_reg,list(Temperature=31),se.fit=T,type="response")
+estim
 ```
 
-This probability is thus about $p=0.065$. Knowing that there is
+This probability is thus about $p=`r round(estim$fit, digits = 5)`\pm`r round(estim$se.fit, digits = 5)`$. Knowing that there is
 a primary and a secondary O-ring on each of the three parts of the
 launcher, the probability of failure of both joints of a launcher
-is $p^2 \approx 0.00425$. The probability of failure of any one of the
-launchers is $1-(1-p^2)^3 \approx 1.2%$. That would really be
+is $p^2 \approx `r round(estim$fit^2, digits = 2)`\pm`r round(2*estim$se.fit*estim$fit, digits = 2)`$. The probability of failure of any one of the
+launchers is $1-(1-p^2)^3 \approx `r round((1-(1-estim$fit^2)^3)*100, digits = 1)`\%$. ~~That would really be
 bad luck.... Everything is under control, so the takeoff can happen
-tomorrow as planned.
+tomorrow as planned~~. **ABORT! ABORT! ABORT THE MISSION!**
 
-But the next day, the Challenger shuttle exploded and took away
+*Unfortunately, none of the above analysis was carried out properly and* the next day, the Challenger shuttle exploded and took away
 with her the seven crew members. The public was shocked and in
 the subsequent investigation, the reliability of the
 O-rings was questioned. Beyond the internal communication
 problems of NASA, which have a lot to do with this fiasco, the previous analysis
-includes (at least) a small problem.... Can you find it?
-You are free to modify this analysis and to look at this dataset
-from all angles in order to to explain what's wrong.
+includes (at least) a small problem.
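
Note on reading the numbers in this patch: the temperature coefficient reported by `summary(logistic_reg)` (-0.11560) is on the logit scale, so a one-degree change rescales the odds of malfunction rather than shifting the probability by a fixed amount. The derivation below is a minimal sketch, assuming the fitted model has the usual form $\mathrm{logit}\,p(T)=\beta_0+\beta_1 T$; the symbols $\beta_0$ and $\beta_1$ are notation introduced here and do not appear in the Rmd. The last line works through the redundancy formula from the text with the original value $p=0.065$, under the same independence assumption between joints that the text makes.

```latex
\begin{align*}
\log\frac{p(T)}{1-p(T)} &= \beta_0 + \beta_1 T, \qquad \beta_1 = -0.11560 \\
\frac{\mathrm{odds}(T-1)}{\mathrm{odds}(T)} &= e^{-\beta_1} = e^{0.11560} \approx 1.12 \\
P(\text{at least one launcher part fails}) &= 1-(1-p^2)^3, \qquad
p = 0.065 \;\Rightarrow\; p^2 \approx 0.0042,\quad 1-(1-p^2)^3 \approx 0.0126
\end{align*}
```

In words: each degree Fahrenheit lost multiplies the odds of an O-ring malfunction by roughly 1.12, and the change in probability depends on where on the curve the temperature sits.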