diff --git a/module2/exo5/exo5_en.ipynb b/module2/exo5/exo5_en.ipynb index b310181c7fd7680232795a6b7b4e3d104750272d..db511a8aaa593144adf60c5d06320f6f0396f47f 100644 --- a/module2/exo5/exo5_en.ipynb +++ b/module2/exo5/exo5_en.ipynb @@ -495,7 +495,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -504,10 +504,10 @@ "
Dep. Variable: | Frequency | No. Observations: | 23 | \n", + "Dep. Variable: | Frequency | No. Observations: | 7 | \n", "
---|---|---|---|---|---|---|---|
Model: | GLM | Df Residuals: | 21 | \n", + "Model: | GLM | Df Residuals: | 5 | \n", "
Model Family: | Binomial | Df Model: | 1 | \n", @@ -516,16 +516,16 @@ "Link Function: | logit | Scale: | 1.0000 | \n", "
Method: | IRLS | Log-Likelihood: | -3.9210 | \n", + "Method: | IRLS | Log-Likelihood: | -2.5250 | \n", "
Date: | Thu, 22 Oct 2020 | Deviance: | 3.0144 | \n", + "Date: | Thu, 05 Nov 2020 | Deviance: | 0.22231 | \n", "
Time: | 11:23:38 | Pearson chi2: | 5.00 | \n", + "Time: | 02:29:10 | Pearson chi2: | 0.236 | \n", "
No. Iterations: | 6 | Covariance Type: | nonrobust | \n", + "No. Iterations: | 4 | Covariance Type: | nonrobust | \n", "
coef | std err | z | P>|z| | [0.025 | 0.975] | \n", "\n", "||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Intercept | 5.0850 | 7.477 | 0.680 | 0.496 | -9.570 | 19.740 | \n", + "Intercept | -1.3895 | 7.828 | -0.178 | 0.859 | -16.732 | 13.953 | \n", "
Temperature | -0.1156 | 0.115 | -1.004 | 0.316 | -0.341 | 0.110 | \n", + "Temperature | 0.0014 | 0.122 | 0.012 | 0.991 | -0.238 | 0.240 | \n", "
As we can see using visual inspection there is some tendency the tests with temperatures between 65 and 90 have less failure. Of course we can't really conclude using only this observation, however at least we know that ...
" + "Visual inspection suggests that tests at temperatures between 65 and 90 tend to have fewer failures. We cannot draw a firm conclusion from this observation alone, since there is too little data at the lower temperatures. However, 2 failures clearly occurred at the lowest temperature, 53, and no experiment below 65 was successful. This visual inspection should have raised suspicion before the Challenger shuttle was launched.
" ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -743,6 +743,10 @@ } ], "source": [ + "data[\"Success\"]=data.Count-data.Malfunction\n", + "data[\"Intercept\"]=1\n", + "logmodel=sm.GLM(data['Frequency'], data[['Intercept','Temperature']], family=sm.families.Binomial(sm.families.links.logit)).fit()\n", + "\n", "%matplotlib inline\n", "data_pred = pd.DataFrame({'Temperature': np.linspace(start=30, stop=90, num=121), 'Intercept': 1})\n", "data_pred['Frequency'] = logmodel.predict(data_pred[['Intercept','Temperature']])\n", @@ -755,7 +759,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Also when we plot the regression now we can see that lower temperatures seem to have more ...
" + "The plot above reuses the same regression technique, but this time the model is fitted on the whole dataset. The temperature of the experiment now clearly influences the probability of failure. On the corresponding plot, the probability is around 80 percent, which is quite high.
" ] }, {