no commit message

parent 9a0a4620
...@@ -23,7 +23,7 @@ ...@@ -23,7 +23,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 2, "execution_count": 1,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
...@@ -41,7 +41,7 @@ ...@@ -41,7 +41,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 3, "execution_count": 2,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
...@@ -50,7 +50,7 @@ ...@@ -50,7 +50,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 4, "execution_count": 3,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
...@@ -66,7 +66,7 @@ ...@@ -66,7 +66,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 5, "execution_count": 4,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
...@@ -534,7 +534,7 @@ ...@@ -534,7 +534,7 @@
"[1314 rows x 3 columns]" "[1314 rows x 3 columns]"
] ]
}, },
"execution_count": 5, "execution_count": 4,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
...@@ -552,7 +552,7 @@ ...@@ -552,7 +552,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 6, "execution_count": 5,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
...@@ -592,7 +592,7 @@ ...@@ -592,7 +592,7 @@
"Index: []" "Index: []"
] ]
}, },
"execution_count": 6, "execution_count": 5,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
...@@ -603,7 +603,7 @@ ...@@ -603,7 +603,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 7, "execution_count": 6,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
...@@ -643,7 +643,7 @@ ...@@ -643,7 +643,7 @@
"Index: []" "Index: []"
] ]
}, },
"execution_count": 7, "execution_count": 6,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
...@@ -654,7 +654,7 @@ ...@@ -654,7 +654,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 8, "execution_count": 7,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
...@@ -694,7 +694,7 @@ ...@@ -694,7 +694,7 @@
"Index: []" "Index: []"
] ]
}, },
"execution_count": 8, "execution_count": 7,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
...@@ -707,12 +707,12 @@ ...@@ -707,12 +707,12 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"There seems to be no error in the dataset." "The dataset appears correct, so the raw data are used as-is for the analysis."
] ]
}, },
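The diff only shows the tail of the earlier check cells (their `Index: []` outputs), not their code. A minimal sketch of validity checks of that kind is below. It assumes a three-column layout of `Smoker`, `Status` and `Age`; the `Status` column name, the allowed values and the 120-year age cap are assumptions, and the rows are hypothetical toy data, not the study's.

```python
import pandas as pd

# Hypothetical toy rows in an assumed three-column layout (Smoker, Status, Age);
# "Status", its allowed values and the age cap are assumptions, not from the notebook.
data = pd.DataFrame({
    "Smoker": ["Yes", "No", "No", "Yes"],
    "Status": ["Alive", "Dead", "Alive", "Dead"],
    "Age": [23.0, 71.5, 45.2, 64.0],
})

# Each filter keeps only the suspicious rows; an empty result
# (displayed as "Empty DataFrame ... Index: []") means the check passed.
bad_smoker = data[~data["Smoker"].isin(["Yes", "No"])]
bad_status = data[~data["Status"].isin(["Alive", "Dead"])]
bad_age = data[data["Age"].isna() | (data["Age"] < 0) | (data["Age"] > 120)]

print(bad_smoker)
print(bad_status)
print(bad_age)
```

Returning the offending rows, rather than a single boolean, makes it easy to inspect whatever a check flags.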
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 9, "execution_count": 8,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
...@@ -735,7 +735,7 @@ ...@@ -735,7 +735,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 10, "execution_count": 9,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
...@@ -761,7 +761,7 @@ ...@@ -761,7 +761,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"We observe that the mortality rate is markedly higher in the non-smoker group, which constitutes Simpson's paradox" "We observe that the mortality rate is markedly higher in the non-smoker group, which constitutes Simpson's paradox, since common sense would lead us to expect the opposite conclusion."
] ]
}, },
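The aggregate comparison in this cell amounts to a group-by mean of the death indicator. A minimal pandas sketch follows. The notebook does use a `Dead?` column, but its 0/1 encoding is an assumption here, and the rows are hypothetical toy data chosen to reproduce the same direction of effect, not the study's numbers.

```python
import pandas as pd

# Hypothetical toy rows (NOT the study's data); "Dead?" is assumed to be 1 for a death, 0 otherwise.
data = pd.DataFrame({
    "Smoker": ["Yes"] * 6 + ["No"] * 6,
    "Age":    [25, 30, 35, 40, 60, 70, 30, 60, 65, 70, 75, 80],
    "Dead?":  [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1],
})

# Mortality rate per group = mean of the 0/1 death indicator within the group.
rates = data.groupby("Smoker")["Dead?"].mean()
print(rates)
```

With these toy rows the non-smokers show the higher overall rate (4/6 versus 3/6), the same aggregate pattern the cell describes.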
{ {
...@@ -780,7 +780,7 @@ ...@@ -780,7 +780,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 11, "execution_count": 10,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
...@@ -815,7 +815,7 @@ ...@@ -815,7 +815,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"blah blah, there are more elderly non-smokers, hence more deaths" "One possible explanation for this paradox is that the non-smoker group contains proportionally more elderly people, since non-smokers live longer; as a result it also has a higher mortality rate, because age is the main explanatory variable for the death rate."
] ]
}, },
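The age explanation can be checked by stratifying: compare mortality within age bands, and compare the age mix of each group. Below is a sketch on hypothetical toy data, built so that smokers die more often within each band but less often overall. The `AgeGroup` column and the cut point at 50 are illustrative assumptions, and the sketch again assumes `Dead?` is encoded 0/1.

```python
import pandas as pd

# Hypothetical toy data (NOT the study's numbers): smokers die more often within
# each age band, yet less often overall, because non-smokers are older on average.
data = pd.DataFrame({
    "Smoker": ["Yes"] * 6 + ["No"] * 6,
    "Age":    [25, 30, 35, 40, 60, 70, 30, 60, 65, 70, 75, 80],
    "Dead?":  [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1],
})

# Hypothetical age bands; bins are right-inclusive, i.e. (0, 50] and (50, 120].
data["AgeGroup"] = pd.cut(data["Age"], bins=[0, 50, 120], labels=["<50", "50+"])

# Mortality rate within each age band, one column per smoking status.
by_age = (
    data.groupby(["AgeGroup", "Smoker"], observed=True)["Dead?"]
    .mean()
    .unstack()
)

# Age composition of each group: share of its members in each band.
age_mix = pd.crosstab(data["Smoker"], data["AgeGroup"], normalize="index")

print(by_age)
print(age_mix)
```

Within each band the smoker rate is higher (0.25 vs 0.0 and 1.0 vs 0.8). However, 5/6 of the toy non-smokers sit in the older band, against 2/6 of the smokers, which is enough to reverse the aggregate comparison.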
{ {
...@@ -834,7 +834,7 @@ ...@@ -834,7 +834,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 45, "execution_count": 11,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
...@@ -853,74 +853,7 @@ ...@@ -853,74 +853,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 19, "execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import classification_report, confusion_matrix"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression(C=10.0, class_weight=None, dual=False, fit_intercept=True,\n",
" intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
" penalty='l2', random_state=0, solver='liblinear', tol=0.0001,\n",
" verbose=0, warm_start=False)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)\n",
"model.fit(data[data['Smoker'] == \"Yes\"]['Age'].values.reshape(-1,1), data[data['Smoker'] == \"Yes\"]['Dead?'])\n"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"p_pred = model.predict_proba(data['Age'].values.reshape(-1,1))"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0.98225578 0.01774422]\n",
" [0.98490288 0.01509712]\n",
" [0.61947454 0.38052546]\n",
" ...\n",
" [0.51071991 0.48928009]\n",
" [0.07464525 0.92535475]\n",
" [0.90594064 0.09405936]]\n"
]
}
],
"source": [
"print(p_pred)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
...@@ -930,125 +863,7 @@ ...@@ -930,125 +863,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 41, "execution_count": 13,
"metadata": {},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'list' object has no attribute 'reshape'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-41-ddab106694e4>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Age'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Smoker'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd_constant\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Dead?'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mAttributeError\u001b[0m: 'list' object has no attribute 'reshape'"
]
}
],
"source": [
"x1 = data['Age'].values.reshape(-1,1)\n",
"x2 = data['Smoke?'].values.reshape(-1,1)\n",
"x = sm.add_constant(x)\n",
"y = data['Dead?']"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.382339\n",
" Iterations 7\n"
]
}
],
"source": [
"model = sm.Logit(y, x)\n",
"result = model.fit(method='newton')"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table class=\"simpletable\">\n",
"<caption>Logit Regression Results</caption>\n",
"<tr>\n",
" <th>Dep. Variable:</th> <td>Dead?</td> <th> No. Observations: </th> <td> 1314</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Model:</th> <td>Logit</td> <th> Df Residuals: </th> <td> 1312</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Method:</th> <td>MLE</td> <th> Df Model: </th> <td> 1</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Date:</th> <td>Mon, 31 Aug 2020</td> <th> Pseudo R-squ.: </th> <td>0.3560</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Time:</th> <td>15:21:58</td> <th> Log-Likelihood: </th> <td> -502.39</td> \n",
"</tr>\n",
"<tr>\n",
" <th>converged:</th> <td>True</td> <th> LL-Null: </th> <td> -780.16</td> \n",
"</tr>\n",
"<tr>\n",
" <th> </th> <td> </td> <th> LLR p-value: </th> <td>7.883e-123</td>\n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <td></td> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[0.025</th> <th>0.975]</th> \n",
"</tr>\n",
"<tr>\n",
" <th>const</th> <td> -6.1045</td> <td> 0.321</td> <td> -18.992</td> <td> 0.000</td> <td> -6.735</td> <td> -5.475</td>\n",
"</tr>\n",
"<tr>\n",
" <th>x1</th> <td> 0.0977</td> <td> 0.006</td> <td> 17.578</td> <td> 0.000</td> <td> 0.087</td> <td> 0.109</td>\n",
"</tr>\n",
"</table>"
],
"text/plain": [
"<class 'statsmodels.iolib.summary.Summary'>\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Dead? No. Observations: 1314\n",
"Model: Logit Df Residuals: 1312\n",
"Method: MLE Df Model: 1\n",
"Date: Mon, 31 Aug 2020 Pseudo R-squ.: 0.3560\n",
"Time: 15:21:58 Log-Likelihood: -502.39\n",
"converged: True LL-Null: -780.16\n",
" LLR p-value: 7.883e-123\n",
"==============================================================================\n",
" coef std err z P>|z| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"const -6.1045 0.321 -18.992 0.000 -6.735 -5.475\n",
"x1 0.0977 0.006 17.578 0.000 0.087 0.109\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result.summary()"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
...@@ -1061,7 +876,7 @@ ...@@ -1061,7 +876,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 61, "execution_count": 14,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
...@@ -1081,7 +896,7 @@ ...@@ -1081,7 +896,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 62, "execution_count": 15,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
...@@ -1099,10 +914,10 @@ ...@@ -1099,10 +914,10 @@
" <th>Method:</th> <td>MLE</td> <th> Df Model: </th> <td> 2</td> \n", " <th>Method:</th> <td>MLE</td> <th> Df Model: </th> <td> 2</td> \n",
"</tr>\n", "</tr>\n",
"<tr>\n", "<tr>\n",
" <th>Date:</th> <td>Mon, 31 Aug 2020</td> <th> Pseudo R-squ.: </th> <td>0.3579</td> \n", " <th>Date:</th> <td>Tue, 01 Sep 2020</td> <th> Pseudo R-squ.: </th> <td>0.3579</td> \n",
"</tr>\n", "</tr>\n",
"<tr>\n", "<tr>\n",
" <th>Time:</th> <td>15:35:59</td> <th> Log-Likelihood: </th> <td> -500.95</td> \n", " <th>Time:</th> <td>10:16:00</td> <th> Log-Likelihood: </th> <td> -500.95</td> \n",
"</tr>\n", "</tr>\n",
"<tr>\n", "<tr>\n",
" <th>converged:</th> <td>True</td> <th> LL-Null: </th> <td> -780.16</td> \n", " <th>converged:</th> <td>True</td> <th> LL-Null: </th> <td> -780.16</td> \n",
...@@ -1134,8 +949,8 @@ ...@@ -1134,8 +949,8 @@
"Dep. Variable: Dead? No. Observations: 1314\n", "Dep. Variable: Dead? No. Observations: 1314\n",
"Model: Logit Df Residuals: 1311\n", "Model: Logit Df Residuals: 1311\n",
"Method: MLE Df Model: 2\n", "Method: MLE Df Model: 2\n",
"Date: Mon, 31 Aug 2020 Pseudo R-squ.: 0.3579\n", "Date: Tue, 01 Sep 2020 Pseudo R-squ.: 0.3579\n",
"Time: 15:35:59 Log-Likelihood: -500.95\n", "Time: 10:16:00 Log-Likelihood: -500.95\n",
"converged: True LL-Null: -780.16\n", "converged: True LL-Null: -780.16\n",
" LLR p-value: 5.534e-122\n", " LLR p-value: 5.534e-122\n",
"==============================================================================\n", "==============================================================================\n",
...@@ -1148,7 +963,7 @@ ...@@ -1148,7 +963,7 @@
"\"\"\"" "\"\"\""
] ]
}, },
"execution_count": 62, "execution_count": 15,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
...@@ -1156,6 +971,20 @@ ...@@ -1156,6 +971,20 @@
"source": [ "source": [
"result.summary()" "result.summary()"
] ]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The previous table gives the results of the logistic regression, where x1 represents age and x2 represents smoking status (smoker or non-smoker). The model therefore tries to explain the variable \"Dead?\" from age and smoking status. The results show a p-value of 0.09 for smoking status, which means the null hypothesis (a zero coefficient for smoking) can be rejected at the 10% significance level, but not at 5%. Moreover, the associated coefficient is positive: at a given age, smoking is associated with a higher probability of death, so we can conclude that smoking negatively affects life expectancy at the 10% significance level. Further research would be needed to draw firmer conclusions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
} }
], ],
"metadata": { "metadata": {
......