{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# About Simpson's paradox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This document presents a data analysis for the reproducible research MOOC. The goal is to analyze a dataset that illustrates Simpson's paradox and that, at first glance, seems to lead to surprising conclusions about the effect of smoking on health." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing and checking the data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We retrieve the data in CSV format from the MOOC's GitLab." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "data_url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv?inline=false\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "raw_data = pd.read_csv(data_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the dataset." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SmokerStatusAge
0YesAlive21.0
1YesAlive19.3
2NoDead57.5
3NoAlive47.1
4YesAlive81.4
5NoAlive36.8
6NoAlive23.8
7YesDead57.5
8YesAlive24.8
9YesAlive49.5
10YesAlive30.0
11NoDead66.0
12YesAlive49.2
13NoAlive58.4
14NoDead60.6
15NoAlive25.1
16NoAlive43.5
17NoAlive27.1
18NoAlive58.3
19YesAlive65.7
20NoDead73.2
21YesAlive38.3
22NoAlive33.4
23YesDead62.3
24NoAlive18.0
25NoAlive56.2
26YesAlive59.2
27NoAlive25.8
28NoDead36.9
29NoAlive20.2
............
1284YesDead36.0
1285YesAlive48.3
1286NoAlive63.1
1287NoAlive60.8
1288YesDead39.3
1289NoAlive36.7
1290NoAlive63.8
1291NoDead71.3
1292NoAlive57.7
1293NoAlive63.2
1294NoAlive46.6
1295YesDead82.4
1296YesAlive38.3
1297YesAlive32.7
1298NoAlive39.7
1299YesDead60.0
1300NoDead71.0
1301NoAlive20.5
1302NoAlive44.4
1303YesAlive31.2
1304YesAlive47.8
1305YesAlive60.9
1306NoDead61.4
1307YesAlive43.0
1308NoAlive42.1
1309YesAlive35.9
1310NoAlive22.3
1311YesDead62.1
1312NoDead88.6
1313NoAlive39.1
\n", "

1314 rows × 3 columns

\n", "
" ], "text/plain": [ " Smoker Status Age\n", "0 Yes Alive 21.0\n", "1 Yes Alive 19.3\n", "2 No Dead 57.5\n", "3 No Alive 47.1\n", "4 Yes Alive 81.4\n", "5 No Alive 36.8\n", "6 No Alive 23.8\n", "7 Yes Dead 57.5\n", "8 Yes Alive 24.8\n", "9 Yes Alive 49.5\n", "10 Yes Alive 30.0\n", "11 No Dead 66.0\n", "12 Yes Alive 49.2\n", "13 No Alive 58.4\n", "14 No Dead 60.6\n", "15 No Alive 25.1\n", "16 No Alive 43.5\n", "17 No Alive 27.1\n", "18 No Alive 58.3\n", "19 Yes Alive 65.7\n", "20 No Dead 73.2\n", "21 Yes Alive 38.3\n", "22 No Alive 33.4\n", "23 Yes Dead 62.3\n", "24 No Alive 18.0\n", "25 No Alive 56.2\n", "26 Yes Alive 59.2\n", "27 No Alive 25.8\n", "28 No Dead 36.9\n", "29 No Alive 20.2\n", "... ... ... ...\n", "1284 Yes Dead 36.0\n", "1285 Yes Alive 48.3\n", "1286 No Alive 63.1\n", "1287 No Alive 60.8\n", "1288 Yes Dead 39.3\n", "1289 No Alive 36.7\n", "1290 No Alive 63.8\n", "1291 No Dead 71.3\n", "1292 No Alive 57.7\n", "1293 No Alive 63.2\n", "1294 No Alive 46.6\n", "1295 Yes Dead 82.4\n", "1296 Yes Alive 38.3\n", "1297 Yes Alive 32.7\n", "1298 No Alive 39.7\n", "1299 Yes Dead 60.0\n", "1300 No Dead 71.0\n", "1301 No Alive 20.5\n", "1302 No Alive 44.4\n", "1303 Yes Alive 31.2\n", "1304 Yes Alive 47.8\n", "1305 Yes Alive 60.9\n", "1306 No Dead 61.4\n", "1307 Yes Alive 43.0\n", "1308 No Alive 42.1\n", "1309 Yes Alive 35.9\n", "1310 No Alive 22.3\n", "1311 Yes Dead 62.1\n", "1312 No Dead 88.6\n", "1313 No Alive 39.1\n", "\n", "[1314 rows x 3 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nous allons maintenant vérifier si aucune donnée n'est manquante, et si les différentes lignes concordent entre elles." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SmokerStatusAge
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Smoker, Status, Age]\n", "Index: []" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SmokerStatusAge
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Smoker, Status, Age]\n", "Index: []" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[(raw_data['Smoker'] != \"Yes\") & (raw_data['Smoker'] != \"No\")]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SmokerStatusAge
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Smoker, Status, Age]\n", "Index: []" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[(raw_data['Status'] != \"Alive\") & (raw_data['Status'] != \"Dead\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There seems to be no error in the dataset." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "data = raw_data.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Taux de mortalité (question 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nous allons calculer le taux de mortalité sur la période pour les deux groupes de femmes: fumeuses et non fumeuses, et représenter les résultats sous forme d'un tableau." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Smokers Non-Smokers\n", "Vivantes 443.000000 502.000000\n", "Mortes 139.000000 230.000000\n", "Taux mortalité 0.238832 0.314208\n" ] } ], "source": [ "alive = [len(data[(data['Smoker'] == \"Yes\") & (data['Status'] == \"Alive\")].index), len(data[(data['Smoker'] == \"No\") & (data['Status'] == \"Alive\")].index)]\n", "dead = [len(data[(data['Smoker'] == \"Yes\") & (data['Status'] == \"Dead\")].index), len(data[(data['Smoker'] == \"No\") & (data['Status'] == \"Dead\")].index)]\n", "deathrate = [dead[0]/(alive[0] + dead[0]), dead[1]/(alive[1] + dead[1])]\n", "\n", "print(pd.DataFrame([alive, dead, deathrate], [\"Vivantes\", \"Mortes\", \"Taux mortalité\"], [\"Smokers\", \"Non-Smokers\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On remarque que le taux de mortalité est - nettement - plus élevé dans le groupe des non fumeuses, ce qui constitue le paradoxe de Simpson" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Taux de mortalité par classes d'age (question 2)" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "We now check whether this result still holds once the age classes are taken into account." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Smokers Non-Smokers\n", "18-34 0.037037 0.026432\n", "35-54 0.170306 0.099476\n", "55-64 0.443478 0.330579\n", "65+ 0.857143 0.854922\n" ] } ], "source": [ "# Age class boundaries: [0, 35), [35, 55), [55, 65), [65, 150)\n", "classes_breaks = [0,35,55,65,150] \n", "tab_alive = [ [len(data[(data['Smoker'] == \"Yes\") & (data['Status'] == \"Alive\") & (classes_breaks[i] <= data['Age']) & (data['Age'] < classes_breaks[i+1])]), len(data[(data['Smoker'] == \"No\") & (data['Status'] == \"Alive\") & (classes_breaks[i] <= data['Age']) & (data['Age'] < classes_breaks[i+1])])] for i in [0,1,2,3]]\n", "tab_dead = [ [len(data[(data['Smoker'] == \"Yes\") & (data['Status'] == \"Dead\") & (classes_breaks[i] <= data['Age']) & (data['Age'] < classes_breaks[i+1])]), len(data[(data['Smoker'] == \"No\") & (data['Status'] == \"Dead\") & (classes_breaks[i] <= data['Age']) & (data['Age'] < classes_breaks[i+1])])] for i in [0,1,2,3]]\n", "# Mortality rate per age class, as [smokers, non-smokers]\n", "tab_deathrate = [ [tab_dead[i][0]/(tab_dead[i][0]+tab_alive[i][0]) , tab_dead[i][1]/(tab_dead[i][1]+tab_alive[i][1])] for i in [0,1,2,3]]\n", "\n", "print(pd.DataFrame(tab_deathrate, [\"18-34\",\"35-54\",\"55-64\",\"65+\"], [\"Smokers\", \"Non-Smokers\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time the result matches expectations: the mortality rate is higher among smokers in every age class, and the gap nearly vanishes only for women aged 65 and over (0.857 versus 0.855)." 
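, "\n", "\n", "A short derivation, not part of the original analysis, may help explain how the ordering can flip between the aggregated and the stratified tables. The notation is an assumption introduced for illustration: for a group $g$ (smokers $S$ or non-smokers $N$) and an age class $k$, let $n_{g,k}$ be the number of women and $d_{g,k}$ the number of deaths, so that the class-specific rate is $r_{g,k} = d_{g,k} / n_{g,k}$. The overall rate of a group is then a weighted average of its class-specific rates:\n", "\n", "$$r_g = \\frac{\\sum_k d_{g,k}}{\\sum_k n_{g,k}} = \\sum_k w_{g,k}\\, r_{g,k}, \\qquad w_{g,k} = \\frac{n_{g,k}}{\\sum_j n_{g,j}}$$\n", "\n", "Because the weights $w_{S,k}$ and $w_{N,k}$ depend on the age composition of each group, having $r_{S,k} > r_{N,k}$ in every class does not guarantee $r_S > r_N$: a group that puts more weight on the high-mortality classes can end up with the higher overall rate."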
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One plausible explanation is the age structure of the two groups: older women are more numerous among non-smokers, and since mortality rises with age, the non-smoking group records more deaths overall." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking the hypothesis with logistic regression (question 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first add boolean columns to the dataset to encode the `Status` and `Smoker` variables." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "# Boolean indicator columns, computed with vectorized comparisons\n", "data['Dead?'] = data['Status'] == \"Dead\"\n", "data['Smoke?'] = data['Smoker'] == \"Yes\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We test this hypothesis with a logistic regression." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.metrics import classification_report, confusion_matrix" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(C=10.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n", " penalty='l2', random_state=0, solver='liblinear', tol=0.0001,\n", " verbose=0, warm_start=False)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fit on smokers only: probability of death as a function of age\n", "model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)\n", "model.fit(data[data['Smoker'] == \"Yes\"]['Age'].values.reshape(-1,1), data[data['Smoker'] == \"Yes\"]['Dead?'])\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Predicted [P(alive), P(dead)] for every woman in the dataset\n", "p_pred = model.predict_proba(data['Age'].values.reshape(-1,1))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.98225578 0.01774422]\n", " [0.98490288 0.01509712]\n", " [0.61947454 0.38052546]\n", " ...\n", " [0.51071991 0.48928009]\n", " [0.07464525 0.92535475]\n", " [0.90594064 0.09405936]]\n" ] } ], "source": [ "print(p_pred)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import statsmodels.api as sm\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "x1 = data['Age'].values.reshape(-1,1)\n", 
"x2 = data['Smoke?'].values.reshape(-1,1)\n", "# First model: age only, plus an intercept column\n", "x = sm.add_constant(x1)\n", "y = data['Dead?']" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.382339\n", " Iterations 7\n" ] } ], "source": [ "model = sm.Logit(y, x)\n", "result = model.fit(method='newton')" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: Dead? No. Observations: 1314\n", "Model: Logit Df Residuals: 1312\n", "Method: MLE Df Model: 1\n", "Date: Mon, 31 Aug 2020 Pseudo R-squ.: 0.3560\n", "Time: 15:21:58 Log-Likelihood: -502.39\n", "converged: True LL-Null: -780.16\n", " LLR p-value: 7.883e-123\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -6.1045 0.321 -18.992 0.000 -6.735 -5.475\n", "x1 0.0977 0.006 17.578 0.000 0.087 0.109\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.summary()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "x1 = data['Age'].values.reshape(-1,1)\n", "x2 = data['Smoke?'].values.reshape(-1,1)\n", "x = np.hstack((x1,x2))\n", "x = sm.add_constant(x)\n", "y = data['Dead?']" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.381244\n", " Iterations 7\n" ] } ], "source": [ "model = sm.Logit(y, x)\n", "result = model.fit(method='newton')" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: Dead? No. Observations: 1314\n", "Model: Logit Df Residuals: 1311\n", "Method: MLE Df Model: 2\n", "Date: Mon, 31 Aug 2020 Pseudo R-squ.: 0.3579\n", "Time: 15:35:59 Log-Likelihood: -500.95\n", "converged: True LL-Null: -780.16\n", " LLR p-value: 5.534e-122\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -6.3519 0.360 -17.637 0.000 -7.058 -5.646\n", "x1 0.0998 0.006 17.290 0.000 0.089 0.111\n", "x2 0.2787 0.165 1.689 0.091 -0.045 0.602\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.summary()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }