{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Subject 6: Around Simpson's Paradox"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Study context"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This study looks at [Simpson's paradox](https://fr.wikipedia.org/wiki/Paradoxe_de_Simpson) (Simpson 1951, Yule 1903). It is a statistical paradox \"in which a phenomenon observed in several groups seems to reverse when the groups are combined. This result, which seems impossible at first sight, is tied to elements that are not taken into account (such as non-independent variables or differences in group sizes) and is often encountered in practice, in particular in the social sciences and in medical statistics\" (Wikipedia).\n",
"\n",
"To illustrate the paradox, we use data from a survey carried out in the 1970s in a town in north-east England on one sixth of the electorate, complemented by a follow-up study 20 years later (Vanderpump et al. 1995) on the same people. The initial survey was conducted to inform work on thyroid and heart disease (Tunbridge et al. 1977). The follow-up aimed to find out whether the participants were still alive, in particular in light of their smoking habits.\n",
"\n",
"For this MOOC: \"We restrict ourselves to women, and among them to the 1314 who were categorised as 'currently smoking' or 'never having smoked'. There were relatively few women in the initial survey who had smoked and since quit (162), and very few for whom the information was not available (18). Survival at 20 years was determined for all the women of the first survey\" (MOOC Recherche Reproductible)."
]
},
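{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the reversal concrete, here is a minimal sketch on made-up counts. The numbers and the names `counts`, `by_group` and `pooled` are hypothetical and are not taken from the survey. Within each age group the smokers have the higher death rate, yet the pooled rate is higher for the non-smokers, because most non-smokers sit in the older group.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical counts, chosen only to illustrate the reversal.\n",
"counts = pd.DataFrame(\n",
"    {\n",
"        'group': ['young', 'young', 'old', 'old'],\n",
"        'smoker': ['Yes', 'No', 'Yes', 'No'],\n",
"        'dead': [10, 2, 40, 140],\n",
"        'total': [200, 50, 50, 200],\n",
"    }\n",
")\n",
"\n",
"# Death rate inside each age group.\n",
"counts['rate'] = counts['dead'] / counts['total']\n",
"by_group = counts.pivot(index='group', columns='smoker', values='rate')\n",
"\n",
"# Death rate after pooling the two age groups.\n",
"pooled = counts.groupby('smoker')[['dead', 'total']].sum()\n",
"pooled['rate'] = pooled['dead'] / pooled['total']\n",
"\n",
"print(by_group)\n",
"print(pooled)\n",
"```"
]
},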
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Importing the Python libraries"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import urllib.request\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data processing\n",
"\n",
"The data are available on the GitLab of the reproducible research MOOC. To keep them accessible and to guard against their disappearing if the MOOC closes, we save a local copy. The file is downloaded only if the local copy does not exist.\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"data_url = 'https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv?inline=false'\n",
"data_file = 'simpson_paradox.csv'\n",
"\n",
"if not os.path.exists(data_file):\n",
"    urllib.request.urlretrieve(data_url, data_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each row of the data describes one person, with the following information:\n",
"- whether the person smokes (Yes/No)\n",
"- whether she was alive or dead at the time of the second study (Alive/Dead)\n",
"- her age at the first survey (rounded to one decimal place)"
]
},
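{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to check that a table follows this layout is sketched below. The three-row `sample` frame is a hypothetical stand-in for the real file, and the helper name `check_layout` is an assumption made for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for a few rows of the real file.\n",
"sample = pd.DataFrame(\n",
"    {\n",
"        'Smoker': ['Yes', 'No', 'No'],\n",
"        'Status': ['Alive', 'Dead', 'Alive'],\n",
"        'Age': [21.0, 57.5, 47.1],\n",
"    }\n",
")\n",
"\n",
"\n",
"def check_layout(df):\n",
"    # True if df has the three expected columns and only the expected labels.\n",
"    return (\n",
"        list(df.columns) == ['Smoker', 'Status', 'Age']\n",
"        and set(df['Smoker']) <= {'Yes', 'No'}\n",
"        and set(df['Status']) <= {'Alive', 'Dead'}\n",
"        and pd.api.types.is_numeric_dtype(df['Age'])\n",
"    )\n",
"\n",
"\n",
"print(check_layout(sample))\n",
"```"
]
},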
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Smoker Status Age\n",
"0 Yes Alive 21.0\n",
"1 Yes Alive 19.3\n",
"2 No Dead 57.5\n",
"3 No Alive 47.1\n",
"4 Yes Alive 81.4\n",
"5 No Alive 36.8\n",
"6 No Alive 23.8\n",
"7 Yes Dead 57.5\n",
"8 Yes Alive 24.8\n",
"9 Yes Alive 49.5\n",
"10 Yes Alive 30.0\n",
"11 No Dead 66.0\n",
"12 Yes Alive 49.2\n",
"13 No Alive 58.4\n",
"14 No Dead 60.6\n",
"15 No Alive 25.1\n",
"16 No Alive 43.5\n",
"17 No Alive 27.1\n",
"18 No Alive 58.3\n",
"19 Yes Alive 65.7\n",
"20 No Dead 73.2\n",
"21 Yes Alive 38.3\n",
"22 No Alive 33.4\n",
"23 Yes Dead 62.3\n",
"24 No Alive 18.0\n",
"25 No Alive 56.2\n",
"26 Yes Alive 59.2\n",
"27 No Alive 25.8\n",
"28 No Dead 36.9\n",
"29 No Alive 20.2\n",
"... ... ... ...\n",
"1284 Yes Dead 36.0\n",
"1285 Yes Alive 48.3\n",
"1286 No Alive 63.1\n",
"1287 No Alive 60.8\n",
"1288 Yes Dead 39.3\n",
"1289 No Alive 36.7\n",
"1290 No Alive 63.8\n",
"1291 No Dead 71.3\n",
"1292 No Alive 57.7\n",
"1293 No Alive 63.2\n",
"1294 No Alive 46.6\n",
"1295 Yes Dead 82.4\n",
"1296 Yes Alive 38.3\n",
"1297 Yes Alive 32.7\n",
"1298 No Alive 39.7\n",
"1299 Yes Dead 60.0\n",
"1300 No Dead 71.0\n",
"1301 No Alive 20.5\n",
"1302 No Alive 44.4\n",
"1303 Yes Alive 31.2\n",
"1304 Yes Alive 47.8\n",
"1305 Yes Alive 60.9\n",
"1306 No Dead 61.4\n",
"1307 Yes Alive 43.0\n",
"1308 No Alive 42.1\n",
"1309 Yes Alive 35.9\n",
"1310 No Alive 22.3\n",
"1311 Yes Dead 62.1\n",
"1312 No Dead 88.6\n",
"1313 No Alive 39.1\n",
"\n",
"[1314 rows x 3 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.read_csv(data_file)\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We check that every row is fully filled in and that the ages are plausible."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Empty DataFrame\n",
"Columns: [Smoker, Status, Age]\n",
"Index: []"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[data.isnull().any(axis=1)]"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Minimum and maximum ages: [18.0, 89.9]\n"
]
}
],
"source": [
"print('Minimum and maximum ages: ' + str([data.Age.min(), data.Age.max()]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Studies\n",
"\n",
"### Deaths by smoking habits\n",
"\n",
"The following table summarises the number of women who were dead or alive according to their smoking status."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Alive Dead Mortality\n",
"Smoker 443 139 0.239\n",
"Non-smoker 502 230 0.314"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
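"# Count Alive/Dead separately for smokers and non-smokers, one row per group.\n",
"data_death = pd.DataFrame(index=['Smoker', 'Non-smoker'], columns=['Alive', 'Dead'], data=[data[data.Smoker == 'Yes']['Status'].value_counts(), data[data.Smoker == 'No']['Status'].value_counts()])\n",
"# Mortality rate = deaths / group size, rounded to three decimals.\n",
"data_death['Mortality'] = round(data_death['Dead'] / (data_death['Dead'] + data_death['Alive']), 3)\n",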
"data_death"
]
},
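{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check, the same table can be sketched with `pd.crosstab`. To keep the example self-contained, the frame `toy` is rebuilt from the four counts shown in the table above instead of being read from the CSV. The names `cells`, `toy` and `table` are introduced only for this illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Rebuild one row per woman from the four counts of the table above.\n",
"cells = [('Yes', 'Alive', 443), ('Yes', 'Dead', 139),\n",
"         ('No', 'Alive', 502), ('No', 'Dead', 230)]\n",
"toy = pd.DataFrame(\n",
"    [(smoker, status) for smoker, status, n in cells for _ in range(n)],\n",
"    columns=['Smoker', 'Status'],\n",
")\n",
"\n",
"# Count Alive/Dead per smoking status, then derive the mortality rate.\n",
"table = pd.crosstab(toy['Smoker'], toy['Status'])\n",
"table['Mortality'] = (table['Dead'] / (table['Alive'] + table['Dead'])).round(3)\n",
"print(table)\n",
"```\n",
"\n",
"`pd.crosstab` does the counting and the alignment in one call, which avoids building the frame from two separate `value_counts()` results."
]
},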
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"x = np.arange(2) # the label locations\n",
"width = 0.35 # the width of the bars\n",
"\n",
"fig, ax = plt.subplots()\n",
"ax.bar(x - width/2, data_death['Alive'], width, label='Alive')\n",
"ax.bar(x + width/2, data_death['Dead'], width, label='Dead')\n",
"ax2 = ax.twinx()\n",
"ax2.plot(x, data_death['Mortality'], color='r', marker='o', label='Mortality')\n",
"\n",
"ax.set_ylabel('Number of women')\n",
"ax2.set_ylabel('Mortality rate')\n",
"ax.set_xticks(x)\n",
"ax.set_xticklabels(['Smoker', 'Non-Smoker'])\n",
"ax.legend()\n",
"ax2.legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From these plots and results it would seem logical to conclude that non-smokers have a higher mortality (31%) than smokers (24%), and therefore that smoking helps you live longer. Even the confidence intervals on each woman's status (dead **0** or alive **1**), split by smoking status, suggest that smokers have a better chance of survival."
]
},
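{
"cell_type": "markdown",
"metadata": {},
"source": [
"The interval drawn by seaborn's `barplot` is a bootstrap estimate. As a rough cross-check, the sketch below uses a different and simpler technique, the normal-approximation (Wald) interval for a proportion, applied to the counts of the mortality table. The helper name `wald_interval` and the 1.96 multiplier for a 95% level are choices made for this illustration.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def wald_interval(successes, n, z=1.96):\n",
"    # Normal-approximation interval for the proportion successes / n.\n",
"    p = successes / n\n",
"    half_width = z * math.sqrt(p * (1 - p) / n)\n",
"    return p - half_width, p + half_width\n",
"\n",
"\n",
"# Survival counts (alive, group size) taken from the mortality table.\n",
"smokers = wald_interval(443, 443 + 139)\n",
"non_smokers = wald_interval(502, 502 + 230)\n",
"print('smokers', smokers)\n",
"print('non-smokers', non_smokers)\n",
"```\n",
"\n",
"On these counts the two intervals do not overlap, which matches the impression given by the bar plot."
]
},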
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAE4hJREFUeJzt3X+QXWd93/H3h3UUhx/mR7StqWRhkagQNbEJLEpNSeKUmMo0qfg1ICdTCCSjqo3ipjMG1OkUQpjQAZc0PxBR1VQYMlMEDCQRVEEEGAwUGCSnxrZMRRclWIuiIOMG/yi1WfvbP+7R4fp6tXtt69GVve/XzM7e5znPnv1Ko7kfnefc5zmpKiRJAnjMpAuQJJ09DAVJUs9QkCT1DAVJUs9QkCT1DAVJUs9QkCT1DAVJUs9QkCT1zpl0AQ/WypUr68ILL5x0GZL0iHLdddfdWlXTS417xIXChRdeyMGDByddhiQ9oiT5+jjjnD6SJPUMBUlSz1CQJPUMBUlSr2koJNmY5HCS2STbFzj+xCQfSfLlJIeSvKZlPZKkxTULhSRTwA7gcmA9cEWS9SPDfhW4uaouBi4F3pFkRauaJEmLa3mlsAGYraojVXUPsAfYNDKmgCckCfB44DZgvmFNkqRFtAyFVcDRofZc1zfsncCPAMeAG4F/XVX3NaxJkrSIlovXskDf6AOh/wlwPfCPgR8C/jzJZ6vq9vudKNkCbAFYs2ZNg1KXp9e//vUcP36c888/n7e//e2TLkfSWaDllcIccMFQezWDK4JhrwE+XAOzwF8Czxw9UVXtqqqZqpqZnl5ylbbGdPz4cb7xjW9w/PjxSZci6SzRMhQOAOuSrO1uHm8G9o6MuQV4AUCSvws8AzjSsCZJ0iKaTR9V1XySbcB+YArYXVWHkmztju8E3gJck+RGBtNNb6iqW1vVJElaXNMN8apqH7BvpG/n0OtjwAtb1iBJGp8rmiVJPUNBktQzFCRJPUNBktQzFCRJvUfc4zhPh+e87r2TLuGs8IRb72AKuOXWO/w7Aa67+lWTLkGaOK8UJEk9Q0GS1DMUJEk9Q0GS1DMUJEk9Q0GS1DMUJEk9Q0GS1DMUJEm9ZbmiWQP3rXjc/b5LZwufHz45hsIydtc6n2+ks9PJ54frzGs6fZRkY5LDSWaTbF/g+OuSXN993ZTk3iRPaVmTJOnUmoVCkilgB3A5sB64Isn64TFVdXVVPauqngX8W+DaqrqtVU2SpMW1vFLYAMxW1ZGqugfYA2xaZPwVwPsa1iNJWkLLUFgFHB1qz3V9D5DkscBG4EMN65EkLaFlKGSBvjrF2J8H/seppo6SbElyMMnBEydOnLYCJUn31zIU5oALhtqrgWOnGLuZRaaOqmpXVc1U1cz09PRpLFGSNKxlKBwA1iVZm2QFgzf+vaODkjwR+GngTxvWIkkaQ7N1ClU1n2QbsB+YAnZX1aEkW7vjO7uhLwE+XlV3tapFeqS45Td/bNIlnBXmb3sKcA7zt33dvxNgzRtvPGO/q+nitaraB+wb6ds50r4GuKZlHZKk8bj3kSSpZyhIknqGgiSpZyhIknqGgiSpZyhIknqGgiSpZyhIknqGgiSp5+M4JZ11Vp57HzDffdeZZChIOutcddHfTrqEZcvpI0lSz1CQJPUMBUlSz1CQJPUMBUlSz1CQJPWahkKSjUkOJ5lNsv0UYy5Ncn2SQ0mubVmPJGlxzdYpJJkCdgCXAXPAgSR7q+rmoTFPAt4FbKyqW5L8nVb1SJKW1vJKYQMwW1VHquoeYA+waWTMLwAfrqpbAKrqmw3rkSQtoWUorAKODrXnur5hfx94cpJPJ7kuyasa1iNJWkLLbS6yQF8t8PufA7wA+AHgC0m+WFVfvd+Jki3AFoA1a9Y0KFWSBG2vFOaAC4baq4FjC4z5WFXdVVW3Ap8BLh49UVXtqqqZqpqZnp5uVrAkLXctQ+EAsC7J2i
QrgM3A3pExfwr8ZJJzkjwW+AngKw1rkiQtotn0UVXNJ9kG7AemgN1VdSjJ1u74zqr6SpKPATcA9wF/WFU3tapJkrS4pltnV9U+YN9I386R9tXA1S3rkCSNxxXNkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqRe01BIsjHJ4SSzSbYvcPzSJN9Ocn339caW9UiSFtfscZxJpoAdwGXAHHAgyd6qunlk6Ger6uda1SFJGl/LK4UNwGxVHamqe4A9wKaGv0+S9DC1DIVVwNGh9lzXN+qSJF9O8mdJ/kHDeiRJS2g2fQRkgb4aaf8F8LSqujPJi4A/AdY94ETJFmALwJo1a053nZKkTssrhTnggqH2auDY8ICqur2q7uxe7wO+L8nK0RNV1a6qmqmqmenp6YYlS9Ly1jIUDgDrkqxNsgLYDOwdHpDk/CTpXm/o6vlWw5okSYtoNn1UVfNJtgH7gSlgd1UdSrK1O74TeDnwL5PMA98BNlfV6BSTJOkMaXlP4eSU0L6Rvp1Dr98JvLNlDZKk8bmiWZLUMxQkSb0HHQpJnpzkohbFSJIma6xQSPLpJOcleQrwZeDdSX67bWmSpDNt3CuFJ1bV7cBLgXdX1XOAn21XliRpEsYNhXOSPBV4BfDRhvVIkiZo3FD4TQbrDWar6kCSpwP/u11ZkqRJGGudQlV9EPjgUPsI8LJWRUmSJmOsUEjybh64mR1V9drTXpEkaWLGXdE8fB/hXOAljGxuJ0l65Bt3+uhDw+0k7wM+0aQiSdLEPNQVzesAH2wgSY8y495TuIP731M4DryhSUWSpIkZd/roCa0LkSRN3rjbXHxynD5J0iPbolcKSc4FHgusTPJkvvfc5fOAv9e4NknSGbbU9NG/AH6dQQBcx/dC4XZgR8O6JEkTsOj0UVX9blWtBa6qqqdX1dru6+LuqWmLSrIxyeEks0m2LzLuuUnuTfLyh/BnkCSdJuPeaP79JD8KrGeweO1k/3tP9TNJphhcTVwGzAEHkuytqpsXGPc2BnsrSZImaNyPpL4JuJRBKOwDLgc+B5wyFIANDDbQO9KdYw+wCbh5ZNyvAR8CnvtgCpcknX7jLl57OfAC4HhVvQa4GPj+JX5mFXB0qD3X9fWSrGKwZcbOMeuQJDU0bih8p6ruA+aTnAd8E3j6Ej+TBfpGN9X7HeANVXXvoidKtiQ5mOTgiRMnxixZkvRgjbsh3sEkTwL+C4NPId0JfGmJn5kDLhhqr+aBm+jNAHuSAKwEXpRkvqr+ZHhQVe0CdgHMzMw8YLdWSdLpMe6N5n/VvdyZ5GPAeVV1wxI/dgBYl2Qt8A1gM/ALI+dde/J1kmuAj44GgiTpzHnQK5qr6q+q6oalVjRX1TywjcGnir4CfKCqDiXZmmTrwylaktRG0xXNVbWPwaeVhvsWvKlcVb80Rr2SpIYe7Irmk+7AFc2S9Kiz1PTR54Hn0a1oBt4M3ARcC/y3xrVJks6wpULhPwN3dyuafwr4D8B7gG/TfRpIkvTosdT00VRV3da9fiWwq3s054eSXN+2NEnSmbbUlcJUkpPB8QLgU0PHxl3jIEl6hFjqjf19wLVJbgW+A3wWIMkPM5hCkiQ9iiwaClX1W916hKcCH6+qk6uJH8NgIztJ0qPIklNAVfXFBfq+2qYcSdIkjbshniRpGTAUJEk9Q0GS1DMUJEk9Q0GS1DMUJEk9Q0GS1DMUJEk9Q0GS1GsaCkk2JjmcZDbJ9gWOb0pyQ5LrkxxM8vyW9UiSFtdsp9MkUwyeznYZMAccSLK3qm4eGvZJYG9VVZKLgA8Az2xVkyRpcS2vFDYAs1V1pKruAfYAm4YHVNWdQ5vsPQ4oJEkT0zIUVgFHh9pzXd/9JHlJkv8F/HfgtQ3rkSQtoWUoZIG+B1wJVNUfV9UzgRcDb1nwRMmW7p7DwRMnTpzmMiVJJ7
UMhTnggqH2auDYqQZX1WeAH0qycoFju6pqpqpmpqenT3+lkiSgbSgcANYlWZtkBbAZ2Ds8IMkPJ0n3+tnACuBbDWuSJC2i2aePqmo+yTZgPzAF7K6qQ0m2dsd3Ai8DXpXkuwwe9/nKoRvPkqQzrFkoAFTVPmDfSN/OoddvA97WsgZJ0vhc0SxJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6hkKkqSeoSBJ6jUNhSQbkxxOMptk+wLHfzHJDd3X55Nc3LIeSdLimoVCkilgB3A5sB64Isn6kWF/Cfx0VV0EvAXY1aoeSdLSWl4pbABmq+pIVd0D7AE2DQ+oqs9X1f/pml8EVjesR5K0hJahsAo4OtSe6/pO5ZeBP2tYjyRpCec0PHcW6KsFByY/wyAUnn+K41uALQBr1qw5XfVJkka0vFKYAy4Yaq8Gjo0OSnIR8IfApqr61kInqqpdVTVTVTPT09NNipUktQ2FA8C6JGuTrAA2A3uHByRZA3wY+OdV9dWGtUiSxtBs+qiq5pNsA/YDU8DuqjqUZGt3fCfwRuAHgXclAZivqplWNUmSFtfyngJVtQ/YN9K3c+j1rwC/0rIGSdL4XNEsSeoZCpKknqEgSeoZCpKknqEgSeoZCpKknqEgSeoZCpKknqEgSeoZCpKknqEgSeoZCpKknqEgSeoZCpKknqEgSeoZCpKknqEgSeo1DYUkG5McTjKbZPsCx5+Z5AtJ7k5yVctaJElLa/Y4ziRTwA7gMmAOOJBkb1XdPDTsNuBK4MWt6pAkja/llcIGYLaqjlTVPcAeYNPwgKr6ZlUdAL7bsA5J0phahsIq4OhQe67rkySdpVqGQhboq4d0omRLkoNJDp44ceJhliVJOpWWoTAHXDDUXg0ceygnqqpdVTVTVTPT09OnpThJ0gO1DIUDwLoka5OsADYDexv+PknSw9Ts00dVNZ9kG7AfmAJ2V9WhJFu74zuTnA8cBM4D7kvy68D6qrq9VV2SpFNrFgoAVbUP2DfSt3Po9XEG00qSpLOAK5olST1DQZLUMxQkST1DQZLUMxQkST1DQZLUMxQkST1DQZLUMxQkST1DQZLUMxQkST1DQZLUMxQkST1DQZLUMxQkST1DQZLUMxQkSb2moZBkY5LDSWaTbF/geJL8Xnf8hiTPblmPJGlxzUIhyRSwA7gcWA9ckWT9yLDLgXXd1xbgD1rVI0laWssrhQ3AbFUdqap7gD3AppExm4D31sAXgScleWrDmiRJi2gZCquAo0Ptua7vwY6RJJ0h5zQ8dxboq4cwhiRbGEwvAdyZ5PDDrE3fsxK4ddJFnA3yH1896RJ0f/7bPOlNC71VPmhPG2dQy1CYAy4Yaq8Gjj2EMVTVLmDX6S5QkORgVc1Mug5plP82J6Pl9NEBYF2StUlWAJuBvSNj9gKv6j6F9A+Bb1fVXzesSZK0iGZXClU1n2QbsB+YAnZX1aEkW7vjO4F9wIuAWeD/Aq9pVY8kaWmpesAUvpaRJFu66TnprOK/zckwFCRJPbe5kCT1DIVHse4G/ueSXD7U94okH5tkXdKoJJXkHUPtq5L8xgRLWrYMhUexGswNbgV+O8m5SR4H/Bbwq5OtTHqAu4GXJlk56UKWO0PhUa6qbgI+ArwBeBODbUW+luTVSb6U5Pok70rymCTnJPmjJDcmuSnJlZOtXsvIPIO1SP9m9ECSpyX5ZLdp5ieTrDnz5S0fLRev6ezxZuAvgHuAmSQ/CrwEeF730eFdDNaRfA1YWVU/BpDkSZMqWMvSDuCGJG8f6X8ng//MvCfJa4HfA158xqtbJgyFZaCq7kryfuDOqro7yc8CzwUOJgH4AQZ7UO0HnpHkdxmsIfn4pGrW8lNVtyd5L3Al8J2hQ5cAL+1e/xEwGho6jQyF5eO+7gsGe07trqp/PzooyUUMtjS/EngZ39tzSjoTfofBVe27Fx
nj5+gb8p7C8vQJ4BUnb+ol+cEka5JMM1i78kEG9x986JHOqKq6DfgA8MtD3Z9nML0J8IvA5850XcuJVwrLUFXdmOTNwCeSPAb4LoNPKd0L/NcM5pSKwc1p6Ux7B7BtqH0lsDvJ64ATuB1OU65oliT1nD6SJPUMBUlSz1CQJPUMBUlSz1CQJPUMBQlI8u+SHOr217k+yU88zPNdmuSjp6s+6UxxnYKWvSSXAD8HPLvbBmQlsGKC9ZxTVfOT+v1a3rxSkOCpwK1VdTdAVd1aVceS/FWStyb5QpKDSZ6dZH+Sr5181nj3zIqru11lb0zyytGTJ3lukv+Z5OlJHpdkd5IDXd+mbswvJflgko/gnlOaIK8UpMGb8BuTfJXBFiDvr6pru2NHq+qSJP8JuAb4R8C5wCFgJ4ON2p4FXAysBA4k+czJEyd5HvD7wKaquiXJW4FPVdVru11ov5TkE93wS4CLuq0epIkwFLTsVdWdSZ4D/CTwM8D7k2zvDu/tvt8IPL6q7gDuSPL/ujf15wPvq6p7gb9Jci2DHWhvB36EwTMCXlhVx7rzvBD4Z0mu6trnAiefD/DnBoImzVCQgO5N/dPAp5PcCLy6O3R39/2+odcn2+cw2HH2VP6awZv+jwMnQyHAy6rq8PDA7sb2XQ/jjyCdFt5T0LKX5BlJ1g11PQv4+pg//hnglUmmul1mfwr4Unfsb4F/Crw1yaVd337g17pNB0ny4w+3ful0MhQkeDzwniQ3J7kBWA/8xpg/+8fADcCXgU8Br6+q4ycPVtXfAD8P7OiuBt4CfB+DJ4zd1LWls4a7pEqSel4pSJJ6hoIkqWcoSJJ6hoIkqWcoSJJ6hoIkqWcoSJJ6hoIkqff/AYemX950QIirAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x='Smoker', y='Status', ci=95, data=data.replace('Alive', 1).replace('Dead', 0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But is that really the case? So far we have looked at the data globally, without going into detail. Looking at the women's ages by smoking status, the paradox starts to show:"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEKCAYAAAAfGVI8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADzVJREFUeJzt3XuQXnV9x/H3x0QHB+0IZgmpiGk7kcp4AV0viG1VhMF6CUK9TS/bykzGGS1qq2naTr116jixWnuhtmmlLtRacFqGyFgxRhFtrbAoEigi4igK2WSBosC0aMi3fzwn7RoTdlM5z9nk937NZM5zznP7wmT2nXPO85xNVSFJatdDhh5AkjQsQyBJjTMEktQ4QyBJjTMEktQ4QyBJjTMEktQ4QyBJjTMEktS45UMPsBgrVqyo1atXDz2GJB1Urr766turamKhxx0UIVi9ejUzMzNDjyFJB5Uk31rM4zw0JEmNMwSS1DhDIEmNMwSS1DhDIEmNMwSS1DhDIEmNMwSS1LiD4gtlkg5969evZ3Z2lqOPPpqNGzcOPU5TDIGkJWF2dpZbb7116DGa1GsIknwTuBu4H9hVVZNJjgQuBFYD3wReUVX/2ecckqT9G8c5gudV1QlVNdmtbwC2VtUaYGu3LkkayBAni9cC093taeCMAWaQJHX6DkEBn0xydZJ13baVVbUdoFse1fMMkqQH0PfJ4pOr6rYkRwFbknx1sU/swrEO4Nhjj+1rPklqXq8hqKrbuuXOJBcDzwB2JFlVVduTrAJ27ue5m4BNAJOTk9XnnNKQbnnnk4YeYUnYdeeRwHJ23fkt/58Ax75129jeq7dDQ0kOT/LIPbeB04DrgM3AVPewKeCSvmaQJC2szz2ClcDFSfa8zz9U1SeSXAVclORs4Bbg5T3OIElaQG8hqKpvAE/Zx/Y7gFP6el9J0oHxWkOS1DhDIEmN81pDkpaEFYftBnZ1S42TIZC0JLz5yXcNPUKzPDQkSY0zBJLUOEMgSY0zBJLUOEMgSY0zBJLUOEMgSY0zBJLUOL9Q1pj169czOzvL0UcfzcaNG4ceR9ISYAgaMzs7y6233jr0GJKWEA8NSVLjDIEkNc4QSFLjDIEkNa6Zk8VPe8v5Q4+wJDzy9rtZBtxy+93+PwGufs+vDT2CNDj3CCSpcYZAkhpnCCSpcYZAkhpnCCSpcc18akgjux92+A8tJckQNObeNacNPYKkJcZDQ5LUOEMgSY0zBJLUOEMgSY0zBJLUOEMgSY3rPQRJliX5cpJLu/Ujk2xJclO3PKLvGSRJ+zeOPYI3ADfMW98AbK2qNcDWbl2SNJBeQ5DkGOBFwN/O27wWmO5uTwNn9DmDJOmB9b1H8H5gPbB73raVVbUdoFseta8nJlmXZCbJzNzcXM9jSlK7egtBkhcDO6vq6v/P86tqU1VNVtXkxMTEgzydJGmPPq81dDLw0iS/CBwG/ESSvwd2JFlVVduTrAJ29jiDJGkBve0RVNXvVtUxVbUaeBXw6ar6FWAzMNU9bAq4pK8ZJEkLG+J7BO8GTk1yE3Bqty5JGshYLkNdVZcDl3e37wBOGcf7SpIW5jeLJalxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxhkCSGmcIJKlxvYUgyWFJrkzylSTXJ3lHt/3IJFuS3NQtj+hrBknSwvrcI7gPeH5VPQU4ATg9ybOADcDWqloDbO3WJUkD6S0ENXJPt/rQ7k8Ba4Hpbvs0cEZfM0iSFtbrOYIky5JcA+wEtlTVF4GVVbUdoFse1ecMkqQH1msIqur+qjoBOAZ4RpInLva5SdYlmUkyMzc319+QktS4sX
xqqKruAi4HTgd2JFkF0C137uc5m6pqsqomJyYmxjGmJDWpz08NTSR5VHf74cALgK8Cm4Gp7mFTwCV9zSBJWtjyHl97FTCdZBmj4FxUVZcm+QJwUZKzgVuAl/c4gyRpAb2FoKquBU7cx/Y7gFP6el9J0oFZ8NBQkpVJPpjkX7r147t/zUuSDgGLOUfwIeAy4Ce79a8Bb+xrIEnSeC0mBCuq6iJgN0BV7QLu73UqSdLYLCYE9yZ5NKNvBdNdJuK7vU4lSRqbxZws/i1GH/n8mST/CkwAv9TrVJKksVkwBFX1pSS/ABwHBLixqn7Q+2SSpLFYMARJztxr0+OTfBfYVlX7/FawJOngsZhDQ2cDJwGf6dafC/w7oyC8s6ou6Gk2SdIYLCYEu4EnVNUOGH2vAPgA8EzgCsAQSNJBbDGfGlq9JwKdncDjq+pOwHMFknSQW8weweeSXAp8tFs/C7giyeHAXb1NJkkai8WE4HXAmcBzuvUrgVVVdS/wvL4GkySNx4KHhqqqgJsZHQZ6GaMLxt3Q81ySpDHZ7x5BkscDrwJeDdwBXAikqtwLkKRDyAMdGvoq8DngJVX1dYAkbxrLVJKksXmgQ0NnAbPAZ5L8TZJTGH2zWJJ0CNlvCKrq4qp6JfCzjH7f8JuAlUk+kOS0Mc0nSerZYk4W31tVH66qFwPHANcAG3qfTJI0Fgf0y+ur6s6q+uuqen5fA0mSxuuAQiBJOvQYAklqnCGQpMYZAklqnCGQpMYZAklqnCGQpMYZAklqnCGQpMYZAklqnCGQpMYZAklqXG8hSPLYJJ9JckOS65O8odt+ZJItSW7qlkf0NYMkaWF97hHsAn67qp4APAt4XZLjGV3CemtVrQG24iWtJWlQvYWgqrZX1Ze623cz+oX3jwHWAtPdw6aBM/qaQZK0sLGcI0iyGjgR+CKwsqq2wygWwFHjmEGStG+9hyDJI4B/At5YVd87gOetSzKTZGZubq6/ASWpcb2GIMlDGUXgw1X1z93mHUlWdfevAnbu67lVtamqJqtqcmJios8xJalpfX5qKMAHgRuq6n3z7toMTHW3p4BL+ppBkrSw5T2+9snArwLbklzTbfs94N3ARUnOBm4BXt7jDJKkBfQWgqr6PJD93H1KX+8rSTowfrNYkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcYZAkhpnCCSpcb2FIMl5SXYmuW7etiOTbElyU7c8oq/3lyQtTp97BB8CTt9r2wZga1WtAbZ265KkAfUWgqq6Arhzr81rgenu9jRwRl/vL0lanHGfI1hZVdsBuuVR+3tgknVJZpLMzM3NjW1ASWrNkj1ZXFWbqmqyqiYnJiaGHkeSDlnjDsGOJKsAuuXOMb+/JGkv4w7BZmCquz0FXDLm95ck7aXPj49+BPgCcFyS7yQ5G3g3cGqSm4BTu3VJ0oCW9/XCVfXq/dx1Sl/vKUk6cEv2ZLEkaTwMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMMgSQ1zhBIUuMGCUGS05PcmOTrSTYMMYMkaWTsIUiyDDgXeCFwPPDqJMePew5J0sgQewTPAL5eVd+oqu8D/wisHWAOSRLDhOAxwLfnrX+n2yZJGsDyAd4z+9hWP/KgZB2wrlu9J8mNvU7VlhXA7UMPsRTkj6eGHkE/zL+be7xtXz8qD9jjFvOgIULwHeCx89aPAW7b+0FVtQnYNK6hWpJkpqomh5
5D2pt/N4cxxKGhq4A1SX4qycOAVwGbB5hDksQAewRVtSvJ64HLgGXAeVV1/bjnkCSNDHFoiKr6OPDxId5bgIfctHT5d3MAqfqR87SSpIZ4iQlJapwhOMRk5PNJXjhv2yuSfGLIuaT5klSS985bf3OStw84UtMMwSGmRsf6Xgu8L8lhSQ4H/gh43bCTST/kPuDMJCuGHkSG4JBUVdcBHwN+B3gbcH5V3ZxkKsmVSa5J8pdJHpJkeZILkmxLcl2Sc4adXo3YxejE8Jv2viPJ45JsTXJttzx2/OO1ZZBPDWks3gF8Cfg+MJnkicDLgGd3H+HdxOg7HDcDK6rqSQBJHjXUwGrOucC1STbutf0vGP3jZTrJa4A/A84Y+3QNMQSHqKq6N8mFwD1VdV+SFwBPB2aSADyc0TWfLgOOS/KnjD7S+8mhZlZbqup7Sc4HzgH+a95dJwFndrcvAPYOhR5khuDQtrv7A6NrPJ1XVX+w94OSPJnRZcHPAc7i/67xJPXt/Yz2XP/uAR7jZ9x75jmCdnwKeMWek3NJHp3k2CQTjL5P8lFG5xOeOuSQaktV3QlcBJw9b/O/MTpsCfDLwOfHPVdr3CNoRFVtS/IO4FNJHgL8gNGni+4HPpjR8aJidIJZGqf3Aq+ft34OcF6StwBzwG8MMlVD/GaxJDXOQ0OS1DhDIEmNMwSS1DhDIEmNMwSS1DhDoGYl+f0k13fXtLkmyTN/zNd7bpJLH6z5pHHxewRqUpKTgBcDT+0uwbECeNiA8yyvql1Dvb/a5h6BWrUKuL2q7gOoqtur6rYk30zyriRfSDKT5KlJLktyc5LXwv/+zof3dFdr3ZbklXu/eJKnJ/lykp9OcniS85Jc1W1b2z3m15N8NMnH8BpPGpB7BGrVJ4G3Jvkao8tvXFhVn+3u+3ZVnZTkT4APAScDhwHXA3/F6IJoJwBPAVYAVyW5Ys8LJ3k28OfA2qq6Jcm7gE9X1Wu6q7temeRT3cNPAp7cXWpBGoQhUJOq6p4kTwN+DngecGGSDd3dm7vlNuARVXU3cHeS/+5+kD8H+EhV3Q/sSPJZRld2/R7wBEbX2T+tqm7rXuc04KVJ3tytHwbsucb+FiOgoRkCNav7QX45cHmSbcBUd9d93XL3vNt71pczupLr/mxn9IP+RGBPCAKcVVU3zn9gd3L63h/jP0F6UHiOQE1KclySNfM2nQB8a5FPvwJ4ZZJl3dVbfx64srvvLuBFwLuSPLfbdhnwm92F/Uhy4o87v/RgMgRq1SOA6ST/keRa4Hjg7Yt87sXAtcBXgE8D66tqds+dVbUDeAlwbvev/j8EHsrot3Fd161LS4ZXH5WkxrlHIEmNMwSS1DhDIEmNMwSS1DhDIEmNMwSS1DhDIEmNMwSS1Lj/AWsiiWHJqZ+iAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x='Smoker', y='Age', ci=95, data=data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step is therefore to study the data in more detail, in particular by age group.\n",
"\n",
"### Deaths linked to smoking by age\n",
"\n"
]
},
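{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to split the data by age group is sketched below with `pd.cut`. The bin edges, the labels and the six-row `toy` frame are assumptions made for illustration. They are neither the survey data nor a definitive choice of age classes.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical rows in the same layout as the survey file.\n",
"toy = pd.DataFrame(\n",
"    {\n",
"        'Smoker': ['Yes', 'No', 'Yes', 'No', 'Yes', 'No'],\n",
"        'Status': ['Alive', 'Alive', 'Dead', 'Alive', 'Dead', 'Dead'],\n",
"        'Age': [21.0, 30.2, 45.0, 50.1, 70.3, 80.0],\n",
"    }\n",
")\n",
"\n",
"# Assumed age classes: 18-34, 34-54, 54-64 and over 64.\n",
"edges = [18, 34, 54, 64, 120]\n",
"labels = ['18-34', '34-54', '54-64', '64+']\n",
"toy['AgeGroup'] = pd.cut(toy['Age'], bins=edges, labels=labels, include_lowest=True)\n",
"\n",
"# Share of deaths per age class and smoking status.\n",
"toy['Dead'] = (toy['Status'] == 'Dead').astype(int)\n",
"mortality = toy.groupby(['AgeGroup', 'Smoker'], observed=True)['Dead'].mean()\n",
"print(mortality)\n",
"```\n",
"\n",
"Comparing the `Yes` and `No` rows inside each age class is what reveals whether the global ranking still holds once age is taken into account."
]
},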
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}