no commit message

parent 60cec46b
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A Study of Simpson's Paradox: The Effect of Smoking on the Survival of Women in Whickham"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In 1972-1974, a survey was conducted on the health of women in Whickham, England. Its aim was to assess the relationship between smoking and long-term survival. For simplicity, we restrict the analysis to the 1314 women who were categorized as __currently smoking__ or __never having smoked__. We analyze these data to explore Simpson's paradox."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Data Preparation"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import statsmodels.api as sm\n",
"import statsmodels.formula.api as smf"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Noms des colonnes dans le DataFrame : Index(['Smoker', 'Status', 'Age'], dtype='object')\n"
]
}
],
"source": [
"# Load the data\n",
"url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv\"\n",
"data = pd.read_csv(url)\n",
"\n",
"# Display the data to explore it\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Smoking Status and Survival"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table de survie par statut de tabagisme :\n",
"Status Alive Dead total Taux de mortalité\n",
"Smoker \n",
"No 502 230 732 0.314208\n",
"Yes 443 139 582 0.238832\n"
]
}
],
"source": [
"# Build the survival table from the available columns (Smoker, Status)\n",
"table_smoking = data.groupby(['Smoker', 'Status']).size().unstack(fill_value=0)\n",
"table_smoking['total'] = table_smoking.sum(axis=1)\n",
"table_smoking['Taux de mortalité'] = table_smoking['Dead'] / table_smoking['total']\n",
"\n",
"print('Table de survie par statut de tabagisme :')\n",
"print(table_smoking)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Plot the mortality rate by smoking status\n",
"sns.barplot(x=table_smoking.index, y=table_smoking['Taux de mortalité'])\n",
"plt.title('Taux de mortalité selon le statut de tabagisme')\n",
"plt.ylabel('Taux de mortalité')\n",
"plt.xlabel('Statut de tabagisme')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"Smoking is widely regarded as dangerous. Yet in these results the smokers have the lower crude mortality rate (23.9% versus 31.4% for non-smokers), which would suggest that smokers survive better. This counterintuitive result is an instance of Simpson's paradox."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Analysis by Age Group"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table de survie par âge et statut de tabagisme :\n",
"Status Alive Dead total Taux de mortalité\n",
"GroupeAge Smoker \n",
"18-34 No 212 6 218 0.027523\n",
" Yes 172 5 177 0.028249\n",
"35-54 No 180 19 199 0.095477\n",
" Yes 196 41 237 0.172996\n",
"55-64 No 81 40 121 0.330579\n",
" Yes 64 51 115 0.443478\n",
"65+ No 28 165 193 0.854922\n",
" Yes 7 42 49 0.857143\n"
]
}
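],
"source": [
"# Define the age groups\n",
"# Note: pd.cut uses right-closed bins by default, so the first bin is (18, 34]\n",
"# and an age of exactly 18.0 falls outside it (GroupeAge is NaN for those rows).\n",
"# Passing include_lowest=True would keep them, but would change the counts in the output.\n",
"bins = [18, 34, 54, 64, np.inf]\n",
"labels = ['18-34', '35-54', '55-64', '65+']\n",
"data['GroupeAge'] = pd.cut(data['Age'], bins=bins, labels=labels)\n",
"\n",
"# Survival table by age group and smoking status\n",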
"table_smoking = data.groupby(['GroupeAge','Smoker', 'Status']).size().unstack(fill_value=0)\n",
"table_smoking['total'] = table_smoking.sum(axis=1)\n",
"table_smoking['Taux de mortalité'] = table_smoking['Dead'] / table_smoking['total']\n",
"\n",
"print('Table de survie par âge et statut de tabagisme :')\n",
"print(table_smoking)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Plot the mortality rate by smoking status and age group\n",
"table_smoking_reset = table_smoking.reset_index()\n",
"plt.figure(figsize=(12, 8))\n",
"sns.barplot(data=table_smoking_reset, x='GroupeAge', y='Taux de mortalité', hue='Smoker')\n",
"plt.title('Taux de mortalité selon le statut de tabagisme par groupe d\\'âge', fontsize=16)\n",
"plt.ylabel('Taux de mortalité', fontsize=12)\n",
"plt.xlabel('Groupe d\\'âge', fontsize=12)\n",
"plt.legend(title='Statut de tabagisme')\n",
"plt.xticks(rotation=45)\n",
"plt.ylim(0, 1)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"Intuitively, one would expect smoking to raise the risk of death, but the aggregate figures do not show this. The paradox arises because this is an observational study: the composition of the two groups is not controlled. In particular, the non-smokers in this data set may be older on average than the smokers. The table by age group is consistent with this, with 193 non-smokers but only 49 smokers in the 65+ group."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Logistic Regression"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create the 'Death' variable: 1 if the woman died during the 20-year follow-up, 0 otherwise\n",
"data['Death'] = data['Status'].apply(lambda x: 0 if x == 'Alive' else 1)\n",
"\n",
"# Subsets, using the column names actually present in the CSV (Smoker, Status, Age)\n",
"smokers = data[data['Smoker'] == 'Yes']\n",
"non_smokers = data[data['Smoker'] == 'No']\n",
"\n",
"# Logistic regression model for smokers\n",
"model_smokers = smf.logit('Death ~ Age', data=smokers).fit()\n",
"print(model_smokers.summary())\n",
"\n",
"# Logistic regression model for non-smokers\n",
"model_non_smokers = smf.logit('Death ~ Age', data=non_smokers).fit()\n",
"print(model_non_smokers.summary())\n",
"\n",
"# Plot the predicted probability of death as a function of age for each group\n",
"age_range = np.linspace(data['Age'].min(), data['Age'].max(), 100)\n",
"death_prob_smokers = model_smokers.predict(pd.DataFrame({'Age': age_range}))\n",
"death_prob_non_smokers = model_non_smokers.predict(pd.DataFrame({'Age': age_range}))\n",
"\n",
"plt.plot(age_range, death_prob_smokers, label='Fumeuses', color='red')\n",
"plt.plot(age_range, death_prob_non_smokers, label='Non-Fumeuses', color='blue')\n",
"\n",
"# Rough 95% bands from the binomial standard error with the group size n\n",
"# (an approximation, not the model-based confidence interval of the fitted curve)\n",
"ci_smokers = 1.96 * np.sqrt(death_prob_smokers * (1 - death_prob_smokers) / len(smokers))\n",
"ci_non_smokers = 1.96 * np.sqrt(death_prob_non_smokers * (1 - death_prob_non_smokers) / len(non_smokers))\n",
"plt.fill_between(age_range, death_prob_smokers - ci_smokers, death_prob_smokers + ci_smokers, color='red', alpha=0.3)\n",
"plt.fill_between(age_range, death_prob_non_smokers - ci_non_smokers, death_prob_non_smokers + ci_non_smokers, color='blue', alpha=0.3)\n",
"\n",
"plt.xlabel('Âge')\n",
"plt.ylabel('Probabilité de décès')\n",
"plt.legend()\n",
"plt.title('Probabilité de décès en fonction de l\\'âge et du statut de tabagisme')\n",
"plt.show()"
]
},
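{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"Simpson's paradox appears here because the aggregate comparison and the age-stratified comparison point in opposite directions. Taken as a single group, smokers have a lower mortality rate than non-smokers (23.9% versus 31.4%), whereas within every age group the smokers' mortality rate is at least as high as the non-smokers'. The reversal comes from age: the non-smokers include many more women aged 65 and over, the group in which mortality is highest."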
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
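The reversal described in the conclusion can be checked directly from the counts printed in the age-stratified table. The following is a minimal sketch, not part of the notebook: the `counts` dictionary and the helper names `mortality` and `pooled_rate` are illustrative assumptions, and the numbers are copied by hand from the printed table. Those binned counts add up to 1309 women rather than 1314, because the rows with an age of exactly 18.0 fall outside the first bin.

```python
# (alive, dead) counts copied from the printed table by age group and smoking status.
# The dictionary and helper names are hypothetical, introduced only for this sketch.
counts = {
    "18-34": {"No": (212, 6), "Yes": (172, 5)},
    "35-54": {"No": (180, 19), "Yes": (196, 41)},
    "55-64": {"No": (81, 40), "Yes": (64, 51)},
    "65+": {"No": (28, 165), "Yes": (7, 42)},
}


def mortality(alive, dead):
    """Share of women who died among those observed."""
    return dead / (alive + dead)


def pooled_rate(smoker):
    """Mortality rate for one smoking status, all age groups pooled."""
    alive = sum(group[smoker][0] for group in counts.values())
    dead = sum(group[smoker][1] for group in counts.values())
    return mortality(alive, dead)


# Number of women that actually ended up in an age group
total = sum(a + d for group in counts.values() for a, d in group.values())

# Within each age group: is the smokers' rate at least the non-smokers' rate?
stratified = {
    age: mortality(*group["Yes"]) >= mortality(*group["No"])
    for age, group in counts.items()
}

print("women in the binned table:", total)
print("pooled rate, smokers:    ", round(pooled_rate("Yes"), 3))
print("pooled rate, non-smokers:", round(pooled_rate("No"), 3))
print("smokers worse within each age group:", stratified)
```

Pooling hides the fact that the two groups have very different age structures, which is why the pooled comparison and the per-group comparison disagree.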
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A Study of Simpson's Paradox: The Effect of Smoking on the Survival of Women in Whickham"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In 1972-1974, a survey was conducted on the health of women in Whickham, England. Its aim was to assess the relationship between smoking and long-term survival. For simplicity, we restrict the analysis to the 1314 women who were categorized as __currently smoking__ or __never having smoked__. We analyze these data to explore Simpson's paradox."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Data Preparation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import statsmodels.api as sm\n",
"import statsmodels.formula.api as smf"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Smoker</th>\n",
" <th>Status</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>21.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>19.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>47.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>81.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>23.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>24.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>66.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>60.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>43.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>27.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>65.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>73.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>33.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>18.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>56.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>59.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>36.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1284</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>36.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1285</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>48.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1286</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1287</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>60.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1288</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>39.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1289</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1290</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1291</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1292</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>57.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1293</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1294</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>46.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1295</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>82.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1296</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1297</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>32.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1298</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1299</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>60.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1300</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1301</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1302</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>44.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1303</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>31.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1304</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>47.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1305</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>60.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1306</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>61.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1307</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>43.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1308</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>42.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1309</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>35.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1310</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>22.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1311</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1312</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>88.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1313</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1314 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" Smoker Status Age\n",
"0 Yes Alive 21.0\n",
"1 Yes Alive 19.3\n",
"2 No Dead 57.5\n",
"3 No Alive 47.1\n",
"4 Yes Alive 81.4\n",
"5 No Alive 36.8\n",
"6 No Alive 23.8\n",
"7 Yes Dead 57.5\n",
"8 Yes Alive 24.8\n",
"9 Yes Alive 49.5\n",
"10 Yes Alive 30.0\n",
"11 No Dead 66.0\n",
"12 Yes Alive 49.2\n",
"13 No Alive 58.4\n",
"14 No Dead 60.6\n",
"15 No Alive 25.1\n",
"16 No Alive 43.5\n",
"17 No Alive 27.1\n",
"18 No Alive 58.3\n",
"19 Yes Alive 65.7\n",
"20 No Dead 73.2\n",
"21 Yes Alive 38.3\n",
"22 No Alive 33.4\n",
"23 Yes Dead 62.3\n",
"24 No Alive 18.0\n",
"25 No Alive 56.2\n",
"26 Yes Alive 59.2\n",
"27 No Alive 25.8\n",
"28 No Dead 36.9\n",
"29 No Alive 20.2\n",
"... ... ... ...\n",
"1284 Yes Dead 36.0\n",
"1285 Yes Alive 48.3\n",
"1286 No Alive 63.1\n",
"1287 No Alive 60.8\n",
"1288 Yes Dead 39.3\n",
"1289 No Alive 36.7\n",
"1290 No Alive 63.8\n",
"1291 No Dead 71.3\n",
"1292 No Alive 57.7\n",
"1293 No Alive 63.2\n",
"1294 No Alive 46.6\n",
"1295 Yes Dead 82.4\n",
"1296 Yes Alive 38.3\n",
"1297 Yes Alive 32.7\n",
"1298 No Alive 39.7\n",
"1299 Yes Dead 60.0\n",
"1300 No Dead 71.0\n",
"1301 No Alive 20.5\n",
"1302 No Alive 44.4\n",
"1303 Yes Alive 31.2\n",
"1304 Yes Alive 47.8\n",
"1305 Yes Alive 60.9\n",
"1306 No Dead 61.4\n",
"1307 Yes Alive 43.0\n",
"1308 No Alive 42.1\n",
"1309 Yes Alive 35.9\n",
"1310 No Alive 22.3\n",
"1311 Yes Dead 62.1\n",
"1312 No Dead 88.6\n",
"1313 No Alive 39.1\n",
"\n",
"[1314 rows x 3 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the data\n",
"url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv\"\n",
"data = pd.read_csv(url)\n",
"\n",
"# Display the data to explore it\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Smoking Status and Survival"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table de survie par statut de tabagisme :\n",
"Status Alive Dead total Taux de mortalité\n",
"Smoker \n",
"No 502 230 732 0.314208\n",
"Yes 443 139 582 0.238832\n"
]
}
],
"source": [
"# Build the survival table from the available columns (Smoker, Status)\n",
"table_smoking = data.groupby(['Smoker', 'Status']).size().unstack(fill_value=0)\n",
"table_smoking['total'] = table_smoking.sum(axis=1)\n",
"table_smoking['Taux de mortalité'] = table_smoking['Dead'] / table_smoking['total']\n",
"\n",
"print('Table de survie par statut de tabagisme :')\n",
"print(table_smoking)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Plot the mortality rate by smoking status\n",
"sns.barplot(x=table_smoking.index, y=table_smoking['Taux de mortalité'])\n",
"plt.title('Taux de mortalité selon le statut de tabagisme')\n",
"plt.ylabel('Taux de mortalité')\n",
"plt.xlabel('Statut de tabagisme')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"Smoking is widely regarded as dangerous, so one would expect it to raise the risk of death. That is not what the aggregate figures show: the smokers have the lower crude mortality rate (23.9% versus 31.4% for non-smokers), which would suggest that they survive better. This counterintuitive result is an instance of Simpson's paradox. It arises because this is an observational study in which the composition of the two groups is not controlled. In particular, the non-smokers in the data set may be older on average than the smokers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Analysis by Age Group"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table de survie par âge et statut de tabagisme :\n",
"Status Alive Dead total Taux de mortalité\n",
"GroupeAge Smoker \n",
"18-34 No 212 6 218 0.027523\n",
" Yes 172 5 177 0.028249\n",
"35-54 No 180 19 199 0.095477\n",
" Yes 196 41 237 0.172996\n",
"55-64 No 81 40 121 0.330579\n",
" Yes 64 51 115 0.443478\n",
"65+ No 28 165 193 0.854922\n",
" Yes 7 42 49 0.857143\n"
]
}
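],
"source": [
"# Define the age groups\n",
"# Note: pd.cut uses right-closed bins by default, so the first bin is (18, 34]\n",
"# and an age of exactly 18.0 falls outside it (GroupeAge is NaN for those rows).\n",
"# Passing include_lowest=True would keep them, but would change the counts in the output.\n",
"bins = [18, 34, 54, 64, np.inf]\n",
"labels = ['18-34', '35-54', '55-64', '65+']\n",
"data['GroupeAge'] = pd.cut(data['Age'], bins=bins, labels=labels)\n",
"\n",
"# Survival table by age group and smoking status\n",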
"table_smoking = data.groupby(['GroupeAge','Smoker', 'Status']).size().unstack(fill_value=0)\n",
"table_smoking['total'] = table_smoking.sum(axis=1)\n",
"table_smoking['Taux de mortalité'] = table_smoking['Dead'] / table_smoking['total']\n",
"\n",
"print('Table de survie par âge et statut de tabagisme :')\n",
"print(table_smoking)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Plot the mortality rate by smoking status and age group\n",
"table_smoking_reset = table_smoking.reset_index()\n",
"plt.figure(figsize=(12, 8))\n",
"sns.barplot(data=table_smoking_reset, x='GroupeAge', y='Taux de mortalité', hue='Smoker')\n",
"plt.title('Taux de mortalité selon le statut de tabagisme par groupe d\\'âge', fontsize=16)\n",
"plt.ylabel('Taux de mortalité', fontsize=12)\n",
"plt.xlabel('Groupe d\\'âge', fontsize=12)\n",
"plt.legend(title='Statut de tabagisme')\n",
"plt.xticks(rotation=45)\n",
"plt.ylim(0, 1)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"Mortality rises with age, as expected. More importantly, smokers have a higher mortality rate than non-smokers in every age group, which points to the harm smoking does to health. This is consistent with the long-term effects of smoking on diseases such as cancer, cardiovascular disease, and lung disease. The gap is negligible in the youngest group (2.8% in both), clear in the 35-54 and 55-64 groups (17.3% versus 9.5%, and 44.3% versus 33.1%), and almost vanishes again among women aged 65 and over (85.7% versus 85.5%).\n",
"\n",
"A plausible explanation for the oldest group is selection: smokers who reach that age have already survived the effects of tobacco, so they form a selected population. The non-smokers in this group record many more deaths in absolute terms (165 versus 42), but only because there are many more of them (193 versus 49), so the two mortality rates end up nearly equal."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Logistic Regression"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.412727\n",
" Iterations 7\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Death No. Observations: 582\n",
"Model: Logit Df Residuals: 580\n",
"Method: MLE Df Model: 1\n",
"Date: Mon, 11 Nov 2024 Pseudo R-squ.: 0.2492\n",
"Time: 18:00:48 Log-Likelihood: -240.21\n",
"converged: True LL-Null: -319.94\n",
" LLR p-value: 1.477e-36\n",
"==============================================================================\n",
" coef std err z P>|z| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept -5.5081 0.466 -11.814 0.000 -6.422 -4.594\n",
"Age 0.0890 0.009 10.203 0.000 0.072 0.106\n",
"==============================================================================\n",
"Optimization terminated successfully.\n",
" Current function value: 0.354560\n",
" Iterations 7\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Death No. Observations: 732\n",
"Model: Logit Df Residuals: 730\n",
"Method: MLE Df Model: 1\n",
"Date: Mon, 11 Nov 2024 Pseudo R-squ.: 0.4304\n",
"Time: 18:00:48 Log-Likelihood: -259.54\n",
"converged: True LL-Null: -455.62\n",
" LLR p-value: 2.808e-87\n",
"==============================================================================\n",
" coef std err z P>|z| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept -6.7955 0.479 -14.174 0.000 -7.735 -5.856\n",
"Age 0.1073 0.008 13.742 0.000 0.092 0.123\n",
"==============================================================================\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Create the 'Death' variable: 1 if the woman died during the 20-year follow-up, 0 otherwise\n",
"data['Death'] = data['Status'].apply(lambda x: 0 if x == 'Alive' else 1)\n",
"\n",
"# Models\n",
"# Smokers\n",
"model_smokers = smf.logit('Death ~ Age', data=data[data['Smoker'] == 'Yes']).fit()\n",
"print(model_smokers.summary())\n",
"\n",
"# Non-smokers\n",
"model_non_smokers = smf.logit('Death ~ Age', data=data[data['Smoker'] == 'No']).fit()\n",
"print(model_non_smokers.summary())\n",
"\n",
"# Plot the predicted probability of death as a function of age for each group\n",
"age_range = np.linspace(data['Age'].min(), data['Age'].max(), 100)\n",
"death_prob_smokers = model_smokers.predict(pd.DataFrame({'Age': age_range}))\n",
"death_prob_non_smokers = model_non_smokers.predict(pd.DataFrame({'Age': age_range}))\n",
"\n",
"plt.plot(age_range, death_prob_smokers, label='Fumeuses', color='red')\n",
"plt.plot(age_range, death_prob_non_smokers, label='Non-Fumeuses', color='blue')\n",
"\n",
"# Rough 95% bands from the binomial standard error with the group size n\n",
"# (an approximation, not the model-based confidence interval of the fitted curve)\n",
"ci_smokers = 1.96 * np.sqrt(death_prob_smokers * (1 - death_prob_smokers) / len(data[data['Smoker'] == 'Yes']))\n",
"ci_non_smokers = 1.96 * np.sqrt(death_prob_non_smokers * (1 - death_prob_non_smokers) / len(data[data['Smoker'] == 'No']))\n",
"\n",
"# Draw the bands\n",
"plt.fill_between(age_range,\n",
" death_prob_smokers - ci_smokers,\n",
" death_prob_smokers + ci_smokers,\n",
" color='red', alpha=0.3)\n",
"plt.fill_between(age_range,\n",
" death_prob_non_smokers - ci_non_smokers,\n",
" death_prob_non_smokers + ci_non_smokers,\n",
" color='blue', alpha=0.3)\n",
"\n",
"plt.xlabel('Âge')\n",
"plt.ylabel('Probabilité de décès')\n",
"plt.legend()\n",
"plt.title('Probabilité de décès en fonction de l\\'âge et du statut de tabagisme')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"The regressions show how the probability of death differs between smokers and non-smokers as a function of age. These models capture a general trend, but they cannot prove that smoking is the sole cause of the deaths: other factors (diet, physical activity, and so on) also affect health. Still, if smokers are seen to have a higher probability of dying at younger ages, that clearly suggests that smoking harms health. It is strong evidence of the harm done by tobacco, though not definitive proof."
]
},
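{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"Simpson's paradox appears here because the aggregate comparison and the age-stratified comparison point in opposite directions. Taken as a single group, smokers have a lower mortality rate than non-smokers (23.9% versus 31.4%), whereas within every age group the smokers' mortality rate is at least as high as the non-smokers'. The reversal comes from age: the non-smokers include many more women aged 65 and over, the group in which mortality is highest."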
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
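The two fitted curves from Step 4 can also be compared without re-running the models, using only the coefficients printed in the two Logit summaries. This is a rough sketch under that assumption: the `COEFS` table and the helpers `death_probability` and `crossover_age` are hypothetical names introduced here, and the coefficients are the rounded values from the printed output, so the results are approximate.

```python
import math

# (Intercept, Age coefficient) copied from the printed Logit summaries.
# Rounded to four decimals, so derived values are approximate.
COEFS = {
    "smokers": (-5.5081, 0.0890),
    "non_smokers": (-6.7955, 0.1073),
}


def death_probability(group, age):
    """Predicted probability of death: inverse logit of intercept + slope * age."""
    intercept, slope = COEFS[group]
    return 1.0 / (1.0 + math.exp(-(intercept + slope * age)))


def crossover_age():
    """Age at which the two linear predictors, hence the two curves, are equal."""
    (a_s, b_s), (a_n, b_n) = COEFS["smokers"], COEFS["non_smokers"]
    return (a_s - a_n) / (b_n - b_s)


for age in (40, 60, 80):
    p_smokers = death_probability("smokers", age)
    p_non_smokers = death_probability("non_smokers", age)
    print(age, round(p_smokers, 3), round(p_non_smokers, 3))

print("curves cross near age", round(crossover_age(), 1))
```

With these rounded coefficients the smokers' curve lies above the non-smokers' curve up to roughly age 70 and below it afterwards, which matches the near-equal mortality rates seen in the 65+ group.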
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A Study of Simpson's Paradox: The Effect of Smoking on the Survival of Women in Whickham"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In 1972-1974, a survey was conducted on the health of women in Whickham, England. Its aim was to assess the relationship between smoking and long-term survival. For simplicity, we restrict the analysis to the 1314 women who were categorized as __currently smoking__ or __never having smoked__. We analyze these data to explore Simpson's paradox."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Data Preparation"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import statsmodels.api as sm\n",
"import statsmodels.formula.api as smf"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Smoker</th>\n",
" <th>Status</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>21.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>19.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>47.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>81.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>23.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>24.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>66.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>60.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>43.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>27.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>65.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>73.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>33.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>18.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>56.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>59.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>36.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1284</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>36.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1285</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>48.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1286</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1287</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>60.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1288</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>39.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1289</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1290</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1291</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1292</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>57.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1293</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1294</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>46.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1295</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>82.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1296</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1297</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>32.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1298</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1299</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>60.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1300</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1301</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1302</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>44.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1303</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>31.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1304</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>47.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1305</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>60.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1306</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>61.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1307</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>43.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1308</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>42.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1309</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>35.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1310</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>22.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1311</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1312</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>88.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1313</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1314 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" Smoker Status Age\n",
"0 Yes Alive 21.0\n",
"1 Yes Alive 19.3\n",
"2 No Dead 57.5\n",
"3 No Alive 47.1\n",
"4 Yes Alive 81.4\n",
"5 No Alive 36.8\n",
"6 No Alive 23.8\n",
"7 Yes Dead 57.5\n",
"8 Yes Alive 24.8\n",
"9 Yes Alive 49.5\n",
"10 Yes Alive 30.0\n",
"11 No Dead 66.0\n",
"12 Yes Alive 49.2\n",
"13 No Alive 58.4\n",
"14 No Dead 60.6\n",
"15 No Alive 25.1\n",
"16 No Alive 43.5\n",
"17 No Alive 27.1\n",
"18 No Alive 58.3\n",
"19 Yes Alive 65.7\n",
"20 No Dead 73.2\n",
"21 Yes Alive 38.3\n",
"22 No Alive 33.4\n",
"23 Yes Dead 62.3\n",
"24 No Alive 18.0\n",
"25 No Alive 56.2\n",
"26 Yes Alive 59.2\n",
"27 No Alive 25.8\n",
"28 No Dead 36.9\n",
"29 No Alive 20.2\n",
"... ... ... ...\n",
"1284 Yes Dead 36.0\n",
"1285 Yes Alive 48.3\n",
"1286 No Alive 63.1\n",
"1287 No Alive 60.8\n",
"1288 Yes Dead 39.3\n",
"1289 No Alive 36.7\n",
"1290 No Alive 63.8\n",
"1291 No Dead 71.3\n",
"1292 No Alive 57.7\n",
"1293 No Alive 63.2\n",
"1294 No Alive 46.6\n",
"1295 Yes Dead 82.4\n",
"1296 Yes Alive 38.3\n",
"1297 Yes Alive 32.7\n",
"1298 No Alive 39.7\n",
"1299 Yes Dead 60.0\n",
"1300 No Dead 71.0\n",
"1301 No Alive 20.5\n",
"1302 No Alive 44.4\n",
"1303 Yes Alive 31.2\n",
"1304 Yes Alive 47.8\n",
"1305 Yes Alive 60.9\n",
"1306 No Dead 61.4\n",
"1307 Yes Alive 43.0\n",
"1308 No Alive 42.1\n",
"1309 Yes Alive 35.9\n",
"1310 No Alive 22.3\n",
"1311 Yes Dead 62.1\n",
"1312 No Dead 88.6\n",
"1313 No Alive 39.1\n",
"\n",
"[1314 rows x 3 columns]"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the data\n",
"url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv\"\n",
"data = pd.read_csv(url)\n",
"\n",
"# Display the DataFrame to inspect its columns and values\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Smoking Status and Survival"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table de survie par statut de tabagisme :\n",
"Status Alive Dead total Taux de mortalité\n",
"Smoker \n",
"No 502 230 732 0.314208\n",
"Yes 443 139 582 0.238832\n"
]
}
],
"source": [
"# Count women by smoking status (rows) and vital status (columns)\n",
"table_smoking = data.groupby(['Smoker', 'Status']).size().unstack(fill_value=0)\n",
"# Add the group size and the mortality rate (deaths / group size)\n",
"table_smoking['total'] = table_smoking.sum(axis=1)\n",
"table_smoking['Taux de mortalité'] = table_smoking['Dead'] / table_smoking['total']\n",
"\n",
"print('Table de survie par statut de tabagisme :')\n",
"print(table_smoking)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Bar plot of the mortality rate for each smoking status\n",
"sns.barplot(x=table_smoking.index, y=table_smoking['Taux de mortalité'])\n",
"plt.title('Taux de mortalité selon le statut de tabagisme')\n",
"plt.ylabel('Taux de mortalité')\n",
"plt.xlabel('Statut de tabagisme')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"Smoking is widely regarded as harmful, so one would expect smokers to show the higher mortality rate. The table shows the opposite: about 23.9% of smokers (139 of 582) had died by the time survival was recorded, against about 31.4% of non-smokers (230 of 732). Taken at face value, smoking would appear to be protective. This counterintuitive result is why the situation is called a paradox, namely Simpson's paradox. It arises because the study is observational: the women were not assigned to the two groups at random, so the groups may differ in other respects that affect survival. In particular, the non-smokers in this dataset may be older on average than the smokers."
]
},
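{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how age composition could produce this result, the crude rate in each smoking group can be written as a weighted average of age-specific rates. The notation below is introduced here for illustration and is not part of the original analysis: $g$ indexes age groups, $n_{s,g}$ is the number of women with smoking status $s$ in age group $g$, $n_s$ is the total for status $s$, and $r_{s,g}$ is the mortality rate in that cell.\n",
"\n",
"$$r_s = \\sum_g \\frac{n_{s,g}}{n_s}\\, r_{s,g}$$\n",
"\n",
"If mortality rises steeply with age and the non-smokers are concentrated in the older groups, their weights $n_{s,g}/n_s$ fall on the largest $r_{s,g}$, so $r_{\\text{No}}$ can exceed $r_{\\text{Yes}}$ even when smokers fare no better within any single age group."
]
},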
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Analysis by Age Group"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table de survie par âge et statut de tabagisme :\n",
"Status Alive Dead total Taux de mortalité\n",
"GroupeAge Smoker \n",
"18-34 No 212 6 218 0.027523\n",
" Yes 172 5 177 0.028249\n",
"35-54 No 180 19 199 0.095477\n",
" Yes 196 41 237 0.172996\n",
"55-64 No 81 40 121 0.330579\n",
" Yes 64 51 115 0.443478\n",
"65+ No 28 165 193 0.854922\n",
" Yes 7 42 49 0.857143\n"
]
}
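],
"source": [
"# Define the age groups. pd.cut builds right-closed intervals, so these bins\n",
"# give (18, 34], (34, 54], (54, 64] and (64, inf). A woman aged exactly 18.0\n",
"# falls outside the first interval, gets NaN and is dropped by the groupby;\n",
"# passing include_lowest=True to pd.cut would keep her.\n",
"bins = [18, 34, 54, 64, np.inf]\n",
"labels = ['18-34', '35-54', '55-64', '65+']\n",
"data['GroupeAge'] = pd.cut(data['Age'], bins=bins, labels=labels)\n",
"\n",
"# Survival table by age group and smoking status\n",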
"table_smoking = data.groupby(['GroupeAge','Smoker', 'Status']).size().unstack(fill_value=0)\n",
"table_smoking['total'] = table_smoking.sum(axis=1)\n",
"table_smoking['Taux de mortalité'] = table_smoking['Dead'] / table_smoking['total']\n",
"\n",
"print('Table de survie par âge et statut de tabagisme :')\n",
"print(table_smoking)"
]
},
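{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked check, the counts in the table above can be read as a weighted average: each smoking group's crude rate is the sum, over age bands, of the share of the group in that band times the mortality in that band. Only figures printed in the table are used, rounded to three decimals.\n",
"\n",
"$$\\frac{193}{731} \\approx 0.264 \\quad \\text{(share of non-smokers aged 65+)}, \\qquad \\frac{49}{578} \\approx 0.085 \\quad \\text{(share of smokers aged 65+)}$$\n",
"\n",
"$$\\frac{6 + 19 + 40 + 165}{731} = \\frac{230}{731} \\approx 0.315, \\qquad \\frac{5 + 41 + 51 + 42}{578} = \\frac{139}{578} \\approx 0.240$$\n",
"\n",
"About a quarter of the non-smokers are 65 or older, against fewer than one in ten smokers, and mortality in that band is close to 85% for both. The 65+ band accounts for 165 of the 230 deaths among non-smokers but only 42 of the 139 among smokers. That is enough to push the non-smokers' crude rate above the smokers', even though the smokers' rate is higher in every age band, if only marginally in the youngest and oldest."
]
},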
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAtcAAAIHCAYAAABUsHByAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzs3XvcZXPd//HXx4zzoZwSBsmtMGYMxiDcaORUUndE5JA0qSR3pciNyl1x6yBRbpUQmpJDKuEnqVsooxjGIecxETOTZIzJjPn8/viu67Jnz76ua18z6zqZ1/PxuB7XXmuvvdZnr7322u/9Xd+1dmQmkiRJkhbfUgNdgCRJkvRqYbiWJEmSamK4liRJkmpiuJYkSZJqYriWJEmSamK4liRJkmpiuJYEQESsGBEPRMR3B7oWSVrSRcQhEZER8cGBrkW9Y7heAlRvzp7+HhvoOtsREZtU9R440LXUISL2i4hjFuPxR1Xr4/UN4/4WEec2DO8WESe3MbtzgCeAjyxqPb0VEctV9R/fX8uslrvQeuvj5R0ZEYcuxuMXdztZIyI+HxGjF3Ue1XwW2LYWc17tbpddPX7P6jXcsY56erns2taDliwRcVpEzGkY7tgHHtg03WrA6ZT98X9HxJr9XKoWw/CBLkD9Yvum4SuBu4DPN4z7V79Vo0b7AWOBs2qc597Asw3DuwGfBr7Y1QMi4iBgO2C7zJxXYy0qjgRmARct4uMXdztZAzgFeAiYvIjzqFuP2+Ug1vwek+r2P8B3M/PciFgV+DpwyADXpDYZrpcAmXlb43BE/AuY0Txe/Scils3MPvlCk5l/WoTHXApc2gflSK86i/IeezXrq/1ZX+4nB7vMPLLh9lcGshb1nt1CtICI2D4iroyIaRHxYkTcHxFfiIhlm6Zb6LBo8yH+iFglIh6OiJsjYljDdPu0048sIlaKiPMi4u8R8XxEXAG0PIxfHWK+KSJmVX+/jIhN23i+EyPioep5/6F6zvdGxO5RfDYipkbEcxFxeUSs3vT410bEd6r18VK1vo5umqbj8PU+EXFBRMwEHo+IicABwEYN3XPurx6zYkScVdXyQkQ8GRFXRcTGbTynztcmIk4DPgsMa1hG4yHJlSPiaxHxeFX/wxHxmYiIHpaxdER8JSIeiYg5ETE9Iv4vIrZtmCYi4qMRcXc1zTMR8b8R8Zo2nsM+EfHH6vV4tlr3GzVNc1tE3BARe0XEnRExu1rW23uafxfLXJx6D4+Iu6rX6rnq9hEddQLbAuMbXoNrq/vWjojvRsSDVf1TI+KiWLCbT3fbScvuLdFw6DkiNgHuq+76YcM8uu1aFRGfruqZU703tutiun+L8j6aUU17R0+vQRvb5Veq1/Sf1bZ1Q0SM7WJ2q0bExdV6fy4iLoyI1zYt7z+r7eXZ6u/3EbF7i7reFBHXVdvd36r1eHTzOo6Fu16tGxGXRMRTEfGvKO/Xq6O0ODZ2ZzsiIs6otq1/RsQPouw3N6me46xqW1jotYmIrSPiFxHxj6q+30VE81HJlus6IuZFxKjqMS9GxF8j4qSIV97n0eY+p2Gb6/iseA74bQ81HBoRf6m2j7uivGdv63gfVNO03E823N/OPqHHz6XerJNq2rWivEc7Xtt7I+IDPa336rHjIuKW6nk/EW12f4v2P4eHR8TpEfF09ZpdHxGbNz/fatpF2n60aGy5VrM3ALcD36ccxh4FnAxsABzemxll5j+jdDe4GTgJ+HxErAv8ALgsM7/fwyx+AOxDOZz9Z2AvWhxWj4j/AC6jdHc5CBgGnAD8LiJGZ+ZTPSxndcrzPR14mtJd5krgu8D6wFHAusCZwDeAQ6vlDgeuAzYD/gu4H9gX+FZErJaZzYe7zwV+DrwPWA64t1r2JsD+1TQvVv9XAJatanmaclj/aODWiHhzZs7s4Tl1OAdYh7JeOvqmzq/qXwa4AdgQOJUSwHYA/ht4DXBiN/
M9GfgYZT3fU00/DlitYZpvAB+t/v8aWA/4ErBZROycmfNbzTgi9qWs/2uB91bz/m/g5ojYIjOfaZh8U8rh069QDtN/FrgiIt6UmY/TO4ta73jgfMph209S9qubAR0B74PAj4HZQEe/6X9U/9cAnq/qngGMAI6jbLsjM3MuZdvqajtpx2PAgcBEyvZ0XTX+wa4eEBEfA86gvAcur5Z9GbBS03RvBP5A6at/DDATeD9wdUTsnZnX0VqX22Xl9dXy/wqsTNn33BwRYzLz/qZ5fRv4FWVb2ZTymq0F7NkwzQbA/1LC2jLAu4FrI2J8Zv6mei7LU94PCUygbE8fprxfezKR8hp9sqr59cDbKO/zRqcA/4+yjragbLdJ+fJ1DmVbPga4JCLuyMwHq9q2A34D3EbZnuZQ9gc3RsS4zLy7h/oC+Fm1Dv4beAelO85c4LRqmt7uc34MXAycTdnntl5wxDuAC4GfAsdSXpvvVOvmzhYPad5P9naf0K4e10mUL0e3VtP/FzAVeDvw/YgYnpldnvwd5cvYDZRt7hDgZeB4YO3G6TJzTlVLozfQ3ufwacB/Vv9vouyDr2pRy+JuP+qtzPRvCfujfNhe3MZ0QQkKRwLzgJUb7vsbcG7T9MtRPiiObxp/QvX4nYEbKTub1/aw7NHVvI5tGv+DavyB1fBSlA/2a5qmW40SYE7rYTkTq/mNaxg3rho3GViqYfy3gRcbhvdrrKVh/MWUIPWaanjParofdbH8h9p4LYZRQsYc4CMN44+q5v36rl4byo53Xot5fogSaLZtGn8qJbx1+RpRPjQu7eb+N1Xz/kzT+PFVvXt2tc1QwvqUpnX/ZsqH05cbxt1GOVdgg4ZxI6r5fbKH9bnAemu33i7m9V/Akz0s7zbghjZe5+HAxtUy9+ppO2n1+je85nMahjeppnt/GzUsXW1DVzWNP6yaR+O2dQnwZMe23jD+d8BtPSyn5XbZxba/NGW/dXrD+I73VXOdH6zG79DF/Jaq1vPvgB83jD+metwWTdPe3917jLKffAmY0M1z6Fj/zfupa6rx+zWMe1017rMN435POU9meNPr9DAwsY313Gpf+kPKF4gVu1nv3e1zvtLTa1dN/yfgjqZxb6nmcW2L17PVfrLdfUJbn0vtrhPKF7UXgDe0mO7Jxnpa1Py1at01bjevoXwuzenqcS3m0/JzuNpOXgS+3jT951o830XefvxbtD+7hWgBEbFqlG4Cj1CCy1xK69UwYKNuH9y10ymHDa8D/h04ODP/0f1D6DgE/ZOm8RObhkdSAtXF1SGy4VWL8j8p3/z/vY36/p6Zf2wY7mgZ+3+5YGvl/cByEbFGNfzvlA/Vy5rmdzGwPCWkN7qyjVo6RcTBEXF7ddh1HuU5LUv5UKnDnsBfgDua1t31lA+k5vob3Q68KyK+GBFviYilm+7fg/KhcEnTvH9H2a5avi5RzpAfSfmA7Vz3mflAtcydmx4yJRtaqDNzGuXDa/2ennwd9Vb+CKxdHcreOyJWaXehURwTpSvKLMr77S/V3XW9zr21IaV1sfm992PKh3ajPSmtjC+02Ia2iYjmltu2VF0Efld1DZhHeZ9tQOt10tU+ovOQd0RsGxG/iohnKIFsLrBT0/y2A/6SmXd1jKi2wSu6qzVLUrkD+FyULiQju5n8V03DHfuazhb+LK2wz1KOnFBtT9tT1j8N6zgpjRXt7OOg9Xp6LaW1n2revdnn9Lg/q7oxjKG0WnfKzFuAro4oLjDfRdgn9EZP62RPypHXaU3b93WUFuh/62be2wO/y8y/NdT8HAtvAwtp83N4DGU/3fz5s8C6rnH7US8YrtXsYuADlEPjuwHbUA51wsKHONtS7RAvpuykJ2XmzW08rOPQ2dNN45uHX1f9v4SyA2r8241yqLYnzWf9v9TD+I71sBrwTGa+3DTd3xrub9RT95ROEbE/ZZ3dSTmkvy3ltXiORXwdWngd5UOzeb39rrq/u3X3eUqrzn6UVpEZVb
/EVRvmDTCtad4vUbaDrubdsc5arau/sfA6/XuL6f5F79fRotZLZl5POYS9EXA1MDNKv93uQlaHT1O6G/2S0lVhHK+Ehbpe595q+d7Lcvj6nx3DUc6jWI3ShaJ5GzqV8vmyKr1U9QP9BaWLyQcooXcbqi+3LR7SXOcLlNbGdav5vZFypGUFSref7av53dg0v7WBVt0Lmvc5rbyb0mXhROCeqq/sCRELnbvQap/ycmY+32J8R21rUr74fYmF1/ORtLePa/U8OoY71lNv9znt7M9eX9Xem/XaPN/e7hN6o9t1Qtkv7M7C6/2H1f3drfu1W8y/1TJbaedzuON92rxum+df1/ajXrDPtTpFxMqUfs2fycxvNYzfpsXkcyh9Fxu1fJNGxAjgq5TWnW0j4sOZ+b89lNOxI12LcviNhuFGHf0AP8UrobC5zr7yd2DNiFiqqYW748Sn5j6Kza1+3TmQ0ir7oY4REbEC5bBiXWYCD1D6f7bySFcPzHIG/5eAL0XE2sA7KYdBl6F0H+h47rtQgk6z6V3MuiMstzpx9fUsvE7rsqj1ApCZE4GJ1XvorZS+s7+k9J3szoGUrgKNJ1v1eCJug47tu633Ypsa33udqlbozlb5zHy5auH8JSUEtDJjEZa/H6Wf6X6NX1yrFsxW/eib61wRWJHS9xlKH9mVgPdk5oyG6RboP0553lv3NP9WqtbJo4CjImIzSjD6MiX8/aCnx/eg4z3xNRY+cgft71e62pd2rKfe7nPaWe7T1XSva3HfWrQOms3z7c0+oe3PpYYaulsnMymXrzyui8c39/9v9BStt51ut6defA53vE9fR+ne0dX869p+1Au2XKvRCpRvuHM7RlQtL4e1mPZxYPOmcQtdISAilqJ8C3+OEjq+A3y9jQDRcZnA9zaNbz6L/m7KznHTzJzU4u+eHpazOH5LadV8d9P4gyl94W5vYx7/onQhabYCDa9D5fBe1te4jGEtum5cSwl/z3ax7lq1Ci8kM5+qviz9jle2iespO+0RXcy75cmG1TLvAd7b2OoX5YoFY+nhqgSLYZHqbZaZz2fmzygnIm3Q0EWkN69zqysRdPX4jro634vVofjxLR5PF/No9igl9DS/9w5g4ROvrqWcmHd3F+ut+bk119Rqu1yB0iWh80M/IvamdUCjRZ0d+4iOE9FWqP53Xr89IjanbE+NbgPeFBFbNEy3FPAf3TyHhWTmvZl5HOW8i+Z9ZK9l5rOUk0ZHU/ouN6/jO9qcVav19A9euZJMnfscoPNox52UL0ydIuItNJ3Y1808erNPaOtzqUFP6+RaSheRR7rYvmd1M+9bgZ1iwavMvIYSnLvT7ufwnZQvE/s3jV9guMbtR71gy7U6ZebTEXEncHxEzKDsZCZQzhpvNhH4dkScTgkmW9E6hJ9AuQLFv2e5esinKH28Lo2I7bKLa5hm5uSI+ClwWpSrWvyZ8sMN45umeznKpe8uq1pZLqe0Nry+Wu5fMvPs3q2Jtv2M0t/2/IhYh9IK/E5KS/ApVf+6ntwLHBrlsoSTgdmZOYWyUz+zYf1uS/mlru525t0tA+C4iLiBchLZnygtaocBv4mIr1E+wJal9CN8J7BHiy4vAETEryg77D9TtpOxlC9P34ASMCLiTOC8Ksj8HyVMrU85zPqtqt9lK/9F6Xf5s4j4X0ofyFMprcffXITn36PFqTfKZeVWoXzIP1U95qOUE/o6ulHcCxwWEe+hBIDnslwJ4lrg4xHxGcqJX3sA72qxmK62k99TTuj9RhWq5wMfZ+GGk2mULh0HR8QDlOD3cPXB27wu5kbEqcDZ1frvuFrIcSzcqv85ynZwU0R8m3I1hVUpH+TrZOZRrdZZw3OChbfLaymtwN+PiIsp4eZEuu6GsHVV5xW8crWQazPz99X911NakS+OiG9SztH4QlVro+9SuulcHREnUrpwHMUr4byrq8WsRdkXXErZB7xMCZPLU64MUodjKd1YromICygt4mtS3ndzM/
OkHh4/Hzim2pfeSQmc76ec9Nbxmta5z2l0MvDziLiMclWd11OumvIMXazTFtrdJ7T7uQTtrZP/obyWN1f7h79QTvLclHIi+Hu6qfkMyknj/y8ivkj5cncC5epAXXb5avdzODOfiYhzgP+MiBd55WohHV/OG9ft4m4/6q3mMxz9e/X/0c3VQijB6nrKDvVpSn/Qd1NakbZrmG4YZec2lfKB+0tK/93Os5QpO+e5wH81LWMUpWX3Gz3UuRLlA+9Zyg7pCsph+2ThK3TsRDlR5FnKt/lHKR9243pYxkJXYeCVs8ub6+44S35Ew7jXUi4d9TdKP8n7gaObHtdxFvyOLZa/CuWElH9U09xfjR9OORH0qWr93litt+YrgbRztZDhwHmUQ/TzWfAqEitQLkP1F0qYnEkJSycD0c16O6Ga7u+UoHY/5QNweNN0R1Ba8GdXr+EUyq8Mrt20rpuvMLMP5YvLnGrdXA5s1DRNyytwND//Lurv6iob3dbbxbzeRQlRf6vW4dRqfa/VMM0IXnlfdV4lgVe28emU8HsV5colzWf7t9xOqvu2oHwZmEV5b3+cpquFVNPtX71Oc2nxHmrxvI6jBPc51Wu9bat1SznR8AeUI0gvVf+va2P+3W2Xn6J8CXmxWvbO1evd6uoS+/DK0bF/Ui7XuWrTst5P2cbnUI52vYfy3r+/abo3Va/Ti5Tw91XKZUTnA8u12sYoXVC+S/myMKuq4zZg/4bpW16tha6v5PM34HtN40ZV28D0ajt7ghI4d+9hPZ9GCXabU44uvVi9Rgu8x+n9PmdEd8ttquEwyqUf/1Wt/3dQWod/1DBNl/vJXuwTevxc6s06qaZdnbIPeJyyfT9N+SL90Tae9zjglobX63havDdbPK7dz+HhlC8Az1D2Wb+mNF4l8OE6th//Fu0vqpUuSZKaVK3qa2dmOyeoDjrVkZVPZ+agOVJdnWT6APC5zDxjAJY/6NZJXSLiEMoXzHGZ2U7XRPWBV92GJUnSoqi65/ydcoLYKpSrwIyndT94taHqZ/xlSqvqTEqr7Gcprc8XDFxlQ19E7EjZPm+ntEaPo7SO/9ZgPbD6JVxHxPmUw0DPZOZCJ3hUnfW/SelTOxs4PEvfO0mS+stcSr/r9Sn91u8DDsvMhX4ZVm2bS+kWdQ6li8UsSreKEzKz26vwqEezKOH6E5S+4E9Tukh9biCLEv3TLSQi/p2yEVzURbjem9JPcG9Kv75vZua2fV6YJEmSVKN+uRRfZv6O1j/20GFfSvDOzLwNeG117VxJkiRpyBgsfa7XpZy52mFaNW6hSy9FxATKZWlYccUVt95kk036pUBJkiQtue64444ZmblmT9MNlnDd/MME0MWvBmXmeZTLNzF27NicNGlSX9YlSZIkERFt/aDYYPmFxmnAeg3DI1jwJ0klSZKkQW+whOurKb8+FhGxHeXXy7r6NS5JkiRpUOqvS/H9iPLLemtExDTKT58uDZCZ5wLXUK4U8hDlUnxeU1SSJElDTr+E68x8Xw/3J/Cx/qhFkiSpv8ydO5dp06YxZ86cgS5FbVpuueUYMWIESy+99CI9frCc0ChJkvSqM23aNFZeeWXe8IY3UH4zT4NZZjJz5kymTZvGhhtuuEjzGCx9riVJkl515syZw+qrr26wHiIigtVXX32xjjQYriVJkvqQwXpoWdzXy3AtSZIk1cRwLUmS1I++9KUvMXLkSEaPHs2YMWP4wx/+AMCZZ57J7Nmze3x8u9NdddVV3HvvvW3VtNJKK7U1XYcvf/nLfTLfrjz55JPst99+tcyrrxmuJUmS+smtt97KL37xC/70pz8xefJkbrjhBtZbr/yO3kCG695qN1zXZZ111uGnP/1pvy5zURmuJUmS+slTTz3FGmuswbLLLgvAGmuswTrrrMNZZ53Fk08+ya677squu+4KwEc+8hHGjh3LyJEjOeWUUwBaTtfYOvzTn/6Uww8/nFtuuYWrr76a4447jjFjxvDwww8vUM
ejjz7K9ttvzzbbbMNJJ520wH1nnHEG22yzDaNHj+5cbqPjjz+eF198kTFjxnDwwQcD8K53vYutt96akSNHct555y0w/ac+9Sm22morxo8fz/Tp0wH47ne/yzbbbMMWW2zBe97zns4vCw8//DDbbbcd22yzDSeffHLnc3vsscfYfPPNAZgyZQrjxo1jzJgxjB49mgcffJDHHnuMTTbZhCOPPJLNN9+cgw8+mBtuuIEddtiBjTfemD/+8Y8AvPDCCxxxxBFss802bLnllvzsZz/r1evXlswcsn9bb711SpIkDVb33nvvAsPPP/98brHFFrnxxhvnRz7ykbzppps679tggw1y+vTpncMzZ87MzMx58+blzjvvnHfddVfL6VZcccXO25dddlkedthhmZl52GGH5WWXXdayrn322ScvvPDCzMw8++yzO+dx3XXX5Yc+9KGcP39+vvzyy/n2t789f/vb3y70+MZlNtY6e/bsHDlyZM6YMSMzM4G8+OKLMzPzC1/4Qn7sYx/LzOy8PzPzxBNPzLPOOiszM9/+9rfnpZdempmZ3/nOdzqX8+ijj+bIkSMzM/Poo4/unOe//vWvnD17dj766KM5bNiwnDx5cr788su51VZb5Qc+8IGcP39+XnXVVbnvvvtmZuYJJ5yQP/zhDzMz89lnn82NN944Z82atdDza37dqucyKdvIp7ZcS5Ik9ZOVVlqJO+64g/POO48111yTAw44gAsuuKDltD/5yU/Yaqut2HLLLZkyZUqtXTx+//vf8773ld/4O+SQQzrHX3/99Vx//fVsueWWbLXVVtx///08+OCDPc7vrLPOYosttmC77bbjiSee6HzMUkstxQEHHADA+9//fm6++WYA7rnnHnbaaSdGjRrFJZdcwpQpU4DSbWb//fcH4KCDDmq5rO23354vf/nLnH766Tz++OMsv/zyAGy44YaMGjWKpZZaipEjRzJ+/HgiglGjRvHYY491Pr/TTjuNMWPGsMsuuzBnzhymTp3a29XXLX9ERpIkqR8NGzaMXXbZhV122YVRo0Zx4YUXcvjhhy8wzaOPPspXv/pVbr/9dlZddVUOP/zwLq+93HjpuN5cn7nVJecykxNOOIEPf/jDbc/npptu4oYbbuDWW29lhRVW6Ayt3S3z8MMP56qrrmKLLbbgggsu4Kabbmp7eQcddBDbbrstv/zlL9ljjz343ve+xxvf+MbOrjZQQn3H8FJLLcW8efM6n9/ll1/Om9/85raX11u2XEuSJPWTBx54YIGW4DvvvJMNNtgAgJVXXpnnn38egH/+85+suOKKvOY1r+Hpp5/mV7/6VedjGqcDWGuttbjvvvuYP38+V155ZZfTNdphhx2YOHEiAJdccknn+D322IPzzz+fWbNmAfDXv/6VZ555ZqHHL7300sydOxeA5557jlVXXZUVVliB+++/n9tuu61zuvnz53eeiHjppZey4447AvD888+z9tprM3fu3AWWv91223H55ZcDdNbX7JFHHuGNb3wjxxxzDO985zuZPHlyy+la2WOPPfjWt75F6eUBf/7zn9t+bLsM15IkSf1k1qxZHHbYYWy22WaMHj2ae++9l89//vMATJgwgb322otdd92VLbbYgi233JKRI0dyxBFHsMMOO3TOo3E6gNNOO413vOMdvPWtb2XttdfunO7AAw/kjDPOYMstt1zohMZvfvObnHPOOWyzzTY899xzneN33313DjroILbffntGjRrFfvvt1zKgT5gwgdGjR3PwwQez5557Mm/ePEaPHs1JJ53Edttt1zndiiuuyJQpU9h666258cYbOfnkkwE49dRT2XbbbXnb297GJpts0jn9mWeeyde//nXGjRvHU089xWte85qFlv3jH/+YzTffnDFjxnD//fdz6KGHtr3+TzrpJObOncvo0aPZfPPNFzqZsw7RkdyHorFjx+akSZMGugxJkqSW7rvvPjbddNOBLmPImD17NssvvzwRwcSJE/nRj3
7UN1f06EGr1y0i7sjMsT091j7XkiRJGhTuuOMOjj76aDKT1772tZx//vkDXVKvGa4lSZI0KOy0007cddddA13GYrHPtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTT2iUJEl6ldr6uItqnd8dZ/R8TemI4JOf/CRf+9rXAPjqV7/KrFmzOq/n/Wpny7UkSZJqs+yyy3LFFVcwY8aMgS5lQBiuJUmSVJvhw4czYcIEvvGNbyx03+OPP8748eMZPXo048ePZ+rUqQNQYd8yXEuSJKlWH/vYx7jkkksW+Gl1gKOPPppDDz2UyZMnc/DBB3PMMccMUIV9x3AtSZKkWq2yyioceuihnHXWWQuMv/XWWznooIMAOOSQQ7j55psHorw+ZbiWJElS7Y499li+//3v88ILL3Q5TUT0Y0X9w3AtSZKk2q222mq8973v5fvf/37nuLe85S1MnDgRgEsuuYQdd9xxoMrrM16KT5Ik6VWqnUvn9aVPfepTnH322Z3DZ511FkcccQRnnHEGa665Jj/4wQ8GsLq+YbiWJElSbWbNmtV5e6211mL27Nmdw294wxu48cYbB6KsfmO3EEmSJKkmhmtJkiSpJoZrSZIkqSaGa0mSJKkmhmtJkiSpJoZrSZIkqSZeik+SJOlVauoXR9U6v/VPvrvb+zOTnXbaiRNPPJG99toLgJ/85Cecf/75XHvttbXWMlgZriVJklSLiODcc89l//33Z9ddd+Xll1/mxBNPXGKCNdgtRJIkSTXafPPN2WeffTj99NP5whe+wKGHHspGG23EhRdeyLhx4xgzZgwf/ehHmT9/PvPmzeOQQw5h1KhRbL755px11lkDXf5is+VakiRJtTrllFPYaqutWGaZZZg0aRL33HMPV155JbfccgvDhw9nwoQJTJw4kY022ogZM2Zw992lu8k//vGPAa588RmuJUmSVKsVV1yRAw44gJVWWolll12WG264gdtvv52xY8cC8OKLL7Leeuuxxx578MADD/CJT3yCvffem913332AK198hmtJkiTVbqmllmKppUoP5MzkiCOO4NRTT11ousmTJ/OrX/2Ks846i8svv5zzzjuvv0utlX2uJUmS1Kd22203fvKTnzBjxgwAZs4RawsFAAAgAElEQVScydSpU5k+fTqZyf77788XvvAF/vSnPw1wpYvPlmtJkqRXqZ4unddfRo0axSmnnMJuu+3G/PnzWXrppTn33HMZNmwYH/zgB8lMIoLTTz99oEtdbJGZA13DIhs7dmxOmjRpoMuQJElq6b777mPTTTcd6DLUS61et4i4IzPH9vRYu4VIkiRJNTFcS5IkSTUxXEuSJPWhodwFd0m0uK+X4VqSJKmPLLfccsycOdOAPURkJjNnzmS55ZZb5Hl4tRBJkqQ+MmLECKZNm8b06dMHuhS1abnllmPEiBGL/HjDtSRJUh9Zeuml2XDDDQe6DPUju4VIkiRJNTFcS5IkSTUxXEuSJEk1MVxLkiRJNTFcS5IkSTUxXEuSJEk1MVxLkiRJNfE615IkSa9CU784aqBLGDTWP/nufluW4VqSJL1qbH3cRQNdwqBx5coDXcGSyW4hkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJN+i1cR8SeEfFARDwUEce3uP81EfHziLgrIqZExAf6qzZJkiSpDv0SriNiGHAOsBewGfC+iNisabKPAfdm5hbALsDXImKZ/qhPkiRJqkN/tVyPAx7KzEcy8yVgIrBv0zQJrBwRAawE/B2Y10/1SZIkSYutv8L1usATDcPTqnGNzgY2BZ4E7gY+kZnzm2cUERMiYlJETJo+fXpf1StJkiT1Wn+F62gxLp
uG9wDuBNYBxgBnR8QqCz0o87zMHJuZY9dcc836K5UkSZIWUX+F62nAeg3DIygt1I0+AFyRxUPAo8Am/VSfJEmStNj6K1zfDmwcERtWJykeCFzdNM1UYDxARKwFvBl4pJ/qkyRJkhbb8P5YSGbOi4ijgeuAYcD5mTklIo6q7j8XOBW4ICLupnQj+WxmzuiP+iRJkqQ69Eu4BsjMa4Brmsad23D7SWD3/qpHkiRJqpu/0ChJkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVpO1wHRGHRMQ1EXF7NbxDRLy770qTJEmShpYuw3VEvL/h9snAscBPgI2r0U8DJ/ZpdZIkSdIQ0l3L9UERcWR1+4PA3pl5AZDVuIeBN/ZhbZIkSdKQ0l24fgewaXV7GeAf1e2OcL0iMLuP6pIkSZKGnC7DdWbOz8xPVYPXA6dHxLCGSU4GrunL4iRJkqShpN0TGo8F3kRpvV4lIp4FRgGfbXdBEbFnRDwQEQ9FxPFdTLNLRNwZEVMi4rftzluSJEkaDIa3M1FmPgvsHRHrAxsAT2TmY+0upGrxPgd4GzANuD0irs7MexumeS3wbWDPzJwaEa9r/2lIkiRJA6+tluuIuA0gM6dm5v91BOuIuLnN5YwDHsrMRzLzJWAisG/TNAcBV2Tm1GpZz7Q5b0mSJGlQaLdbyGa9HN9sXeCJhuFp1bhGbwJWjYibIuKOiDi01YwiYkJETIqISdOnT29z8ZIkSVLf67ZbSEScV91ctuF2hw2B+9tcTrQYl03Dw4GtgfHA8sCtEXFbZv5lgQdlngecBzB27NjmeUiSJEkDpqc+1zO7uJ3AFEr3jnZMA9ZrGB4BPNlimhmZ+QLwQkT8DtgC+AuSJEnSENBtuM7ME6D0uc7Mny3Gcm4HNo6IDYG/AgdS+lg3+hlwdkQMp1xXe1vgG4uxTEmSJKlfdRmuI2LbzPxDNTg9It7SarrMvKWnhWTmvIg4GrgOGAacn5lTIuKo6v5zM/O+iLgWmAzMB76Xmff08vlIkiRJA6a7lutLgH+rbl/exTQJrNPOgjLzGpp+dCYzz20aPgM4o535SZIkSYNNl+E6M/+t4fba/VOOJEmSNHS1eyk+SZIkST3ors/1gyx8ubyFZOabaq1IkiRJGqK663N9dL9VIUmSJL0KdNfn+rr+LESSJEka6nr6EZlOEbEpsCOwBg2/uJiZX+6DuiRJkqQhp61wHREfAM4BfgvsCvwG2AX4ZZ9VJkmSJA0x7V4t5ATgHZm5F/Bi9f8A4B99VpkkSZI0xLQbrl+fmTdWt+dHRAA/B97dN2VJkiRJQ0+74fqvEbF+dfshYC9gLDCvT6qSJEmShqB2T2j8Bj
AamAp8CbiieuxxfVSXJEmSNOS0Fa4z87yG21dHxGrA8pk5s88qkyRJkoaYtrqFRMRtjcOZOTszZ0bEzX1TliRJkjT0tNvnerNejpckSZKWON12C4mIju4gyzbc7rAhcH+fVCVJkiQNQT31uZ7Zxe0EpgATa69IkiRJGqK6DdeZeUJEDAPuAS7PzDn9U5YkSZI09PTY5zozXwa+bbCWJEmSutfuCY2/iog9+rQSSZIkaYhr90dkXgauiojfAk9Q+lwDkJkT+qIwSZIkaahpN1xPBc7sy0IkSZKkoa7dX2g8oa8LkSRJkoa6dluuiYjtgUOAdYG/Ahdn5i19VZgkSZI01LT78+eHAr8A5gA3Ai8CP4uIw/qwNkmSJGlIabfl+nPA7pl5R8eIiLgE+BFwYV8UJkmSJA017V6Kb03grqZx91TjJUmSJNF+uL4NOC0ilgWo/n+pGi9JkiSJ9ruFHAVcBjwbEdMpLdaTgff2VWGSJEnSUNPupfieALaLiI2BtYEnM/OhPq1MkiRJGmLavhRfZSowEyAiVgPIzL/XXZQkSZI0FLV7Kb6dI+JeYDYwvfqbUf2XJEmSRPsnNP4A+DawFrBK9bdy9V+SJEkS7XcLWRn4dmbO78tiJEmSpKGs3ZbrbwHH9mUhkiRJ0lDXbsv1RcCNEXECTf2sM3Oz2quSJEmShqB2w/UVwB+BnwIv9l05kiRJ0tDVbrjeGBhrn2tJkiSpa+32uf4lsGNfFiJJkiQNde22XM8DromIXwNPN96RmRNqr0qSJEkagtoN109QrhgiSZIkqQtthevMPKGvC5EkSZKGunb7XEuSJEnqgeFakiRJqonhWpIkSaqJ4VqSJEmqSdvhOiIOiYhrIuL2aniHiHh335UmSZIkDS1theuIOBk4FvgJ5dcaoVzv+sQ+qkuSJEkactptuf4gsHdmXgBkNe5h4I19UZQkSZI0FLUbrpcB/lHd7gjXKwKza69IkiRJGqLaDdfXA6dHxLCGcScD19RfkiRJkjQ0tRuujwXeRGm9XiUingVGAZ/pq8IkSZKkoabdnz9/Ftg7IjYA1geeyMzH+rIwSZIkaahpK1x3yMzHgcf7qBZJkiRpSOsyXEfEXF45ebFLmblMrRVJkiRJQ1R3LdebN9x+G3AA8D+UlusNgOOAH/ddaZIkSdLQ0mW4zswHOm5HxM+B7TNzZjVqckTcCtwKnNO3JUqSJElDQ7tXC1mNhYP48Gq8JEmSJNo/ofES4PqI+BrwBLAe8J/VeEmSJEm0H64/CXwc+DCwDvAUcCFwdh/VJUmSJA057V7n+mXgzOpPkiRJUgvt9rmWJEmS1APDtSRJklSTXv1CoyRJWtDUL44a6BIGjfVPvnugS5AG3GK1XEfEsLoKkSRJkoa6tsJ1RPw8ItZsGrcJcFufVCVJkiQNQe22XD8C3B0R+wJExLHALcDFfVWYJEmSNNS0eym+T0TE1cAPIuIM4J/ADpl5X59WJ0mSJA0hvelzvTqwPDCnGp5ffzmSJEnS0NVun+uLga8A78rM0cCPgFsi4uN9WZwkSZI0lLTbcv0SsEVm/h4gM78G7Awc0VeFSZIkSUNNu32uFwrRmXlPRIyrvyRJkiRpaGorXEfEQd3cfWlNtUiSJElDWru/0Njct/r1wLrAJAzXkiRJEtB+t5Dtm8dFxEcpAVuSJEkSi/fz5+cCR9VViCRJkjTUtdstZAERsTRwMPB8veVIkiRJQ1e7JzTOBbJh1DBgOnBkXxQlSZIkDUXttlxv3jT8AvBkZvorjZIkSVKlrT7XmflA09+03gbriNgzIh6IiIci4vhuptsmIl6OiP16M39JkiRpoLXbLWQpSheQnYE1gOi4LzN3b+Pxw4BzgLcB04DbI+LqzLy3xXSnA9e1+wQkSZKkwaLdq4V8Ffg0MBnYAfg18Ebgj20+fhzwUGY+kpkvAROBfVtM93HgcuCZNucrSZIkDRrthuv3Antk5unAy9X/fYG3tPn4dYEnGo
an0XSN7IhYF3g35RJ/XYqICRExKSImTZ8+vc3FS5IkSX2v3XC9UmY+Wt1+MSKWz8wpwNg2Hx8txmXT8JnAZzPz5e5mlJnnZebYzBy75pprtrl4SZIkqe+1e7WQ+yNi68y8A/gT8LmIeA54qs3HTwPWaxgeATzZNM1YYGJEQOnXvXdEzMvMq9pchiRJkjSg2g3Xn+SV1udPAd8FVqL9X2i8Hdg4IjYE/gocCBzUOEFmbthxOyIuAH5hsJYkSdJQ0m24joj3ZeaPMvOWjnGZeR+wY28WkpnzIuJoylVAhgHnZ+aUiDiqur/bftaSJEnSUNBTy/X/Aj+qY0GZeQ1wTdO4lqE6Mw+vY5mSJElSf+rphMZWJyJKkiRJaqGnluthEbEr3YTszLyx3pIkSZKkoamncL0s8H26DtdJ+TEZSZIkaYnXU7h+ITMNz5IkSVIb2v0RGUmSJEk98IRGSZIkqSbdhuvMXLm/CpEkSZKGOruFSJIkSTUxXEuSJEk1MVxLkiRJNTFcS5IkSTUxXEuSJEk1MVxLkiRJNTFcS5IkSTUxXEuSJEk1MVxLkiRJNTFcS5IkSTUxXEuSJEk1MVxLkiRJNTFcS5IkSTUxXEuSJEk1MVxLkiRJNTFcS5IkSTUxXEuSJEk1GT7QBUiShp6tj7tooEsYNK5ceaArkDSY2HItSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNXEcC1JkiTVxHAtSZIk1cRwLUmSJNWk38J1ROwZEQ9ExEMRcXyL+w+OiMnV3y0RsUV/1SZJkiTVoV/CdUQMA84B9gI2A94XEZs1TfYosHNmjgZOBc7rj9okSZKkuvRXy/U44KHMfCQzXwImAvs2TpCZt2Tms9XgbcCIfqpNkiRJqkV/het1gScahqdV47ryQeBXre6IiAkRMSkiJk2fPr3GEiVJkqTF01/hOlqMy5YTRuxKCdefbXV/Zp6XmWMzc+yaa65ZY4mSJEnS4hneT8uZBqzXMDwCeLJ5oogYDXwP2CszZ/ZTbZIkSVIt+qvl+nZg44jYMCKWAQ4Erm6cICLWB64ADsnMv/RTXZIkSVJt+qXlOjPnRcTRwHXAMOD8zJwSEUdV958LnAysDnw7IgDmZebY/qhPkiRJqkN/dQshM68Brmkad27D7SOBI/urHkmSJKlu/kKjJEmSVBPDtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTw7UkSZJUk367FJ8kLaqpXxw10CUMGuuffPdAlyBJ6oYt15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk2GD3
QBklrb+riLBrqEQePKlQe6AkmS2mPLtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklQTw7UkSZJUE8O1JEmSVBPDtSRJklST4QNdwEDb+riLBrqEQePKlc8Y6BIGjfVPvnugS5AkSUOQLdeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk0M15IkSVJNDNeSJElSTQzXkiRJUk36LVxHxJ4R8UBEPBQRx7e4PyLirOr+yRGxVX/VJkmSJNWhX8J1RAwDzgH2AjYD3hcRmzVNthewcfU3AfhOf9QmSZIk1aW/Wq7HAQ9l5iOZ+RIwEdi3aZp9gYuyuA14bUSs3U/1SZIkSYtteD8tZ13giYbhacC2bUyzLvBU40QRMYHSsg0wKyIeqLfUJdcGsAYwY6DrGBROiYGuQA3cNhu4bQ46bp8N3D4HFbfNBvVsmxu0M1F/hetWzygXYRoy8zzgvDqK0oIiYlJmjh3oOqRmbpsazNw+NVi5bQ6M/uoWMg1Yr2F4BPDkIkwjSZIkDVr9Fa5vBzaOiA0jYhngQODqpmmuBg6trhqyHfBcZj7VPCNJkiRpsOqXbiGZOS8ijgauA4YB52fmlIg4qrr/XOAaYG/gIWA28IH+qE0LsLuNBiu3TQ1mbp8arNw2B0BkLtStWZIkSdIi8BcaJUmSpJoYriVJkqSaGK4lDWkR4YV1NSi5bUpLJsO1pCEpIlYByMw0xGgwcduUlmyGa3UpIt4YEa8b6DqkZhGxBzAxInYDQ4wGD7dNSYZrLaS61vhmwJ3AURHxhoGtSFrIa4HXAztExF5QQszAliQBbpsaRJq/2EWEua8fuJK1kOqD4FFgMrAM8N6I2HBgq5IW8Azwd2A+sFtEbBMRr4uIlQa4LsltU4NGxxe7js/wzJw/sBUtGQzX6so84HHKD/qsA+wREeMjYtzAlqUlVVOLy/8BvwYuoHwR/AxwFbBaNa2H4dVvImK5hkG3TQ0qEfFB4AMR8aZq+MyIePcAl/WqZrhWS5k5F7iN8suZ3wB2A34K2Adb/S4idgTe1hBMlgJ2AV4CpgFvBaYD64GH4dV/ImI8cEpErFyNctvUYPMnYC7lSMr3gDcDPxvYkl7d+uXnzzX4RcROwAjgpcy8vBr9AvAWSveQ7YDfARtHxPqZOXVgKtWSpjpB7Ezg8I5gkpkvRcSFwLHA/sBxwBrAnhFxV2bOGrCCtcSIiN2B7wIfysznoXPbvAi3TQ2QiIiG7iDDMvPPEfEMcBnlnICDMnN+43Sql+FaRMTewJeBXwCbRcSKmXkR8Mdq/KeBCZTWl3dTuopIfS4itqUcXv9AZv6husTZbMq5AA8CJwGfzMyrI2IdypdDw4v6XEQMo7RQH5eZ10fE6sDKlL7W9wGfw21T/azaZw4Hfg+QmS9Xdx0LvAj8BtiuCt13DEyVr36G6yVcRIymBJSPZOatEXEiMDwi1s7MeyLiz8B3M/Oaavr7/IBQP9oYuAmYFRFvpnzZe57SPem4zNwUOltnnhywKrXEycyXI2IWsHZEjACuBv4M7AMcmJkjwW1T/ac6yvdN4OCm8dsCG2Xm+OqL3nGUz/m7M/OlASj1VS88IrBki4iNgFUzc1JErEa5/N5kYCbwcmYeUU03rOEbsNQvImI4cCilW9LbKeH618AOwOHAe4G/eWhTAyEi3gPsDDwGzMnMb0fE24ELgbdm5uSBrE9Ljoh4C+W8qIMz8zcRsVJmzoqI5TJzTkQs0xGkq4aKv2fm9AEt+lXMluslXGY+3DC4N3BCZl5SHX6/NCLemZlXG6zV36
r+gPOq/qsAv8/MC6v7/gb8OzDbYK0B9AvgIGBX4PSqEeKXEfFjStclqc9Vn9djgd8CMyJiA+ArEfE8sFpEfC4zH4yI4Zk5LzMfGNCClwCGazW6tOMamJn5z4j4K2CoVr9pPMGm+mW7paqAfQEwrOEIyh7AvwFLD2C5WoI0n/xVbYv/ioiDKecF7AL8vTrs/jbgtAEpVEuUiHgX5cID36NcqeYYYC/gfyhX/NoNOCci3tNx0q36nuF6CdYRVDo+NBovLh8R/wFsDf+/vXuNsasqwzj+f0pLC1guWgS5tNgPGlHTGsUgRMELGgmoAcQqxVhFgQpVAYsFAWlAREPiJaBCubagVlBBBRqIkogg5SIogjSmtZY7bShgpbW1jx/WOnEzzpQ2OTN72vP8kiZn9l57n/c0KzPvWftda3F+exFGL+k7EafqJDN72l5a230WmA5Mtb18aKOMXjRA3+z8vhwNfBw4DphE+b35EdvLhjTI6DmSDqB8ifui7UV1QOwzwO2Np3yPUwYi1rQXae/JOtc9RNJ7JH22JiedCTkj6gjh2yTtW9tNB75KWfpscZsxR2+oE3GuBFY3jnX65n7ADZL2ris0jAM+bvsvLYUbPWQj+uYdwHjb37d9HnC07YdaCjd6y1uBObZvrpNqJwEPADc32hwATAS2bSG+npWR6x4h6YPAtyiPLw+R9E/bP6prXe4P/AD4Um3+B2BBn3rsiEFRN4i5jDISfW9nIg4wRtKLwFnAaY2E5by2Yo3espF9c6btpTXhXk8jCY8YZOv4X23/fOAflM2LRkiaARwCnESZ5LiynRB7U5LrHiBpO8oal6fWyTar6/G3215IWXlhlu1ba4nIfW3GGz3nzZQto1c0JuI8T9ku+gvAoXVjDkF2uIsh9UZKKcjyl+ubjfkq6Z8xVH4DXCfpbZQlcy+XNBGYSZkD8AbKhjF5kjLEklz3jicAJE2mbApzFzBe0iO2P1XPaeDLI7pL0puAbShL640GPk9Zbq8zEef9wOXAkZLWJmmJoSLpQ8AuwFxgDP8/SSx9M1pX96I4BfgusKgeWyxpFLDO9qxWA+xhSa63YJJeZ3uR7VWS7qesCzwBmG97Zm1zt6SjbF+dPxAxVGqZ0vmU9YH/BcymTF68x/Yltc3jlFrB1embMVRUtjSfDXzF9r8kXQocA9xle05tk74Zw8VNlPKkr0laWo9NIuVzrUpyvYWSdAgwX9INtqfY/raki4EjgOZuYb+l1GhFDAlJB1J2EZtqe6GkG4AdgAt56drAzYk46aMx6OoExbmUco+FdWOtEfVYs5Y6fTOGBdvrgKskPUj5+z4amGb7b+1G1tuSXG+Bao31CZQ66/0kXWP7E3UURsBlko4AJlPWY53TYrjRe54Cjq3Jy67APsAsype+P0j6EXA0pQ9nIk4MpRXAWsqW5q+i7Hi3BngOuFnS1ZQdQ9M3Y1ipc6UyX2qYyPbnW6i6kcHzlHrBHwBrbB9Vz51BmeiwPeXR54OtBRo9TdLplN9D50iaRtkc5nTKY/irbD/caoDRcyRNAn5OeYpyNnAppaTufZS++TnSNyNiA5Jc94A6AnMxsNb2lDqbeHvgIdt5pBnDhqSbgBPzSDPaJGlv4N22L2wcWwAcn7X/I+LlZBOZHmB7BXAs8KKkRcACYHkS62hT39VpJB0OvBpY1U5EEYXth/ok1ocDO1Mm30ZEbFBqrnuE7eWS/kRZTuog24+2HVP0ts4qC5JGA1Mpmx18zPYTrQYWUdUvgNMoy5d+1PaTLYcUEZuBJNc9QtJOwMHA+23/ue14IhrWU9ZhP8z2I20HE9HHYkrf/GvbgUTE5iE11z1E0hjb2Zo3IiIiYpAkuY6IiIiI6JJMaIyIiIiI6JIk1xERERERXZLkOiIiIiKiS5JcR0RERER0SZLriIjYIElXSDpnA+c/KGm1pDcMZVwREcNRkuuIiEEgaYqkuyStkvR0fT29786UmxtJB0q6rfHzKOBM4MPAN9
uKKyJiuEhyHRHRZZJOBr4DfAvYFdgFOA7YH9h6gGu2GrIAu2sv4CzbC4C5kl7VcjwREa1Kcrkz4ZIAAAQNSURBVB0R0UWSdgBmA9NtX2v7BRd/tH2U7TW13RWSvi/pRkmrgHdL2kHSVZKekbRU0lcljajtvyZpXuN99pJkSSPrz7dJOk/SQknPSbpe0isb7feVdIeklZIekHTgBj7DWyTdJ+kFST8BxmzgI58AXCrpeeBUYO/GfbaRdKWkZyU9LGmmpEcb53eTdF39vEskzdik/+yIiGEoyXVERHe9AxgNXL8RbT8BnAuMBW4HvgfsAEwEDgA+CUzbhPf+JPBpYDdgHfBdAEm7A78GzgFeCZwCXCdp5743kLQ18Atgbm37U+Dwznnbt9k+sHHJ3cDk2vYa4KeSOsn4WZSR7YnAQcDUxvuMAH4JPADsDrwX+KKkD2zC542IGHaSXEdEdNc4YLntdZ0DjRHjFyW9q9H2etu/t70eWAt8DJhVR7v/DlwAHL0J7z3X9oO2VwFnAEfWcpOpwI22b7S93vYtwD3Awf3cY19gFPBt22ttX0tJoPtle57tFbbX2b6A8sXi9fX0kcDXbT9r+1Fqsl/tA+xse7btf9teDFwCTNmEzxsRMeyMbDuAiIgtzApgnKSRnQTb9n4AtSSiOaixrPF6HKUee2nj2FLKqO7Gat5vKSVJHgdMAD4q6dDG+VHAb/u5x27AY7bd5179qvXlx9TrDGxf37Nzr2ZMzdcTgN0krWwc2wr43UDvFRGxOcjIdUREd90JrKGsnvFymgnscsro9YTGsfHAY/X1KmDbxrld+7nfnn2uXVvvu4wyqr1j4992tr/Rzz2eAHbvs6rJ+P6Cl/ROSp31kcBOtncEngM61z4B7DFAfMuAJX1iGmu7v9H0iIjNRpLriIgusr0SOBu4SNIRkl4haYSkycB2G7juP8B84FxJYyVNAE4COpMY7wfeJWl8nTQ5q5/bTJW0t6RtKZMqr633nQccKukDkraSNKYuqbdHP/e4k1KvPUPSSEmHAW8fIOyxte0zwEhJZ1JGrjvmA7Mk7VTrvk9onFsIPC/p1DrxcStJb5K0z0D/RxERm4Mk1xERXWb7m5TEeCbwNPAU8EPKKO8dG7j0RMoI9WLKBMdrgMvqPW8BfgL8CbgX+FU/188FrgCepKzwMaNeu4wykn4aJRFeBnyZfv4G2P43cBjwKeBZSh34zwaIdwFwE7CIUjqympeWfswGHgWWALcC11JG9TtfJg6lTIZcQhlhn0OZ0BkRsdnSS8vqIiJic1Q3dplne07bsQxE0vHAFNsHtB1LRMRgych1REQMCkmvkbR/LYt5PXAy8PO244qIGExZLSQiIgbL1pRymNcCK4EfAxe1GlFExCBLWUhERERERJekLCQiIiIiokuSXEdEREREdEmS64iIiIiILklyHRERERHRJUmuIyIiIiK65L+YDO5g8yIWhQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 864x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Plot mortality rates by smoking status and age group\n",
"table_smoking_reset = table_smoking.reset_index()\n",
"plt.figure(figsize=(12, 8))\n",
"sns.barplot(data=table_smoking_reset, x='GroupeAge', y='Taux de mortalité', hue='Smoker')\n",
"plt.title('Mortality rate by smoking status and age group', fontsize=16)\n",
"plt.ylabel('Mortality rate', fontsize=12)\n",
"plt.xlabel('Age group', fontsize=12)\n",
"plt.legend(title='Smoking status')\n",
"plt.xticks(rotation=45)\n",
"plt.ylim(0, 1)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"Mortality rates generally rise with age, and in the other age groups smokers have a higher mortality rate than non-smokers, which points to the impact of smoking on health. This is consistent with the long-term effects of smoking on diseases such as cancer, cardiovascular disease and lung disease. The exception is women over 65, for whom the two rates are similar.\n",
"\n",
"This exception has a simple possible explanation: smokers who reach 65 are a selected population that has already survived the effects of tobacco, while non-smokers of that age generally have a better life expectancy. Non-smokers in this age group record more deaths in absolute terms, but the mortality rate is normalised by group size, so the two rates end up close."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Logistic Regression"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Erreur : Un ou plusieurs groupes sont vides.\n"
]
}
],
"source": [
"# Encode death as 0/1 before splitting so that both subsets carry the 'Death' column.\n",
"# Labels are compared case-insensitively: the survival table above shows\n",
"# capitalised values ('Alive', 'Dead').\n",
"data['Death'] = (data['Status'].str.strip().str.lower() != 'alive').astype(int)\n",
"\n",
"# Split the sample into smokers and non-smokers\n",
"smoker_flag = data['Smoker'].str.strip().str.lower()\n",
"smokers_data = data[smoker_flag == 'yes']\n",
"non_smokers_data = data[smoker_flag == 'no']\n",
"\n",
"# Stop with a message if either group is empty\n",
"if smokers_data.empty or non_smokers_data.empty:\n",
"    print(\"Error: one or more groups are empty.\")\n",
"else:\n",
"    # Logistic regression of death during the 20-year period on age, for smokers\n",
"    model_smokers = smf.logit('Death ~ Age', data=smokers_data).fit()\n",
"    print(model_smokers.summary())\n",
"\n",
"    # Same model for non-smokers\n",
"    model_non_smokers = smf.logit('Death ~ Age', data=non_smokers_data).fit()\n",
"    print(model_non_smokers.summary())\n",
"\n",
"    # Predicted probability of death as a function of age for each group\n",
"    age_range = np.linspace(data['Age'].min(), data['Age'].max(), 100)\n",
"    death_prob_smokers = model_smokers.predict(pd.DataFrame({'Age': age_range}))\n",
"    death_prob_non_smokers = model_non_smokers.predict(pd.DataFrame({'Age': age_range}))\n",
"\n",
"    # Rough 95% band from the binomial standard error sqrt(p(1-p)/n);\n",
"    # this is an approximation, not the model-based confidence interval\n",
"    ci_smokers = 1.96 * np.sqrt(death_prob_smokers * (1 - death_prob_smokers) / len(smokers_data))\n",
"    ci_non_smokers = 1.96 * np.sqrt(death_prob_non_smokers * (1 - death_prob_non_smokers) / len(non_smokers_data))\n",
"\n",
"    # Plot the fitted curves\n",
"    plt.plot(age_range, death_prob_smokers, label='Smokers', color='red')\n",
"    plt.plot(age_range, death_prob_non_smokers, label='Non-smokers', color='blue')\n",
"\n",
"    # Shade the bands, clipped to the valid probability range [0, 1]\n",
"    plt.fill_between(age_range,\n",
"                     np.clip(death_prob_smokers - ci_smokers, 0, 1),\n",
"                     np.clip(death_prob_smokers + ci_smokers, 0, 1),\n",
"                     color='red', alpha=0.3)\n",
"    plt.fill_between(age_range,\n",
"                     np.clip(death_prob_non_smokers - ci_non_smokers, 0, 1),\n",
"                     np.clip(death_prob_non_smokers + ci_non_smokers, 0, 1),\n",
"                     color='blue', alpha=0.3)\n",
"\n",
"    plt.xlabel('Age')\n",
"    plt.ylabel('Probability of death')\n",
"    plt.legend()\n",
"    plt.title('Probability of death by age and smoking status')\n",
"    plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"Simpson's paradox appears here because the relationship between smoking and mortality depends on how the data are grouped: analysing all women as a single group suggests a different conclusion from analysing them by age group. Age is linked to both smoking status and mortality, so it has to be taken into account before comparing smokers with non-smokers."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}