{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A study of Simpson's paradox" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "In 1972-1974, in Whickham, a town in north-east England about 6.5 kilometres south-west of Newcastle upon Tyne, one sixth of the electorate was surveyed to inform work on thyroid and heart disease (Tunbridge et al. 1977). A follow-up study was carried out twenty years later (Vanderpump et al. 1995). Some of the results concerned smoking and whether the individuals were still alive at the time of the second study. For simplicity, we restrict ourselves to women, and among them to the 1314 who were categorised as \"currently smoking\" or \"never smoked\". Relatively few women in the initial survey had smoked but since quit (162), and very few had no information available (18). Twenty-year survival was determined for all the women in the first survey.\n", "\n", "The full data set is available in the file 'SmokingNotSmokingWomen_InputData.csv'.\n", "Each row of this file holds three fields (Smoker, Status, Age):\n", "whether the person smokes, whether she was alive or dead at the time of the second study, and her age at the time of the first survey." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## I - Loading the data, checking that its content is valid, and extracting some general information:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Preliminary remark: \n", "* To guard against the server, or the files it hosts, disappearing or changing, \n", "keep a local copy of this data set alongside the analysis that is made of it. \n", "* The data is downloaded from the web only if the local copy does not exist." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# On the web:\n", "# * permanent link to the data file:\n", "#data_url=\"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/blob/e06310de87a61ee756949895d588e65334b0bfc9/module3/Practical_session/Subject6_smoking.csv\"\n", "# * link to the current version of the data file:\n", "data_url=\"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/raw/master/module3/Practical_session/Subject6_smoking.csv\"\n", "\n", "# Path to a local copy of this file:\n", "LocalInputData=\"\" # \"C:/Users/hpascalj/__DataSets/module3_Practical_session_Subject6_smoking.csv\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the specified local data-file is not available; by default, select the data-file available at the prescribed url !\n" ] } ], "source": [ "# Load the original data set:\n", "# ---------------------------\n", "import os\n", "import urllib.request\n", "if not os.path.exists(LocalInputData):\n", " print (\"the specified local data-file is not available; by default, select the data-file 
available at the prescribed url !\")\n", " urllib.request.urlretrieve(data_url, LocalInputData)\n", " OriginalInputData=pd.read_csv(data_url) # reading from a URL works with pandas >= 0.19.2\n", "else :\n", " print (\"usage of the LocalInputData = \", LocalInputData)\n", " OriginalInputData=pd.read_csv(LocalInputData) # load the local copy; without this line the DataFrame is undefined in this branch" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "le nb d'enregistrements = (1314, 3) = (nb_lignes='features-values' , nb_colonnes='features')\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Smoker Status Age\n", "0 Yes Alive 21.0\n", "1 Yes Alive 19.3\n", "2 No Dead 57.5\n", "3 No Alive 47.1\n", "4 Yes Alive 81.4\n", "5 No Alive 36.8\n", "6 No Alive 23.8\n", "7 Yes Dead 57.5\n", "8 Yes Alive 24.8\n", "9 Yes Alive 49.5" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print (\"le nb d'enregistrements = \",OriginalInputData.shape, \" = (nb_lignes='features-values' , nb_colonnes='features')\")\n", "#\n", "# InputData_index = OriginalInputData.index\n", "# InputData_columns = OriginalInputData.columns\n", "# print (\"InputData : indexe des colonnes = \",InputData_columns)\n", "# print (\"InputData : indexe des lignes = \",InputData_index)\n", "#\n", "OriginalInputData.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Checking that this data set has no missing values:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Smoker, Status, Age]\n", "Index: []" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "OriginalInputData[OriginalInputData.isnull().any(axis=1)]" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "No record has a missing value, so every record is valid." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "## Remark\n", "## The data can be sorted by increasing age, to list it easily by age bracket.\n", "## This is not strictly necessary, so the option is left disabled below:\n", "#sortedData = OriginalInputData.set_index('Age').sort_index()\n", "#sortedData.head(25) # shows the first 25 lines of records" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Some general information about the age distribution:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Age min = 18\n", " Age max = 89\n", " Age moyen = 47\n", " Age median = 44\n", " Age ecartType = 19.15\n" ] } ], "source": [ "Age = OriginalInputData['Age']\n", "# Either use the describe() method:\n", "# Age.describe()\n", "# or compute each statistic directly:\n", "AgeMax= np.max(Age); AgeMin= np.min(Age); AgeMoy= np.mean(Age) ; AgeMedian = np.median(Age) ; ecartType=np.std(Age)\n", "print (\" Age min = % 3d\" % AgeMin)\n", "print (\" Age max = % 3d\" % AgeMax)\n", "print (\" Age moyen = % 3d\" % AgeMoy)\n", "print (\" Age median = % 3d\" % AgeMedian)\n", "print (\" Age ecartType = % 4.2f\" %ecartType)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Counting smokers and non-smokers, and women alive and dead:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 1314\n", "unique 2\n", "top 
No\n", "freq 732\n", "Name: Smoker, dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the content of the 'Smoker' column, which holds a binary value, 'Yes' or 'No':\n", "Fumeuse_ou_non = OriginalInputData['Smoker']\n", "Fumeuse_ou_non.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Out of 1314 women, 732 have the value 'No'; \n", "it follows that there are 1314 - 732 = 582 smokers." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 1314\n", "unique 2\n", "top Alive\n", "freq 945\n", "Name: Status, dtype: object" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the content of the 'Status' column, which holds a binary value, 'Dead' or 'Alive':\n", "Vivante_ou_Morte = OriginalInputData['Status']\n", "Vivante_ou_Morte.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Out of 1314 women, 945 have the value 'Alive'; \n", "it follows that 1314 - 945 = 369 women died, whether or not they smoked." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## II - Computing the mortality rate in each group (smokers, non-smokers), i.e. the ratio of the number of women who died in a group to the total number of women in that group" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Smoker Status Age\n", "0 Yes Alive 21.0\n", "1 Yes Alive 19.3\n", "4 Yes Alive 81.4\n", "7 Yes Dead 57.5\n", "8 Yes Alive 24.8\n", "9 Yes Alive 49.5\n", "10 Yes Alive 30.0\n", "12 Yes Alive 49.2\n", "19 Yes Alive 65.7\n", "21 Yes Alive 38.3" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the rows of the DataFrame OriginalInputData whose 'Smoker' column holds 'Yes':\n", "Fumeuses = OriginalInputData.loc[OriginalInputData['Smoker'] == 'Yes']\n", "Fumeuses.head(10)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb Fumeuses Mortes= 139 ; Nb Fumeuses Vivantes= 443\n" ] } ], "source": [ "Nb_Fumeuses = Fumeuses.shape[0] # number of rows in Fumeuses\n", "# print (\"Nb_Fumeuses=\",Nb_Fumeuses)\n", "FumeusesVivantes = OriginalInputData.loc[(OriginalInputData['Smoker'] == 'Yes') & (OriginalInputData['Status'] == 'Alive')]\n", "Nb_FumeusesVivantes = FumeusesVivantes.shape[0] # number of rows in FumeusesVivantes\n", "# count_col = FumeusesVivantes.shape[1] # number of columns in FumeusesVivantes\n", "Nb_FumeusesMortes = Nb_Fumeuses - Nb_FumeusesVivantes\n", "print (\"Nb Fumeuses Mortes=\",Nb_FumeusesMortes,\" ; Nb Fumeuses Vivantes=\",Nb_FumeusesVivantes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Among the 582 women who smoked, 443 were still alive after 20 years and 139 had died." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Toutes FumeusesConfonfues : TxMortalite = 23.88 \n" ] } ], "source": [ "# Mortality rate (in %) in the group of smokers, all ages combined:\n", "ToutesFumeusesConfonfues_TxMortalite = (Nb_FumeusesMortes)/Nb_Fumeuses * 100\n", "print (\"Toutes FumeusesConfonfues : TxMortalite = % 4.2f \" % ToutesFumeusesConfonfues_TxMortalite)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb NonFumeuses Mortes= 230 ; Nb NonFumeuses Vivantes= 502\n" ] } ], "source": [ "NonFumeuses = OriginalInputData.loc[OriginalInputData['Smoker'] == 'No']\n", "Nb_NonFumeuses = NonFumeuses.shape[0] # number of rows in NonFumeuses\n", "# print (\"Nb_NonFumeuses=\",Nb_NonFumeuses)\n", "NonFumeusesVivantes = OriginalInputData.loc[(OriginalInputData['Smoker'] == 'No') & (OriginalInputData['Status'] == 'Alive')]\n", "Nb_NonFumeusesVivantes = NonFumeusesVivantes.shape[0] # number of rows in NonFumeusesVivantes\n", "Nb_NonFumeusesMortes = Nb_NonFumeuses - Nb_NonFumeusesVivantes\n", "print (\"Nb NonFumeuses Mortes=\",Nb_NonFumeusesMortes,\" ; Nb NonFumeuses Vivantes=\",Nb_NonFumeusesVivantes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 732 non-smoking women, of whom 502 were still alive and 230 had died (as a check, 732 + 582 = 1314 in total)." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Toutes Non-Fumeuses Confonfues : TxMortalite = 31.42 \n" ] } ], "source": [ "# Mortality rate (in %) in the group of non-smokers, all ages combined:\n", "ToutesNonFumeusesConfonfues_TxMortalite = (Nb_NonFumeusesMortes)/Nb_NonFumeuses * 100\n", "print (\"Toutes Non-Fumeuses Confonfues : TxMortalite = % 4.2f \" % 
ToutesNonFumeusesConfonfues_TxMortalite)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary of the results of part II as a table and a chart:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Groupes SANS distinction d'age Nb_Vivantes Nb_Mortes Mortalité en %\n", "0 Femmes fumeuses 443 139 23.883162\n", "1 Femmes non fumeuses 502 230 31.420765\n" ] } ], "source": [ "table0 = {\"Groupes SANS distinction d'age\": ['Femmes fumeuses', 'Femmes non fumeuses'],\n", " 'Nb_Vivantes': [Nb_FumeusesVivantes, Nb_NonFumeusesVivantes],\n", " 'Nb_Mortes': [Nb_FumeusesMortes, Nb_NonFumeusesMortes],\n", " 'Mortalité en %': [ToutesFumeusesConfonfues_TxMortalite, ToutesNonFumeusesConfonfues_TxMortalite]\n", " }\n", "Resume0 = pd.DataFrame(table0, columns=[\"Groupes SANS distinction d'age\", 'Nb_Vivantes','Nb_Mortes','Mortalité en %'])\n", "print (Resume0)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/png": 
"", "text/plain": [ "Figure: bar chart titled 'Taux de mortalité' comparing the mortality rate (in %) of 'Femmes fumeuses' and 'Femmes non fumeuses'
" ] }, "metadata": { "needs_background": "dark" }, "output_type": "display_data" } ], "source": [ "# Display a bar chart in which the height of each bar is the mortality rate of one group.\n", "#\n", "# Build the series of bar heights:\n", "HauteursDesBarres = [ToutesFumeusesConfonfues_TxMortalite,ToutesNonFumeusesConfonfues_TxMortalite]\n", "# Build the series of bar labels:\n", "LabelsDesBarres = ('Femmes fumeuses', 'Femmes non fumeuses')\n", "y_pos = np.arange(len(LabelsDesBarres))\n", "# Create the bar chart:\n", "plt.bar(y_pos, HauteursDesBarres, color=['red', 'green'])\n", "\n", "# Format the labels and ticks on the horizontal and vertical axes:\n", "plt.xticks(y_pos, LabelsDesBarres, color='cyan')\n", "plt.yticks(color='orange')\n", "plt.title('Taux de mortalité', fontdict=None, loc='center', pad=None)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus the mortality rate appears to be higher among non-smoking women (31.42 %) than among smoking women (23.88 %), which is counter-intuitive.\n", "This is an illustration of Simpson's paradox, which we now analyse.\n", "\n", "From here on, we take into account a variable that has not been made explicit so far and that confounds the final result: the age of the women, which also affects mortality.\n", "To do so, we repeat the previous operations by age bracket;\n", "we choose four: [18:34], ]34:54], ]54:64] and >64 years." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## III - For each group (smokers, non-smokers), the mortality rate is now evaluated by age bracket" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On reprend l'analyse en opérant dans la 1ere tranche d'age [AgeMin:34] ans :\n" ] } ], "source": [ "print (\"On reprend l'analyse en opérant dans la 1ere tranche d'age [AgeMin:34] ans :\")" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb de femmes dans la tranche d'age [ 18 :34] = 400\n" ] } ], "source": [ "femmes_TrA1 = OriginalInputData[OriginalInputData[\"Age\"].between(int(AgeMin), 34)]\n", "Nb_femmes_TrA1 = femmes_TrA1.shape[0]\n", "print (\"Nb de femmes dans la tranche d'age [\",int(AgeMin),\":34] = \",Nb_femmes_TrA1)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age [ 18 :34] : Nb Fumeuses= 181 vs Nb NonFumeuses= 219\n" ] } ], "source": [ "Fumeuses_TrA1 = femmes_TrA1.loc[femmes_TrA1['Smoker'] == 'Yes']\n", "Nb_Fumeuses_TrA1 = Fumeuses_TrA1.shape[0]\n", "Nb_NonFumeuses_TrA1 = Nb_femmes_TrA1 - Nb_Fumeuses_TrA1\n", "print (\"Dans la tranche d'age [\",int(AgeMin),\":34] : Nb Fumeuses=\",Nb_Fumeuses_TrA1,\" vs Nb NonFumeuses=\",Nb_NonFumeuses_TrA1)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb Fumeuses Mortes= 5 vs Nb Fumeuses Vivantes= 176\n" ] } ], "source": [ "FumeusesVivantes_TrA1 = femmes_TrA1.loc[(femmes_TrA1['Smoker'] == 'Yes') & (femmes_TrA1['Status'] == 'Alive')]\n", "Nb_FumeusesVivantes_TrA1 = FumeusesVivantes_TrA1.shape[0]\n", "Nb_FumeusesMortes_TrA1 = Nb_Fumeuses_TrA1 - Nb_FumeusesVivantes_TrA1\n", "print (\"Nb Fumeuses 
Mortes=\",Nb_FumeusesMortes_TrA1,\" vs Nb Fumeuses Vivantes=\",Nb_FumeusesVivantes_TrA1)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb Non Fumeuses Mortes= 6 vs Nb Non Fumeuses Vivantes= 213\n" ] } ], "source": [ "NonFumeusesVivantes_TrA1 = femmes_TrA1.loc[(femmes_TrA1['Smoker'] == 'No') & (femmes_TrA1['Status'] == 'Alive')]\n", "Nb_NonFumeusesVivantes_TrA1 = NonFumeusesVivantes_TrA1.shape[0]\n", "Nb_NonFumeusesMortes_TrA1 = Nb_NonFumeuses_TrA1 - Nb_NonFumeusesVivantes_TrA1\n", "print (\"Nb Non Fumeuses Mortes=\",Nb_NonFumeusesMortes_TrA1,\" vs Nb Non Fumeuses Vivantes=\",Nb_NonFumeusesVivantes_TrA1)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age [18:34] : le Taux de mortalite des fumeuses = 2.76 \n" ] } ], "source": [ "# Mortality rate (in %) of smokers in the age bracket [18:34]:\n", "Fumeuses_TrA1_TxMortalite = (Nb_FumeusesMortes_TrA1)/Nb_Fumeuses_TrA1 * 100\n", "print (\"Dans la tranche d'age [18:34] : le Taux de mortalite des fumeuses = % 4.2f \" % Fumeuses_TrA1_TxMortalite)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age [18:34] : le Taux de mortalite des Non fumeuses = 2.74 \n" ] } ], "source": [ "# Mortality rate (in %) of non-smokers in the age bracket [18:34]:\n", "NonFumeuses_TrA1_TxMortalite = (Nb_NonFumeusesMortes_TrA1)/Nb_NonFumeuses_TrA1 * 100\n", "print (\"Dans la tranche d'age [18:34] : le Taux de mortalite des Non fumeuses = % 4.2f \" % NonFumeuses_TrA1_TxMortalite)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Dans la tranche d'age [18:34] Nb_Vivantes Nb_Mortes Mortalité en 
%\n", "0 Femmes fumeuses 176 5 2.762431\n", "1 Femmes non fumeuses 213 6 2.739726\n" ] } ], "source": [ "table1 = {\"Dans la tranche d'age [18:34]\": ['Femmes fumeuses', 'Femmes non fumeuses'],\n", " 'Nb_Vivantes': [Nb_FumeusesVivantes_TrA1, Nb_NonFumeusesVivantes_TrA1],\n", " 'Nb_Mortes': [Nb_FumeusesMortes_TrA1, Nb_NonFumeusesMortes_TrA1],\n", " 'Mortalité en %': [Fumeuses_TrA1_TxMortalite, NonFumeuses_TrA1_TxMortalite]\n", " }\n", "Resume1 = pd.DataFrame(table1, columns=[\"Dans la tranche d'age [18:34]\", 'Nb_Vivantes','Nb_Mortes','Mortalité en %'])\n", "print (Resume1)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On reprend l'analyse en opérant dans la 2nde tranche d'age ]34:54] ans :\n" ] } ], "source": [ "print (\"On reprend l'analyse en opérant dans la 2nde tranche d'age ]34:54] ans :\")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb de femmes dans la tranche d'age ]34:54] = 436\n" ] } ], "source": [ "femmes_TrA2 = OriginalInputData[OriginalInputData[\"Age\"].between(float(34)+0.000001, 54)]\n", "Nb_femmes_TrA2 = femmes_TrA2.shape[0]\n", "print (\"Nb de femmes dans la tranche d'age ]34:54] = \",Nb_femmes_TrA2)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age ]34:54] : Nb Fumeuses= 237 vs Nb NonFumeuses= 199\n" ] } ], "source": [ "Fumeuses_TrA2 = femmes_TrA2.loc[femmes_TrA2['Smoker'] == 'Yes']\n", "Nb_Fumeuses_TrA2 = Fumeuses_TrA2.shape[0]\n", "Nb_NonFumeuses_TrA2 = Nb_femmes_TrA2 - Nb_Fumeuses_TrA2\n", "print (\"Dans la tranche d'age ]34:54] : Nb Fumeuses=\",Nb_Fumeuses_TrA2,\" vs Nb NonFumeuses=\",Nb_NonFumeuses_TrA2)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb Fumeuses 
Mortes= 41 vs Nb Fumeuses Vivantes= 196\n" ] } ], "source": [ "FumeusesVivantes_TrA2 = femmes_TrA2.loc[(femmes_TrA2['Smoker'] == 'Yes') & (femmes_TrA2['Status'] == 'Alive')]\n", "Nb_FumeusesVivantes_TrA2 = FumeusesVivantes_TrA2.shape[0]\n", "Nb_FumeusesMortes_TrA2 = Nb_Fumeuses_TrA2 - Nb_FumeusesVivantes_TrA2\n", "print (\"Nb Fumeuses Mortes=\",Nb_FumeusesMortes_TrA2,\" vs Nb Fumeuses Vivantes=\",Nb_FumeusesVivantes_TrA2)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb Non Fumeuses Mortes= 19 vs Nb Non Fumeuses Vivantes= 180\n" ] } ], "source": [ "NonFumeusesVivantes_TrA2 = femmes_TrA2.loc[(femmes_TrA2['Smoker'] == 'No') & (femmes_TrA2['Status'] == 'Alive')]\n", "Nb_NonFumeusesVivantes_TrA2 = NonFumeusesVivantes_TrA2.shape[0]\n", "Nb_NonFumeusesMortes_TrA2 = Nb_NonFumeuses_TrA2 - Nb_NonFumeusesVivantes_TrA2\n", "print (\"Nb Non Fumeuses Mortes=\",Nb_NonFumeusesMortes_TrA2,\" vs Nb Non Fumeuses Vivantes=\",Nb_NonFumeusesVivantes_TrA2)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age ]34:54] : le Taux de mortalite des fumeuses = 17.30 \n" ] } ], "source": [ "# Mortality rate (in %) of smokers in the age bracket ]34:54]:\n", "Fumeuses_TrA2_TxMortalite = (Nb_FumeusesMortes_TrA2)/Nb_Fumeuses_TrA2 * 100\n", "print (\"Dans la tranche d'age ]34:54] : le Taux de mortalite des fumeuses = % 4.2f \" % Fumeuses_TrA2_TxMortalite)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age ]34:54] : le Taux de mortalite des Non fumeuses = 9.55 \n" ] } ], "source": [ "# Mortality rate (in %) of non-smokers in the age bracket ]34:54]:\n", "NonFumeuses_TrA2_TxMortalite = (Nb_NonFumeusesMortes_TrA2)/Nb_NonFumeuses_TrA2 
* 100\n", "print (\"Dans la tranche d'age ]34:54] : le Taux de mortalite des Non fumeuses = % 4.2f \" % NonFumeuses_TrA2_TxMortalite)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Dans la tranche d'age ]34:54] Nb_Vivantes Nb_Mortes Mortalité en %\n", "0 Femmes fumeuses 196 41 17.299578\n", "1 Femmes non fumeuses 180 19 9.547739\n" ] } ], "source": [ "table2 = {\"Dans la tranche d'age ]34:54]\": ['Femmes fumeuses', 'Femmes non fumeuses'],\n", " 'Nb_Vivantes': [Nb_FumeusesVivantes_TrA2, Nb_NonFumeusesVivantes_TrA2],\n", " 'Nb_Mortes': [Nb_FumeusesMortes_TrA2, Nb_NonFumeusesMortes_TrA2],\n", " 'Mortalité en %': [Fumeuses_TrA2_TxMortalite, NonFumeuses_TrA2_TxMortalite]\n", " }\n", "Resume2 = pd.DataFrame(table2, columns=[\"Dans la tranche d'age ]34:54]\", 'Nb_Vivantes','Nb_Mortes','Mortalité en %'])\n", "print (Resume2)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On reprend l'analyse en opérant dans la 3eme tranche d'age ]54:64] ans :\n" ] } ], "source": [ "print (\"On reprend l'analyse en opérant dans la 3eme tranche d'age ]54:64] ans :\")" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb de femmes dans la tranche d'age ]54:64] = 236\n" ] } ], "source": [ "femmes_TrA3 = OriginalInputData[OriginalInputData[\"Age\"].between(float(54)+0.000001, 64)]\n", "Nb_femmes_TrA3 = femmes_TrA3.shape[0]\n", "print (\"Nb de femmes dans la tranche d'age ]54:64] = \",Nb_femmes_TrA3)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age ]54:64] : Nb Fumeuses= 115 vs Nb NonFumeuses= 121\n" ] } ], "source": [ "Fumeuses_TrA3 = femmes_TrA3.loc[femmes_TrA3['Smoker'] == 'Yes']\n", "Nb_Fumeuses_TrA3 = 
Fumeuses_TrA3.shape[0]\n", "Nb_NonFumeuses_TrA3 = Nb_femmes_TrA3 - Nb_Fumeuses_TrA3\n", "print (\"Dans la tranche d'age ]54:64] : Nb Fumeuses=\",Nb_Fumeuses_TrA3,\" vs Nb NonFumeuses=\",Nb_NonFumeuses_TrA3)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb Fumeuses Mortes= 51 vs Nb Fumeuses Vivantes= 64\n" ] } ], "source": [ "FumeusesVivantes_TrA3 = femmes_TrA3.loc[(femmes_TrA3['Smoker'] == 'Yes') & (femmes_TrA3['Status'] == 'Alive')]\n", "Nb_FumeusesVivantes_TrA3 = FumeusesVivantes_TrA3.shape[0]\n", "Nb_FumeusesMortes_TrA3 = Nb_Fumeuses_TrA3 - Nb_FumeusesVivantes_TrA3\n", "print (\"Nb Fumeuses Mortes=\",Nb_FumeusesMortes_TrA3,\" vs Nb Fumeuses Vivantes=\",Nb_FumeusesVivantes_TrA3)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb Non Fumeuses Mortes= 40 vs Nb Non Fumeuses Vivantes= 81\n" ] } ], "source": [ "NonFumeusesVivantes_TrA3 = femmes_TrA3.loc[(femmes_TrA3['Smoker'] == 'No') & (femmes_TrA3['Status'] == 'Alive')]\n", "Nb_NonFumeusesVivantes_TrA3 = NonFumeusesVivantes_TrA3.shape[0]\n", "Nb_NonFumeusesMortes_TrA3 = Nb_NonFumeuses_TrA3 - Nb_NonFumeusesVivantes_TrA3\n", "print (\"Nb Non Fumeuses Mortes=\",Nb_NonFumeusesMortes_TrA3,\" vs Nb Non Fumeuses Vivantes=\",Nb_NonFumeusesVivantes_TrA3)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age ]54:64] : le Taux de mortalite des fumeuses = 44.35 \n" ] } ], "source": [ "# Le taux de mortalité dans le groupe des femmes fumeuses dans la tranche d'age ]54:64] vaut :\n", "Fumeuses_TrA3_TxMortalite = (Nb_FumeusesMortes_TrA3)/Nb_Fumeuses_TrA3 * 100\n", "print (\"Dans la tranche d'age ]54:64] : le Taux de mortalite des fumeuses = % 4.2f \" % Fumeuses_TrA3_TxMortalite)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age ]54:64] : le Taux de mortalite des Non fumeuses = 33.06 \n" ] } ], "source": [ "# Le taux de mortalité dans le groupe des femmes Non fumeuses dans la la tranche d'age ]54:64] vaut :\n", "NonFumeuses_TrA3_TxMortalite = (Nb_NonFumeusesMortes_TrA3)/Nb_NonFumeuses_TrA3 * 100\n", "print (\"Dans la tranche d'age ]54:64] : le Taux de mortalite des Non fumeuses = % 4.2f \" % NonFumeuses_TrA3_TxMortalite)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Dans la tranche d'age ]54:64] Nb_Vivantes Nb_Mortes Mortalité en %\n", "0 Femmes fumeuses 64 51 44.347826\n", "1 Femmes non fumeuses 81 40 33.057851\n" ] } ], "source": [ "table3 = {\"Dans la tranche d'age ]54:64]\": ['Femmes fumeuses', 'Femmes non fumeuses'],\n", " 'Nb_Vivantes': [Nb_FumeusesVivantes_TrA3, Nb_NonFumeusesVivantes_TrA3],\n", " 'Nb_Mortes': [Nb_FumeusesMortes_TrA3, Nb_NonFumeusesMortes_TrA3],\n", " 'Mortalité en %': [Fumeuses_TrA3_TxMortalite, NonFumeuses_TrA3_TxMortalite]\n", " }\n", "Resume3 = pd.DataFrame(table3, columns=[\"Dans la tranche d'age ]54:64]\", 'Nb_Vivantes','Nb_Mortes','Mortalité en %'])\n", "print (Resume3)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On reprend l'analyse en opérant dans la 4eme tranche d'age ]64:AgeMax] ans :\n" ] } ], "source": [ "print (\"On reprend l'analyse en opérant dans la 4eme tranche d'age ]64:AgeMax] ans :\")" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb de femmes dans la tranche d'age ]64: 89 ] = 242\n" ] } ], "source": [ "femmes_TrA4 = OriginalInputData[OriginalInputData[\"Age\"].between(float(64)+0.000001, int(AgeMax)+1)]\n", "Nb_femmes_TrA4 = femmes_TrA4.shape[0]\n", "print (\"Nb de femmes dans la 
tranche d'age ]64:\",int(AgeMax),\"] = \",Nb_femmes_TrA4)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age ]64 89 ] : Nb Fumeuses= 49 vs Nb NonFumeuses= 193\n" ] } ], "source": [ "Fumeuses_TrA4 = femmes_TrA4.loc[femmes_TrA4['Smoker'] == 'Yes']\n", "Nb_Fumeuses_TrA4 = Fumeuses_TrA4.shape[0]\n", "Nb_NonFumeuses_TrA4 = Nb_femmes_TrA4 - Nb_Fumeuses_TrA4\n", "print (\"Dans la tranche d'age ]64\",int(AgeMax),\"] : Nb Fumeuses=\",Nb_Fumeuses_TrA4,\" vs Nb NonFumeuses=\",Nb_NonFumeuses_TrA4)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb Fumeuses Mortes= 42 vs Nb Fumeuses Vivantes= 7\n" ] } ], "source": [ "FumeusesVivantes_TrA4 = femmes_TrA4.loc[(femmes_TrA4['Smoker'] == 'Yes') & (femmes_TrA4['Status'] == 'Alive')]\n", "Nb_FumeusesVivantes_TrA4 = FumeusesVivantes_TrA4.shape[0]\n", "Nb_FumeusesMortes_TrA4 = Nb_Fumeuses_TrA4 - Nb_FumeusesVivantes_TrA4\n", "print (\"Nb Fumeuses Mortes=\",Nb_FumeusesMortes_TrA4,\" vs Nb Fumeuses Vivantes=\",Nb_FumeusesVivantes_TrA4)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nb Non Fumeuses Mortes= 165 vs Nb Non Fumeuses Vivantes= 28\n" ] } ], "source": [ "NonFumeusesVivantes_TrA4 = femmes_TrA4.loc[(femmes_TrA4['Smoker'] == 'No') & (femmes_TrA4['Status'] == 'Alive')]\n", "Nb_NonFumeusesVivantes_TrA4 = NonFumeusesVivantes_TrA4.shape[0]\n", "Nb_NonFumeusesMortes_TrA4 = Nb_NonFumeuses_TrA4 - Nb_NonFumeusesVivantes_TrA4\n", "print (\"Nb Non Fumeuses Mortes=\",Nb_NonFumeusesMortes_TrA4,\" vs Nb Non Fumeuses Vivantes=\",Nb_NonFumeusesVivantes_TrA4)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age ]64 89 ] : le Taux de mortalite des fumeuses = 85.71 
\n" ] } ], "source": [ "# Mortality rate (in %) of smokers in the ]64:AgeMax] age bracket:\n", "Fumeuses_TrA4_TxMortalite = (Nb_FumeusesMortes_TrA4)/Nb_Fumeuses_TrA4 * 100\n", "print (\"Dans la tranche d'age ]64\",int(AgeMax),\"] : le Taux de mortalite des fumeuses = % 4.2f \" % Fumeuses_TrA4_TxMortalite)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dans la tranche d'age ]64 89 ] : le Taux de mortalite des Non fumeuses = 85.49 \n" ] } ], "source": [ "# Mortality rate (in %) of non-smokers in the ]64:AgeMax] age bracket:\n", "NonFumeuses_TrA4_TxMortalite = (Nb_NonFumeusesMortes_TrA4)/Nb_NonFumeuses_TrA4 * 100\n", "print (\"Dans la tranche d'age ]64\",int(AgeMax),\"] : le Taux de mortalite des Non fumeuses = % 4.2f \" % NonFumeuses_TrA4_TxMortalite)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Dans la tranche d'age ]64:89] Nb_Vivantes Nb_Mortes Mortalité en %\n", "0 Femmes fumeuses 7 42 85.714286\n", "1 Femmes non fumeuses 28 165 85.492228\n" ] } ], "source": [ "table4 = {\"Dans la tranche d'age ]64:89]\": ['Femmes fumeuses', 'Femmes non fumeuses'],\n", " 'Nb_Vivantes': [Nb_FumeusesVivantes_TrA4, Nb_NonFumeusesVivantes_TrA4],\n", " 'Nb_Mortes': [Nb_FumeusesMortes_TrA4, Nb_NonFumeusesMortes_TrA4],\n", " 'Mortalité en %': [Fumeuses_TrA4_TxMortalite, NonFumeuses_TrA4_TxMortalite]\n", " }\n", "Resume4 = pd.DataFrame(table4, columns=[\"Dans la tranche d'age ]64:89]\", 'Nb_Vivantes','Nb_Mortes','Mortalité en %'])\n", "print (Resume4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary of part III: tables 1 to 4, and a bar chart whose bar heights give the mortality rate of each group" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": 
[ { "name": "stdout", "output_type": "stream", "text": [ " Dans la tranche d'age [18:34] Nb_Vivantes Nb_Mortes Mortalité en %\n", "0 Femmes fumeuses 176 5 2.762431\n", "1 Femmes non fumeuses 213 6 2.739726\n", " Dans la tranche d'age ]34:54] Nb_Vivantes Nb_Mortes Mortalité en %\n", "0 Femmes fumeuses 196 41 17.299578\n", "1 Femmes non fumeuses 180 19 9.547739\n", " Dans la tranche d'age ]54:64] Nb_Vivantes Nb_Mortes Mortalité en %\n", "0 Femmes fumeuses 64 51 44.347826\n", "1 Femmes non fumeuses 81 40 33.057851\n", " Dans la tranche d'age ]64:89] Nb_Vivantes Nb_Mortes Mortalité en %\n", "0 Femmes fumeuses 7 42 85.714286\n", "1 Femmes non fumeuses 28 165 85.492228\n" ] } ], "source": [ "print (Resume1)\n", "print (Resume2)\n", "print (Resume3)\n", "print (Resume4)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAFhCAYAAAB6RLH1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3XmcHFXV//HPISxhX0MIYV8EkU2IKLITUBBkEYhQIBHhiQqC8IgCIvIgLsgiiGyGzaAUEBAF/CEiQUBlDQKyi0BYQzLsIMh6fn+c26QzTDI9M1W91Hzfr1e/pru6u+7t0z2nqm7dutfcHRERqZ45Wl0BEREphxK8iEhFKcGLiFSUEryISEUpwYuIVJQSvIhIRSnBt6PcVie3d1tdjULktg25/bvu8aPktmED7xtCbn8kt717ed265HbzgOs5ULmdQW5fbnU1Wia3W8ltr5LWfTG5fa+UdVfcnK2uQEvl9nrdo/mAt4D30uOvkvmFza9Um8vtOGAJMt+vX+/PfOUG13UC8Hsyv6CXNf4IOL5fdZmd3OYGcmAr4K/AF8n8jfTcscCzZH5m3TuOB24gt1+T+XvdV1e63FYH7iPzav9P57YNcDCZb9PqqnSCav8YepP5Ah/cz20KsB+ZX9ey+rS73Jr3e8n8f3t9TW7LA58EdiqhBrsDrwHDgEuBrwCnkdtHgC2ATWd6deZTyO0pYFvgDyXUZ4bc5iTzvh/h9fd90rEGd4LvTW4bAT8DVgf+A1wCfJvM3+1xjym3W4HTyPw35HY+MDeZ75me+zmwCplv10M5cwInAXsBLwE/7/b8YsApwGeAd4BzgGPJ/P0e1nUcsCzx3W4H/BvYGdgbOBB4HdiHzP+SXr8ccBbwKeB54EdkPqFuXcsBQ4DPAV8H/hcwctsdeIDMNyC3rwKHACOBacCPyfy8WcT0OWBXYIlZrKvxzwqfBW4l83fq1r8scBqwEfAqcDyZn9UtNnMA2wOPA3uT+d09rHtF4AYyf4fcbgBWSst/ARwyi/rcQMT8wwm+9nuBbwBHAw78hMx/kZ6f3W9tKPAmsD9waIrL6t1KuAkYUndUugmx8RsDPAhkwEnkdhn
xfa9NHK1eDXyDzF9L9XgO+Cmwb4rVH4jfy9vp+d2Ao4AViO/6a2Q+KZW5cvofWAP4G7Anmb+U3rcJcCKwGvAYcCCZ/72HGEJuGwBnEzG/Api7x9fFa88EPg8sBDwEHETmt6bnFgDGE7/dp4ELgX3JfJX0/Kx/KxWhNvjZe4f4h1yM+If5PNBo08RBwKfJbXdyGw3sQfzT9OQbwJbAWsCGwBe7PX8h8Arxg9+A2GP90mzK3pn4J14EeBi4nkgaSxEbkjPqXntpes0IIgmcnJJNzS7ABGBh4LdEEppA5guQ+QbpNVOJPdeFgK8Bp5Pbx2ZTP8j897NYV18+61qp7iG3IUTCuhlYGtgG+C65bdYtNuel2EwiNiY9uQ8YTW7zAJsD95PbHsBjZH7HLN7zILDOLJ6D2FBumD7bdsAx5LZxeq6R39r2wPrAx3tY96bAeymWC5D5XXXL7yY2qCelZT8gfgtrEQn3yG7r2hUYDaxCbCQyoJakxwPfJOI3Gniq7n0ZsCfxW1okvQ5yWwH4fSpnMeB7wO/JbdEPfYrYmP0e+GV67R+BHWaU4Nd0a565JX2OxYmNwaXkNld67ofEEdjyRLxn/I4a+610PO3Bz07mt9c9epTczgE2I5Jnb+99LZ0g/B3wX2JP57lZvHoMcBKZPwtAbscDl6f7yxP/pDukPdU3yO1UoglhwizWN6luD/0yoh35JDJ3crsYOJXc5gWWIRLS5mT+FjCZ3CYQ/wi1vasbyfzqdP9Ncuvps15Z9+g6crsR2Bi4fxb161nfP+siwKN1jzcGhpL5T9Pjf6Ujqd2BG9Oy68n8z6m8XwP7zKI2vyOS8R3E3ugVxAZhy/T9fIpInP9b1+zxWqrT7BxN5m8Cd5Hbb4gN/98a/K39iMxf7mX93T1G5men+28Se7kPpcfPkdsp1BLxDCeT+TQAcrsaWDct3w8464PfFjzZ7X1nk/mj6X2XMaMZayxweV3z59Xk9gBxlHZJt3VsCrxF5rWdkAvJ7Vuz/HT152hy+zGxEVmJ2PCPAfYg81eAV8jtDODg9OpGfisdTwl+dnJbg9jrWQ+Yl4hXz4eVPfsbsXc7L5EwZmVpZt4TeqLu/vLAUKCrLrnOQTS9zMq0uvtvAl1kH4wq92b6O38qtyslnPqyR9c9rq9Xz3LbgdgrWyXVbT7ixGRf9fWzvgQs2O39K5BbfRIcAtSfV6nfyL4BLEBPognm0A8e5/YL4GQi6a5O5pumBL0X8Kv0qgWB3hJw9+9547T+Rn5rvX8Xsy8PcluaaAL8dKrvHMRvtF73GC2R7i/L7L/XWcV2eWCP1LxTMxfx++tuaaI5pd4TPbwu5HYE8GXiiMSJ388S5PYvYDgzf/76+438VjqeEvzsnU20q+5G5q+T2+HE3jBEk8cQcpsn7f1C/Mjq/S9x6P0Osedw8izKmUr889QsV3f/KaLdfNG6JF2UZ4Fh5DZvXZJfDnim7jXdy5z5cW7zE808uwJ/TG3G1wA97Op/SPd19/Wz/hPYsdv7HyLztRp4b+NyW49oBjgI+D4wOT1zB9GWXfNR4J5e1rYsMCXdX474DmD2v7Wa2cVkVs91X34C8dtdk8xfSuc/fthLnWueAlbu9VU9v+8cMj+wgddOJY4s6y0H3PmhV+a2NXFeaSuiecyIoyhLR6vT07oeS++o/x8r57fSZtQGP3sLAq+kf7iPAf9T99yzQBewZ+qzvT9xkjHktiaxV7tXun0/7aX1ZCJwCLmNILclgO988EzmjwO3AseT24LkNge5rVrXdjsQ/yaS5A/JbZ6UyMYS7eCzMg1YkdxqCXxeYm9sOvB+2pvfvMHyZ15X3z/rn4BP1rW5/g2A3A4mt6HkNie5rZ0+V//kNgdxIu7AtNF5HNiU6Ea5KTOSB8Te/R97WePR5DYvua1DNIXVmihm91trxHRih2O5Xl63ILERfTW9tvfeSjOcA3yV3DZN382yRK+i3kwAdiO30el/Zd50v/sOEcTJ4qH
k9rX0/e3BzBvR7p/lHeL/cG7i3MLQuucnAkeS28Lps3697rnifyttSAl+9g4B9ks9E06nvr0w+jrvR/SIeJ7YO4i9jPjn/w1wDJk/QOYPED++X9clo3qnEYe+9wO3ET/MensQbbsPAS+megwf8KeLhDWG6PXwHDN6bszuMPxiognmRXK7mcyfJ5oyrgJeIE6KXj2b9896XaHxz5r5U0S8Ppcev5Puf5o4rO8CzmRWzTCN+RpwC5nfW1fnl9O65yFO2NbOHyzP7D/7e6m+jwPXAD8g85vSc7P+rTUieqscD9xJbi+T27qzeOX3iWahV4hmw9/2oYy/EvE4I71/Eh/e2+7pfY8RJ+uPIf5XniDa/T+cf+JIcmeix9BLxMnRq2ax5quIDcKjxIb2eeJ7qfleWscTxIZ3InGtS1m/lbZjmvBDOlokstPJfKNeX1tuPU4H7pxN99DBcSFSO8vtEGAbMv9sq6vSLPqxSWeLPuytTe5RjwNaXQXpJvq5jwRuJ86PfBM4rqV1arLGEnxu3yTaBI3oCnUKcUHKJcQFD1OAMR9c1CAi0nq1JrTliaaa3xDnEQaN3pto4mThxcRFJ28TbYdfJxL+i2R+XDrjvyiZH1ZudUVEpFGNnGT9KHE5+Bvpgo4biZMgOzLj4pMJlDMeiIiI9FMjCf4+olvY4uQ2H3HmeVlgOJnHBRLxd8nSaikiIn3Wext85g+S20+BPxP9Z+8BGh6RzszGAeMA5p9//vVXX737GEkiIjI7d9555/PuPqyv7+t7N8kY7+Fp4oz05mQ+ldxGECPvrTa7t44aNconT548u5eIiEg3Znanu4/q6/sau9AptyXT3+WALwAXAVcSVz2S/l7R18JFRKQ8jV7J+ts0+ttVwAGpO+RxwNbk9giwNYOsf6mISLtrrB985pv0sOwFZh51UERE2ojGohERqSgleBGRilKCFxGpKCV4EZGKUoIXEakoDRcsItVgjcwS2YseLvy0YwpYL+BHN3/uDe3Bi4hUlBK8iEhFKcGLiFSUEryISEUpwYuIVJQSvIhIRSnBi4hUlBK8iEhFKcGLiFRUY1ey5nYIsB/gwL3APsB8wCXACsAUYEyaCERERNpA73vwuY0EDgJGkfmawBBgd+BwYBKZrwpMSo9FRKRNNNpEMycwL7nNSey5PwvsCExIz08Adiq+eiIi0l+9J/jMnwFOBJ4EpgKvkPm1wHAyn5peMxVYsrxqiohIXzXSRLMosbe+IrA0MD+57dVoAWY2zswmm9nkrq6ufldURET6ppEmmq2Ax8m8i8zfAS4HPg1MI7cRAOnv9J7e7O7j3X2Uu48aNmxYQdUWEZHeNNKL5kngU+Q2H/AmMBqYDPwHGAscl/5eUVYlRUSk73pP8JnfRm6XAf8A3gXuAsYDCwATyW1fYiOwW4n1FBGRPmqsH3zmRwNHd1v6FrE3LyIibUhXsoqIVJQSvIhIRSnBi4hUlBK8iEhFKcGLiFSUEryISEUpwYuIVJQSvIhIRSnBi4hUlBK8iEhFKcGLiFSUEryISEUpwYuIVJQSvIhIRSnBi4hUlBK8iEhF9T7hR26rAZfULVkJ+D5wQVq+AjAFGEPmLxVeQxER6Zfe9+Azf5jM1yXzdYH1gTeA3wGHA5PIfFVgUnosIiJtoq9NNKOBR8n8CWBHYEJaPgHYqciKiYjIwPQ1we8OXJTuDyfzqQDp75I9vcHMxpnZZDOb3NXV1e+KiohI3zSe4HObG9gBuLQvBbj7eHcf5e6jhg0b1sfqiYhIf/VlD35b4B9kPi09nkZuIwDS3+kF101ERAagLwl+D2Y0zwBcCYxN98cCVxRVKRERGbjGEnxu8wFbA5fXLT0O2JrcHknPHVd47UREpN967wcPkPkbwOLdlr1A9KoREZE2pCtZRUQqSgleRKSilOBFRCpKCV5EpKKU4EVEKkoJXkSkopTgRUQqSgleRKSilOBFRCpKCV5EpKKU4EVEKkoJXkSkopTgRUQqSgl
eRKSilOBFRCqqsfHgc1sEOAdYE3DgK8DDwCXACsAUYAyZv1RGJUVEpO8a3YP/OXANma8OrAM8CBwOTCLzVYFJ6bGIiLSJ3hN8bgsBmwLnApD522T+MrAjMCG9agKwUzlVFBGR/mikiWYloAs4n9zWAe4EvgkMJ/OpAGQ+ldyW7OnNZjYOGAew3HLLFVFnERFpQCNNNHMC6wFnkvnHgf/Qh+YYdx/v7qPcfdSwYcP6WU0REemrRhL808DTZH5benwZkfCnkdsIgPR3eik1FBGRfuk9wWf+HPAUua2WlowGHgCuBMamZWOBK8qooIiI9E9j3SThQOBCcpsbeAzYh9g4TCS3fYEngd3KqaKIiPRHYwk+87uBUT08M7rQ2oiISGF0JauISEUpwYuIVJQSvIhIRSnBi4hUlBK8iEhFKcGLiFSUEryISEUpwYuIVJQSvIhIRSnBi4hUlBK8iEhFKcGLiFSUEryISEUpwYuIVJQSvIhIRTU2HnxuU4DXgPeAd8l8FLktBlwCrABMAcaQ+Uul1FJERPqsL3vwW5D5umRem/jjcGASma8KTKIPE3GLiEj5BtJEsyMwId2fAOw08OqIiEhRGk3wDlxLbneS27i0bDiZTwVIf5fs6Y1mNs7MJpvZ5K6urgFXWEREGtNogt+IzNcDtgUOILdNGy3A3ce7+yh3HzVs2LB+VVJERPqusQSf+bPp73Tgd8AGwDRyGwGQ/k4vpYYiItIvvSf43OYntwU/uA+fAe4DrgTGpleNBa4op4oiItIfjXSTHA78jtxqr8/J/BpyuwOYSG77Ak8Cu5VXTRER6aveE3zmjwHr9LD8BWB08VUSEZEi6EpWEZGKUoIXEakoJXgRkYpSghcRqSgleBGRilKCFxGpKCV4EZGKUoIXEamoxib8EJGBMxv4OtwHvg4ZNLQHLyJSUUrwIiIVpQQvIlJRSvAiIhWlBC8iUlFK8CIiFdV4N8nchgCTgWfIfHtyWwy4BFgBmAKMIfOXSqijiIj0Q1/24L8JPFj3+HBgEpmvCkxKj0VEpE00luBzWwbYDjinbumOwIR0fwKwU6E1ExGRAWl0D/4U4DvA+3XLhpP5VID0d8me3mhm48xssplN7urqGkhdRUSkD3pP8LltD0wn8zv7U4C7j3f3Ue4+atiwYf1ZhYiI9EMje/AbATuQ2xTgYmBLcvsNMI3cRgCkv9PLqqSIiPRd7wk+8yPIfBkyXwHYHbiezPcCrgTGpleNBa4oq5IiItJ3A+kHfxywNbk9AmydHouISJvo23DBmd8A3JDuvwCMLrpCIiJSDF3JKiJSUUrwIiIVpQQvIlJRSvAiIhWlBC8iUlFK8CIiFaUELyJSUUrwIiIVpQQvIlJRSvAiIhWlBC8iUlF9G4tGRNqOHWMDXocf7QXURNqN9uBFRCpKCV5EpKKU4EVEKqr3NvjchgI3AfOk119G5keT22LAJcAKwBRgDJm/VFpNRUSkTxrZg38L2JLM1wHWBbYht08BhwOTyHxVYFJ6LCIibaL3PfjMHXg9PZor3RzYEdg8LZ9AzPR0WNEVFBGR/mmsDT63IeR2NzAd+DOZ3wYMJ/OpAOnvkj291czGmdlkM5vc1dVVTK1FRKRXjSX4zN8j83WBZYANyG3NRgtw9/HuPsrdRw0bNqyf1RQRkb7qWy+azF8mmmK2AaaR2wiA9Hd6wXUTEZEB6D3B5zaM3BZJ9+cFtgIeAq4ExqZXjQWuKKeKIiLSH40MVTACmEBuQ4gNwkQy/wO53QJMJLd9gSeB3Uqsp4iI9FEjvWj+CXy8h+UvAKOLr5KIiBRBV7KKiFSUEryISEUpwYuIVJQSvIhIRSnBi4hUlBK8iEhFKcGLiFSUEryISEUpwYuIVJQSvIhIRSnBi4hUlBK8iEhFKcGLiFSUEryISEUpwYuIVFTv48HntixwAbAU8D4wnsx/Tm6LAZcAKwBTgDFk/lJpNRURkT5pZA/+XeBbZP5R4FPAAeS2BnA4MIn
MVwUmpcciItImek/wmU8l83+k+68BDwIjgR2BCelVE4CdyqmiiIj0R9/a4HNbgZi+7zZgOJlPBUh/lyy2aiIiMhCNJ/jcFgB+CxxM5q82+jYzG2dmk81scldXVz+qKCIi/dFYgs9tLiK5X0jml6el08htRHp+BDC9p7e6+3h3H+Xuo4YNGzbwGouISEN6T/C5GXAu8CCZ/6zumSuBsen+WOCKwmsnIiL91ns3SdgI+BJwL7ndnZZ9FzgOmEhu+wJPAruVU0UREemP3hN85n8DbBbPji60NiIiUhhdySoiUlGNNNHIYGCzOkjrI/di1iMiA6Y9eBGRilKCFxGpKCV4EZGKUoIXEakoJXgRkYpSghcRqSgleBGRilKCFxGpKCV4EZGKUoIXEakoJXgRkYpSghcRqSgleBGRiup9NMnczgO2B6aT+Zpp2WLAJcAKwBRgDJm/VFYlRUSk7xrZg/8VsE23ZYcDk8h8VWBSeiwiIm2k9wSf+U3Ai92W7ghMSPcnADsVWy0RERmo/rbBDyfzqQDp75KF1UhERApR+klWMxtnZpPNbHJXV1fZxYmISNLfKfumkdsIMp9KbiOA6bN6obuPB8YDjBo1SvO5DTJ2TDFTAfrR+umI9FV/9+CvBMam+2OBK4qpjoiIFKWRbpIXAZsDS5Db08DRwHHARHLbF3gS2K3EOoqISD/0nuAz32MWz4wutioiIlIkXckqIlJRSvAiIhWlBC8iUlFK8CIiFaUELyJSUUrwIiIVpQQvIlJRSvAiIhWlBC8iUlFK8CIiFaUELyJSUUrwIiIV1d/x4JvPihlXHJ95XHGNVy4iVaU9eBGRilKCFxGpKCV4EZGKGlgbfG7bAD8HhgDnkPlxRVRKREQGrv978LkNAU4HtgXWAPYgtzUKqpeIiAzQQJpoNgD+TeaPkfnbwMXAjsVUS0REBmogTTQjgafqHj8NfLL7i8xsHDAuPXzdzB4eQJm9WQJ4frav6F93y17Xa/9Xznr7qXXrVXzLXW97xbehdXfUevvfHbvMGAMs3583DSTB91TbD3UGd/fxwPgBlNMwM5vs7qO0Xq1X6y1/vWWuW+stxkCaaJ4Glq17vAzw7MCqIyIiRRnIHvwdwKrktiLwDLA7kBVSKxERGbD+78Fn/i7wDeBPwIPARDK/v6B69VdZTUFar9ar9TZ33VpvAcxdY6iIiFSRrmQVEakoJXgRkYpSghcRqSgl+MTMhqS/o8xsoyaW92kz27zs8lpN8S2X4lu+Zsa4qPgqwQNmZu7+Xnp4BikuZja8zPLMbE5iPJ830/Jl07JKUXzLpfiWr5kxLjK+SvCAp65EZvYt4FZ3/6uZbQecb2a3mdlKZZQHHApc5+63mdkuwHnA7Wb2kSLLazXFt1yKb/maGeMi41vJrW1/mJkB8wHzmdnJRGzGA2ul22MllAcw3MzOAV4HjicGcVsL+FeR5bWa4lsuxbd8zYxxUfHVHnyStppnEIEbCRzp7r8HPgu8VVJ5vwRuI76Hw9z9z8DngReLLq/VFN9yKb7la2aMi4rvoL7QycyGpLaulYCPAfMDd7v7Q+n57wMfd/edCy5vdWBtYF7gLnf/Z3r+B8Dq7j6miPJaTfEtl+JbvmbGuIz4Duo9+LqTJqcBGwLfIiYwwcyWAm4F9i+pvNWBbwJbpPJGAJOArxdVXqspvuVSfMvXzBiXEl93H9Q3YAzw63T/AWBkur8dsGAJ5WXA2en+fcCS6f4XgIVaHQ/Ft7Nuim+1Ylx0fAf1HnzyKnCfmZ0KXOTuz5jZZsD/AW+UUN7zwJNmdi4wwd2nm9lngCPc/dUSyms1xbdcim/5mhnjQuOrBA+3EO1d2wHXpmVHAGf5jEOmIt0MrARsRBzeARwOnFJCWe1A8S2X4lu+Zsa40PgOupOsZjaHu79vZssBnwKuANYBdgFWJqbeetjdv1pweasQX9rFxCTluwKrAEsSJ20OKaK8VlN8y6X4lq+ZMS47voOuH7y7v5/
ungrc5u5vmdkDwDzEHLPvAtNLKO8U4PpU3uPEOPq/JLpXvVBUea2m+JZL8S1fM2NcenxbfQKjFTdgU+D2dH8D4G9ADny2pPK2BG5J9zcjZsO6BPhcq2Oh+HbeTfGtVozLjO+ga6IBMLOPAvsR/UznB/4fsBCwFfAld3+n4PJWIbpSLUpMVn4xsBSwPbC7u79bZHmtpviWS/EtXzNjXGZ8B00TTd1FBCPc/UEze4QI4iXp8WnAfUV9cXVtayu6+7/N7G6iPe237v54uvz4tqr8cyi+5VJ8y9fMGDctvq0+FGrmjdig3Q2cDyxat3wH0iFSweXNDdwDTASWrlu+C3BTq+Oh+HbWTfGtVoybEd9B1U3SY2u4DTH85p1mdlR6ajIlXIHn7m8TbXlTgL+b2fEW4zzfABTSy6GdKL7lUnzL18wYNyW+rd5iNmGLPEfdlnmuuuXrEv1MHwZGlVDe3MDcdctXBW4izsJv3Oq4KL6dcVN8qxXjZse38nvwPqMb0tHA3mY2zMzmdPe7gWOBe0kD6hdc3o+AcWa2lJnN5e6PpGU3ANOKKq/VFN9yKb7la2aMmx3fyid4M6t9xn8R4zn8DNjIzOYnLmK4wd3vL6G8vxMDBZ0KjDazxYHPEO14jxRVXqspvuVSfMvXzBg3O76V7SZpZubpw1lMc7Wkuz9rZgcR3Y+eAUYBm7v7gC/U6FbeAsBSHmfH9wV2A54jhhvd0t1fG2h5rab4lkvxLV8zY9yq+FY5wde6PB0MrEZc9nuXu3/HYpjP+YDX3b2QK9Lquj0dASyfyrvP3Q82s0VSeW+6+0tFlNdqim+5FN/yNTPGrYpvZZto0hc3DNgb+CnwNlA79FkMmFrUP0cq730zG0lsjY8B3gHuSk+vALxSpX8Oxbdcim/5mhnjVsW3sgk+WR+YQJyxXsDdz07LvwcsXUJ5HwXOJQYjmtvdJ6TlRwGlzHDfYopvuRTf8jUzxk2Pb9UT/L3AzkT3oyMAUvvaIu7+aAnlPUAM2H9jXXnfIbpGFTrpcZtQfMul+JavmTFuenwr2wZfY2YbE1vjd4gJbHcB9nT3B0oqbxRwJLAgMWjQZ4AxJf1DtpziWy7Ft3zNjHGz41upBF930mQbYqbzR4EHgWeB9YjBgm5293sKLm9nYp7GR4D7gceJkygLA3e4+8NFlNdqim+5FN/yNTPG7RDfSiX4GjO7nZh5ZV7AiS/vqjL676buVbcAlxKjwc1FjBX9R+D+ugsbKkPxLZfiW75mxbjV8a1Mgq/1M7UY5vNL7v7d1P1oK+JEynDgxKIOu+rKWx3Yy92/ly6MGE2MHz0SON7dHyyivFZTfMul+JavmTFum/h6G4wFMdAbM8Z3WBI4kLgibae651cCdi2hvGWJ+RKfJH4wtedHAtu3Oi6Kb2fcFN9qxbid4luZPXgAM8uBV4CpwMHE5cDj3H1qSeVNJL68J4gTJw8CX3X3f5VRXqspvuVSfMvXzBi3RXxbvWUtYGtZ20itDFxWt3xO4EzgfeDLJZS3GnAZaWudlh2fyvufVsdF8e2Mm+JbrRi3W3wrswdvZocAXyYmqz3X05lpM1uNuErsuYLL+xawF/AX4Dx3vy8tXxr4r7u/WGR5rab4lkvxLV8zY9wu8e3oBF83vsO8xJZxB2KEtheBfwB/d/dptRMeBZa3EPA68Dmiq9V/iFlgbnb3J4sqr9UU33IpvuVrZozbMb4dneAh+poCVxCznp9EnAkfQ4zMdq+7n1JweXMBVwJ3Aj8hBgkaQ5yFv8/df1Zkea2m+JZL8S1fM2PcbvGtQoKfG9gE2J0Ype00d7/SzDYkDoXumu0K+l7eXEQ3pz2IsSXOcfeLzGxd4N3aoVhVKL7lUnzL18wYt1t8OzLBW8y28m7dYwMWBzYDDiLGVj7c3R8vsbxFgA2B7xCHYN9y94eKKK/VFN9yKb7la2aM2zm+HTnYWC2YZnasmW3m4Xni0OgK4uq0+YsoK7WX1co7xcy2T+U
QHLIyAAARbUlEQVS9BFxDnCl/g2jfqwTFt1yKb/maFeN2j2/HJXgz28fM1k4P3wXONbOzzWxpd3+H6Pp0XYGHQl9Lh1cQfVpPNLPczFb2uMx4MeBGr0jfYcW3XIpv+Zoc4/aOr7dBP9VGb8DawO3AvHXLFiXGWH6Q6NP6ODCioPJGEaPL1c+0Pg/wc2Km9XOJAYSWaHVsFN/2vym+1YpxJ8S35V9IHwN6FfCVdP+zwP51z62Vln2kwPL+RIwjAdG96pt1z60EbAos3+q4KL6dcVN8qxXjTohvy7+QPgTz08Qlvxulx38HPpfuz1lCeZsAj6UvaV7gVmJCXOq32FW5Kb6Kb6ffmhnjTolvx/SiMbO1iK5H7xFb4oXdfYu654e4+3sFlvdR4IvE8J7rAkPdfXRZ5bWa4lsuxbd8zYxxp8S3YxI8gMVwm1sDWxJtXTcCk72kExip/+zoVN6CxFZ6slewrzAovmVTfMvXzBh3Qnw7JsGn7kie7i8FfJ64kOAt4B7g0oL3gOrLWxzYntgrcOCfQN4OW+iiKL7lUnzL18wYd0p8O6abpLt7uoAAd3/OY/bz84kvj6KDWSsvfZEveMyAfg7wEvBmO3x5RVJ8y6X4lq+ZMe6U+HbMHny9bltPI4bkLC2gZjaXR//ZD5VfRYpvuRTf8jUzxu0c37bfgzezOdLfBSzmN6Tui5vDQ1GHXbWyFjezuVJ7HrUvz8z2M7Pl2+XLK4LiWy7Ft3zNinEnxrftE7zPmJT2aGIsiZ6eGzCbMdTnMsDlxKS43zCzLc1s4fTlvuXuTxRVZjtQfMul+DZFLaH+H7DETE8UFONOjW/bNtGY2RrAw7Utr5lt4u5/rXt+QXd/rYRyTwGeAW4mTpwsDNwP3ODu9xddXqsovuVSfMtnZgsQSbW2B72+u99Z93zhMe60+M7Z6gr0xMx+TMx0fr6Z3ezu93T759gZWAD4dcHlrkFcfvwjd+8ys9uIK9R2Jk7UtN0X2B+Kb7kU3/KZ2WHEFHyfMLNrgWuJnjK15wuPcSfGty0TPPBXYCNiBvTMzJYnRmN7292vJWZi+W8J5Y4AFgKuMrPvuvv1wOVm9lfgzRLKaxXFt1yKb4nMbCSwD3GB0QjgNGKDep6Zne3ubwOTgbcLLrrj4tt2TTSpLWth4ku7hQjc2sBY4BjgVHd/r6gz1d3Xk348OxIXMDwBnO3uD7bTmfGBsJjdZiEU31IovuUzszHA7u7+hfR4aeCXRFv8f4B93L2QDWinx7ftTrK6+/seYymfBQx393OBocBTwEjgwAL/OeZwdzezOc1sFzM7CViT2Pofl152VLt+ef3h7u+l+J6J4ls4xbcprgZeM7O9zWwb4AfAJHffgZgLdZEiCqlCfNtuD77GzOYBTgSuB04grkpbEXjR3W8tqIzamfEzicH/XyUub55GDPn5JrCQuz9bRHmtZmYLufur6f5cwM9QfAtjZqu5+8Pp/txEfCeh+BamlkzNbDSwC/G5nweOcPe3zWwScLK7/6GAsjo/vt4GI56ljcymwJeAHwMrpWXbAFOAH5dY7jBitvPa42WAC9PNWh2XAj/nHsDe3ZZ9hjjMVHwH/jn3IZpl6sch34IYe/xHim8hn3U74PtATkycvVhaPlf6uy8xFozim25t0URjZkOJvZ1lib7C61tcsHAdsIW7fze9roz6vgM8Z2ZfS3u4T7v7nsQkAYUc6rVaiu/hpF4GZjYktVveAGyu+A5MOto8Ghjv7m9aXAizFXAfsJO7H5lep/j2U/oN/4TY4XuCGFBsqJmNcPd3zMyINvhDCi66o+PbLr1ojiW2kj82s92Ao4g9+k8AE4AzraThN939ZTP7FfA54GUz+y/RFepNj7bUKvgh8KS732MxMNIJxOh3qxBtxeMV3wH5GHCnu//TzFYgxj95gficlwD3pKaFwuflHCTxBfg2sXd+gZltSuxB3wBsb2ZnpI3oeUUX2unxbXmCN7MFiZNQP0iLtiX6tB4
PrEHMeZi7+ysFlVc/RsW6wFLufqWZvUPsFYwEuoBxRZTXamZWm1z4CTPbntjDuYnYcC5PxPciL+iCkMEW3+Qh4C0z25KYdOIP7n5SOkq6wLpdgDMQgzS+AHcA85vZKkQzza/c/SgzWxH4hZmt4u7/HmghVYtvW5xkNbOF3f0VM5sP2MXdf1333NXAz939TwWVVTtxcgQxfdei6alx7n5bes2cnmZKrwKLy6s/SySfld1987rn/kiclLq2oLIGXXwBzGwX4hzSC0Q/+AtT00EOXOvuvyqonMEa32WIOU6fJE5WH+Lu96bnrgAucfe8gHKqFd9WnwSY3Y3oknQHMRJcEeurbdCGAteQJt4FvgpMJfZsl2315y45nhvUPV5L8S00vrsTkz7cBOxGnPdQfIuN8VDiytEJwFJE00khMa5ifNviJCt8+ASUmS1JdEX6hRfUdunp2wJ2Ja5yWzltsX/p7iOAfwPrF1FWO7HE3e9z99vTsvmBnxIX3ii+A5BO8OHuFwMbE+OCr0T0yT5U8R241DHAPC5gupJoJrmD2KieUESMqxjfdmiDn4OI7fvp8XeIblCvEbOiXFBCeYsRl47vBsxlZg96TBDwlSLLahfuMyZCqIvvK8BFXtccVoRBHN850m94f6Ip4W2vGyO8KIMxvhAXkNV+w8D+7n6omZ0AvOzubxVVTtXi25I2+BTERYg+w8/ULd8CON7dP9GEOqxE9JsdSUyxdStwa1F7W61U3yMm/VOYR7viZsBJ7j6qCXUYrPE92d3Xa0IdKhtf6DXGp7j7x0suvxLxbVUTzfHAycDVZrZd3fKtiAudaldaFq7WFOTuj3l0rTqfOOxaudO+vNkYb2YXm9lID7XPtQ0xHoriOzCzi+/R8MGYNIUbJPGF2cf4+1BOjKsW36bvwaeuR+cAXwDWA1YDngbm85hDsVn1qB1S134oc5RxSN1sFheInUB043qK6F1wdOpe9qK7v9ikeii+5dajkvGF9ohxVeLbij34HYCr3f1JYD6iX/ZSwP5m9nczG1Z0gbWtss08pdcHX57HAFEd9+X1xKP71onEXs4XgbXN7B7gYWZ0+SqU4qv4FqnZMa5yfFuR4K8D1jWz/YleHF9395NSm9p9wApFF1h3aNXTtGltMft5UdKPdSpxkdin3X1n4DlgOnBaOoIq1GCJb+qMNAR4Fvgoim8pUpt7M2M8uyn/Ojq+TU3w6bDnZuISY4j2refqXrI2MbhPUeWt0a2d7kp3n1b3/IJFldUuPIZbfh84DBhmZusQY/ysSnQrW6qosgZbfFNb8HupO93hKL6FM7P5U5xrMV6yrBinPfa5fEY79UXu/lzd8x0f36Z2k6ztibj7zwAsxnI+38wuILomvejuVxdRlrVo2rRWMbO1gaWJPZ8HgZeBuYG7gO+5++ukk1MFlTfY4vsN4E6iNwXAS8BcKL6FMbNvAy9aDE3yJhHj94gYH1VkjK0FU/61QlNOslpctHQ8cLq739HtuS2BA4hxya9298cLKnNbYg/geqKt/+/UTZtmMY3af+v3iDqVmY0AfkdMBrwyMNZjYLEhwGYeU4vNNM5GAWUOpvgOB24EtnH3KbU4Woz5vrHiO3ApxrcSo5s+YTGh9sLAEGCku9+SXjfgGFvMyjSJmaf8W4oYrOxsj3HllyVi3dHxbVYTzSlEj5nTzOys+hOp7n69u+/i7qcXmNznIH4sTxNjg/yLmGLrN8DH0omTJ4g2vSr4ITHA1S7ApcABZnYucChxQVPtZFFRyWcIgyu+PwbOS8l9deBoM7uKGOHwBVB8C7AjcENK7p8gPutJwBnEqLK1MWCKiPFGwAPu/t+Uc/6HOPL9LDDBzIa6+1OdntyhCQnezBYlxlT+InEJ8HvAJDM7PD2/kJmNTSdWCuFNnPav1SwGYVqdGE8fYsCrZ4D/R0yK8CUo9mSRN3FaulYzs6WIsU+Gp0XHETP7TCSaFb8Mim8B/kjkCYC9gKuA7xDXy6xpZvN
5cQN8NWXKv3ZQeoJPP9QDgGfTVvGA9HhDi+m1/gksXtKP9XZgkdSeNhoYQxzy3lqVfw53fxrY1t3fSCeFJrj7Me5+OdEr4CNmtnJR5ZnZQnUPb6P68X0O+DiwrJm9CCzh7j/zGOLhWGA1iyFrC2Fmq9U9vB1YtMrxhQ96zXQRwwL8gziZeqm7P+nuk9LjDYsqK7XlXwB8ipjp7BXiSAFiDKHSr/RuGm/tyHBnAfcXuL6WTPvXrjciKdxa4PpaMu1fu9yIgcQ2KzG+LZn2r51uwJ7EoF63E73qdi0qxrRgyr9W31o6HryZ3Qic6O5XFbCuocDfgMuJiSyuI048Qgzx+Xh63QdXqFWZxdj6twCHufs1BaxvKLHHvrfPOIE7nJjweOQgjO/cRK+abxcU33mIC3l28JgZanHiyOEeYGl3r023WOn4pvNnI4G9iZ2HG4Dr3f3GAa53KLHROJHoX38/cTTk7j41HUXsAzzidb2VOl2rBhszwICtvLiJJk4A5nH3g2zGtH83kqb9c/fSpv1rNyn5rgp8xt1PLWidJwKrufvnrYdp/9y9tGn/2k36/S5DJOPTC1rnesCR7r6L9TDtn7sfVsF299kqcmNmZkcBK7r7V6zblH9Abcq/ymnJYGMe3i8wudem/Ts2LapN+3cs0dVsM4tZoyqffOCDk3QPAb8oYn324Wn/JhLNMt8GDga2MLMFB1F83eN8UiHJPamf9m8volfUrsQsXOtbTPs3aJI7zHQFbxHuAKbbzFP+fYno3bdOWl45bTPhx0B4zCf6PXfvSk0Tf3H3Q919urvfACxEnFAZVIpKCB4XnfyIuOBkZ2CIx4ncKenQeREKOgk2WLn7G8BvgYOAFYEXLK6yfJboDrlWK+tXAfcRTV6HpccTAVLT4nvABi2qV6naYk7WMpnZmsTh7ier3HbZLCme8/mMmaHWIi4QUXwLYGa7E0dFbxNHYCsDu6D4FiK1xW8L7EQk+/WIIbQrGd9KJfjubXbpCtqLiDb4QmeGGmxq1ynUHxVYTPt3KSXMDDXY1LevW4xomBFXWf4HuHegJxkHu3Re6n1393T/p8S1OX8hmsMmtrSCJalEgrcZ0/7V/kHqp/3b1eNCERmgukv066f920nJvRi1HRQzO4gSp/0brOp+vwe6+y/S8AiFTvnXbjo2wVsbTPtXZdYG0/5VWS/xbcq0f1XWS3xLn/KvXXTySdaWTfs3SLRs2r9BomXT/g0SLZnyr9105B68tcm0f1VlbTBlWpUpvuVSfGfo1D34pk/7N5h4C6alG0wU33IpvjN0aoJv+rR/g4m1YNq/wcJC06f9G0xSm7viSwcmeGvytH+DkTdx2r/BJrUHN23av8HGmjjlXydo6pR9RfAmTvs32FiTp/0bbKzJ0/4NNtbEKf86RcecZLUWTPs3mFgLpv0bTKwF0/4NJtbEKf86SScl+BxYE3iTGVvkrtbWqjospvh73N1/aGZHEkMuDyGmi7vO3e+0QTJaZBlSfB929+Mtpv3bHVifSEp/qG1MFd/+MbNxwIbuvo/FlH9HAv8lJs6+1t1PtZjyr6hZoTpCR7TBWwum/RtMrAXT/g0m1oJp/wahZk751zE6aQ9+AWIUw9ok0psQk0ovQDQpnFprl5e+M7OF3P1Vi6GXv+HuP6ktBy4GDnT3R1tayQ5mZssTfbO3IiZ83jgtX4QY9uEANS32T9qxm4eYw3YdotPF7u7+anr+L8APPab/G1Q6JsHPipmdBWzi7h9rdV2qyMxGE9PFDbrhlstgZhsTOyo3pseKb4HMbE/iSusXgf2AjwCHDtb4ViHBFzbtn8zMCp72T2ZmBU/7J+VN+depOjbB1wYQosBp/2QGK2HaP5kh/X4LnfZPZmYVn7+2ER2b4KU5Blu3MpEqUYIXEamojugmKSIifacELyJSUUrwIiIVpQQvIlJRSvAiIhWlBC8iUlH/H3F1M+Inc1m7AAAAAElFTkSuQmCC\n", 
"text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Construire la série des valeurs pour la hauteur des barres :\n", "HauteursDesBarres = [Fumeuses_TrA1_TxMortalite,NonFumeuses_TrA1_TxMortalite,\n", " 0,0, # pour écarter les tranches 1 et 2\n", " Fumeuses_TrA2_TxMortalite,NonFumeuses_TrA2_TxMortalite,\n", " 0,0, # pour écarter les tranches 2 et 3\n", " Fumeuses_TrA3_TxMortalite,NonFumeuses_TrA3_TxMortalite,\n", " 0,0, # pour écarter les tranches 3 et 4\n", " Fumeuses_TrA4_TxMortalite,NonFumeuses_TrA4_TxMortalite]\n", "# Construire la série des labels pour chacune des barres :\n", "LabelsDesBarres = ('18-34: Fumeuses', '18-34: non.Fumeuses',\n", " '','', # pour écarter les tranches 1 et 2\n", " '34-54: Fumeuses', '34-54: non.Fumeuses',\n", " '','', # pour écarter les tranches 2 et 3\n", " '54-64: Fumeuses', '54-64: non.Fumeuses',\n", " '','', # pour écarter les tranches 3 et 4\n", " '64-89: Fumeuses', '64-89: non.Fumeuses')\n", "# Choisir la position (? la largeur ?) pour chacune des barres\n", "barres_position = np.arange(len(LabelsDesBarres))\n", "# ??? barres_largeur = [0.1,0.3,3.0,3.5]\n", "#\n", "# Creation du graphique en barres :\n", "plt.bar(barres_position, HauteursDesBarres, color=['red','green','red', 'green','red','green','red','green','red', 'green','red','green','red','green'])\n", "# ??? 
plt.bar(barres_position, HauteursDesBarres, width=barres_largeur, color=['red','green','red', 'green','red','green','red','green','red', 'green','red','green','red','green'])\n", "#\n", "# Format the labels and tick marks on the horizontal and vertical axes:\n", "plt.xticks(barres_position, LabelsDesBarres, color='black', rotation=60)\n", "plt.yticks(color='orange')\n", "plt.title(\"Taux de mortalité (en %) par tranche d'age\", fontdict=None, loc='center', pad=None, color='orange')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 'ResumeX' tables and the bar chart of mortality by age bracket show that, within every age bracket, the mortality rate of smokers is at least as high as that of non-smokers: 2.76 % vs 2.74 % for [18:34], 17.30 % vs 9.55 % for ]34:54], 44.35 % vs 33.06 % for ]54:64] and 85.71 % vs 85.49 % for ]64:89]. Once age is taken into account, the data are therefore consistent with smoking being a mortality risk factor." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## IV - Extending the study with a logistic regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To avoid the bias introduced by grouping ages into arbitrary, unequal brackets, other analyses can be tried, for example a logistic regression. \n", "The goal is to analyse the ''correlation'' between the two variables 'Mortality' and 'Age', that is, to study the probability of death as a function of age, separately for the group of smokers and for the group of non-smokers. \n", "\n", "Warnings: \n", "* logistic regression is rather poorly named: it is not a regression in the classical sense of the term, since the aim is not to explain a quantitative variable but to classify individuals into two categories. \n", "* moreover, scikit-learn's LogisticRegression does not report p-values or confidence intervals for the model coefficients, even though they can in principle be obtained when the regression is fitted without penalisation; for that purpose the corresponding function of the \"statsmodels\" library is preferable." 
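, "\n", "\n", "For reference, the model that a logistic regression fits can be written as below. This is the standard textbook form, not something taken from this notebook's code; the symbols $a$ (age), $\\beta_0$ (intercept) and $\\beta_1$ (age coefficient) are notation introduced here for illustration only:\n", "\n", "$$P(\\text{Dead} \\mid a) = \\frac{1}{1 + e^{-(\\beta_0 + \\beta_1 a)}} \\quad\\Longleftrightarrow\\quad \\log\\frac{P(\\text{Dead} \\mid a)}{1 - P(\\text{Dead} \\mid a)} = \\beta_0 + \\beta_1 a$$\n", "\n", "Under this assumption, fitting one such model per group (smokers, non-smokers) would give two curves of death probability against age that can be compared without having to choose age brackets."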
] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression\n", "#\n", "import statsmodels.api as sm" ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", "<thead><tr><th></th><th>Smoker</th><th>Status</th><th>Age</th></tr></thead>\n", "<tbody>\n",
"<tr><th>0</th><td>Yes</td><td>Alive</td><td>21.0</td></tr>\n",
"<tr><th>1</th><td>Yes</td><td>Alive</td><td>19.3</td></tr>\n",
"<tr><th>2</th><td>No</td><td>Dead</td><td>57.5</td></tr>\n",
"<tr><th>3</th><td>No</td><td>Alive</td><td>47.1</td></tr>\n",
"<tr><th>4</th><td>Yes</td><td>Alive</td><td>81.4</td></tr>\n",
"<tr><th>5</th><td>No</td><td>Alive</td><td>36.8</td></tr>\n",
"<tr><th>6</th><td>No</td><td>Alive</td><td>23.8</td></tr>\n",
"<tr><th>7</th><td>Yes</td><td>Dead</td><td>57.5</td></tr>\n",
"<tr><th>8</th><td>Yes</td><td>Alive</td><td>24.8</td></tr>\n",
"<tr><th>9</th><td>Yes</td><td>Alive</td><td>49.5</td></tr>\n",
"</tbody>\n", "</table>" ], "text/plain": [ "  Smoker Status   Age\n", "0    Yes  Alive  21.0\n", "1    Yes  Alive  19.3\n", "2     No   Dead  57.5\n", "3     No  Alive  47.1\n", "4    Yes  Alive  81.4\n", "5     No  Alive  36.8\n", "6     No  Alive  23.8\n", "7    Yes   Dead  57.5\n", "8    Yes  Alive  24.8\n", "9    Yes  Alive  49.5" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reminder: OriginalInputData = pd.read_csv(\"../__DataSets/module3_Practical_session_Subject6_smoking.csv\")\n", "OriginalInputData.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IV - A/ Logistic regression on both groups combined (smokers + non-smokers)" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "verification : finalement countF= 582 ? 582\n", "verification : finalement countNF= 732 ? 732\n", "NbDecesChezFumeuses= 139\n", "NbDecesChezNonFumeuses= 230\n" ] } ], "source": [ "# For what follows, smokers and non-smokers are stored in separate DataFrames.\n", "# Y_pourEtudeMortaliteFumeuses and Y_pourEtudeMortaliteNonFumeuses are built with the value 1 for 'Dead' and 0 for 'Alive'.\n", "# The 'Smoker' column values 'Yes' / 'No' are converted to the integers 1 / 0.\n", "# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "X = pd.DataFrame()\n", "Y = pd.DataFrame()\n", "X_pourEtudeMortaliteFumeuses = pd.DataFrame()\n", "X_pourEtudeMortaliteNonFumeuses = pd.DataFrame()\n", "Y_pourEtudeMortaliteFumeuses = pd.DataFrame()\n", "Y_pourEtudeMortaliteNonFumeuses = pd.DataFrame()\n", "countF = 0\n", "countNF = 0\n", "sommeDesYi = 0\n", "NbDecesChezFumeuses = 0\n", "NbDecesChezNonFumeuses = 0\n", "taille1Colonne = OriginalInputData['Smoker'].shape[0] # should equal 1314\n", "for i in range(taille1Colonne) :\n",
"    if OriginalInputData.loc[i,'Smoker'] == 'Yes' :\n",
"        X.loc[i,'Smoker'] = 1 # convert to integer\n",
"        X_pourEtudeMortaliteFumeuses.loc[countF,'Smoker'] = 1\n",
"        X_pourEtudeMortaliteFumeuses.loc[countF,'Age'] = OriginalInputData.loc[i,'Age'] / AgeMax\n",
"        if OriginalInputData.loc[i,'Status'] == 'Dead' :\n",
"            Y_pourEtudeMortaliteFumeuses.loc[countF,'Status'] = 1 # 'Dead' is coded as 1\n",
"            NbDecesChezFumeuses += 1\n",
"        else :\n",
"            Y_pourEtudeMortaliteFumeuses.loc[countF,'Status'] = 0 # 'Alive' is coded as 0, not as a negative value\n",
"        countF += 1\n",
"    else :\n",
"        X.loc[i,'Smoker'] = 0 # convert to integer\n",
"        X_pourEtudeMortaliteNonFumeuses.loc[countNF,'Smoker'] = 0\n",
"        X_pourEtudeMortaliteNonFumeuses.loc[countNF,'Age'] = OriginalInputData.loc[i,'Age'] / AgeMax\n",
"        if OriginalInputData.loc[i,'Status'] == 'Dead' :\n",
"            Y_pourEtudeMortaliteNonFumeuses.loc[countNF,'Status'] = 1\n",
"            NbDecesChezNonFumeuses += 1\n",
"        else :\n",
"            Y_pourEtudeMortaliteNonFumeuses.loc[countNF,'Status'] = 0\n",
"        countNF += 1\n",
"    X.loc[i,'Age'] = OriginalInputData.loc[i,'Age'] / AgeMax # divide by AgeMax so that ages lie in the interval [0:1]\n",
"    if OriginalInputData.loc[i,'Status'] == 'Dead' :\n",
"        Y.loc[i,'Status'] = 1\n",
"    else :\n",
"        Y.loc[i,'Status'] = 0\n",
"    sommeDesYi += Y.loc[i,'Status']\n",
"print (\"verification : finalement countF=\",X_pourEtudeMortaliteFumeuses.shape[0],\" ? \",Nb_Fumeuses)\n", "print (\"verification : finalement countNF=\",X_pourEtudeMortaliteNonFumeuses.shape[0],\" ? \",Nb_NonFumeuses)\n", "print (\"NbDecesChezFumeuses=\",NbDecesChezFumeuses)\n", "print (\"NbDecesChezNonFumeuses=\",NbDecesChezNonFumeuses)" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Smoker</th><th>Age</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>582.0</td><td>582.000000</td></tr>\n",
"<tr><th>mean</th><td>1.0</td><td>0.492433</td></tr>\n",
"<tr><th>std</th><td>0.0</td><td>0.180399</td></tr>\n",
"<tr><th>min</th><td>1.0</td><td>0.200222</td></tr>\n",
"<tr><th>25%</th><td>1.0</td><td>0.348165</td></tr>\n",
"<tr><th>50%</th><td>1.0</td><td>0.479422</td></tr>\n",
"<tr><th>75%</th><td>1.0</td><td>0.624861</td></tr>\n",
"<tr><th>max</th><td>1.0</td><td>0.992214</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Smoker Age\n", "count 582.0 582.000000\n", "mean 1.0 0.492433\n", "std 0.0 0.180399\n", "min 1.0 0.200222\n", "25% 1.0 0.348165\n", "50% 1.0 0.479422\n", "75% 1.0 0.624861\n", "max 1.0 0.992214" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_pourEtudeMortaliteFumeuses.describe()" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Smoker</th><th>Age</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>732.0</td><td>732.000000</td></tr>\n",
"<tr><th>mean</th><td>0.0</td><td>0.554125</td></tr>\n",
"<tr><th>std</th><td>0.0</td><td>0.232462</td></tr>\n",
"<tr><th>min</th><td>0.0</td><td>0.200222</td></tr>\n",
"<tr><th>25%</th><td>0.0</td><td>0.348999</td></tr>\n",
"<tr><th>50%</th><td>0.0</td><td>0.538376</td></tr>\n",
"<tr><th>75%</th><td>0.0</td><td>0.732481</td></tr>\n",
"<tr><th>max</th><td>0.0</td><td>1.000000</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Smoker Age\n", "count 732.0 732.000000\n", "mean 0.0 0.554125\n", "std 0.0 0.232462\n", "min 0.0 0.200222\n", "25% 0.0 0.348999\n", "50% 0.0 0.538376\n", "75% 0.0 0.732481\n", "max 0.0 1.000000" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_pourEtudeMortaliteNonFumeuses.describe()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Remarque : la comparaison des estimations de l'age moyen des 2 groupes est une 1ere indication" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "verif : somme des valeurs dans la colonne 'Smoker' = 582.0\n" ] } ], "source": [ "print (\"verif : somme des valeurs dans la colonne 'Smoker' = \",X['Smoker'].sum())" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Smoker</th><th>Age</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1.0</td><td>0.233593</td></tr>\n",
"<tr><th>1</th><td>1.0</td><td>0.214683</td></tr>\n",
"<tr><th>2</th><td>0.0</td><td>0.639600</td></tr>\n",
"<tr><th>3</th><td>0.0</td><td>0.523915</td></tr>\n",
"<tr><th>4</th><td>1.0</td><td>0.905451</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Smoker Age\n", "0 1.0 0.233593\n", "1 1.0 0.214683\n", "2 0.0 0.639600\n", "3 0.0 0.523915\n", "4 1.0 0.905451" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Vérifions le contenu de la sous-liste de données qui contient 2 données binaires 'Dead' ou 'Alive' :\n", "X.head(5) # Y.describe()" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Status</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.0</td></tr>\n",
"<tr><th>1</th><td>0.0</td></tr>\n",
"<tr><th>2</th><td>1.0</td></tr>\n",
"<tr><th>3</th><td>0.0</td></tr>\n",
"<tr><th>4</th><td>0.0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Status\n", "0 0.0\n", "1 0.0\n", "2 1.0\n", "3 0.0\n", "4 0.0" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Vérifions le contenu de la sous-liste de données qui contient 2 données binaires 'Dead' ou 'Alive' :\n", "Y.head(5) # Y.describe()" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Avertissement : \n", "* SciKit-Learn décide par défaut d’appliquer une régularisation sur le modèle. \n", "* Dans le modèle que l'on va utiliser, on applique une pénalité de type 'l2' et on prend un solver du type Newton qui est le plus classique pour la régression logistique." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### * Appliquons le modele de Regression Logistique de 'scikit learn' sur l'ensemble des 2 groupes Fumeuses et Non Fumeuses :" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "verif : somme des valeurs dans la colonne Y = 369.0\n" ] } ], "source": [ "print (\"verif : somme des valeurs dans la colonne Y = \",sommeDesYi)" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py:578: DataConversionWarning: A column-vector y was passed when a 1d array was expected. 
Please change the shape of y to (n_samples, ), for example using ravel().\n", " y = column_or_1d(y, warn=True)\n" ] } ], "source": [ "# Reminder: here the 'Smoker' variable covers BOTH the smokers and the non-smokers\n",
"if abs(sommeDesYi) == 0 : # check that Y contains at least one death, hence more than one class\n",
"    print (\"Problem: the sum sommeDesYi = \",sommeDesYi,\" should be different from 0!\")\n",
"    print (\"         the values of Y make up a single class!\")\n",
"else : # the logistic regression can be fitted:\n",
"    SKL_MRL_A = LogisticRegression(penalty='l2',solver='newton-cg')\n",
"    SKL_MRL_A.fit(X,Y)" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.14128009, 7.3116638 ]])" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the coefficients of this model:\n", "SKL_MRL_A.coef_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each coefficient is a change in the log-odds of death, not a probability; a positive value means higher odds of death:\n",
"\n",
"* the first coefficient (0.141) is the change associated with smoking ('Smoker' = 1 versus 0) at a given age;\n",
"* the second coefficient (7.312) is the change associated with 'Age'; since ages were divided by AgeMax, it corresponds to going from age 0 to the maximum age in the data." ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [], "source": [ "# If desired, display the model coefficients, including the constant, in a DataFrame:\n",
"#pd.DataFrame(np.concatenate([SKL_MRL_A.intercept_.reshape(-1,1),\n",
"#                            SKL_MRL_A.coef_],axis=1),\n",
"#             index = [\"coef\"],\n",
"#             columns = [\"constante\"]+list(X.columns)).T" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Regression Logistique de SciKitLearn sur l'ensemble des 2 groupes fumeuses + non-fumeuses : score= 0.851\n" ] } ], "source": [ "score__SKL_MRL_A = SKL_MRL_A.score(X,Y)\n", "print (\"Regression Logistique de SciKitLearn sur l'ensemble des 2 
groupes fumeuses + non-fumeuses : score= %5.3f\" %score__SKL_MRL_A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### * Fit the 'statsmodels' logistic regression model on both groups together (smokers and non-smokers):" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Caution:\n",
"by default, the 'statsmodels' logistic regression model does not include an intercept (constant term);\n",
"to include one, use 'statsmodels.tools.add_constant' to add a constant column to the matrix X.\n",
"The fit below passes X without that column, so the model has no intercept. This is why its log-likelihood (-827.33) is lower than that of the intercept-only null model (-780.16) and why the pseudo R-squared is negative.\n",
"(Remark: for the purpose and use of the constant, see the following link:\n",
"https://stats.stackexchange.com/questions/440242/statsmodels-logistic-regression-adding-intercept )" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.629624\n", " Iterations 5\n" ] } ], "source": [ "MLRavecSM_A = sm.Logit(Y, X).fit()" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
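<table>\n",
"<caption>Logit Regression Results</caption>\n",
"<tr><th>Dep. Variable:</th><td>Status</td><th>No. Observations:</th><td>1314</td></tr>\n",
"<tr><th>Model:</th><td>Logit</td><th>Df Residuals:</th><td>1312</td></tr>\n",
"<tr><th>Method:</th><td>MLE</td><th>Df Model:</th><td>1</td></tr>\n",
"<tr><th>Date:</th><td>Sun, 12 Apr 2020</td><th>Pseudo R-squ.:</th><td>-0.06046</td></tr>\n",
"<tr><th>Time:</th><td>17:50:18</td><th>Log-Likelihood:</th><td>-827.33</td></tr>\n",
"<tr><th>converged:</th><td>True</td><th>LL-Null:</th><td>-780.16</td></tr>\n",
"<tr><th></th><td></td><th>LLR p-value:</th><td>1.000</td></tr>\n",
"</table>\n",
"<table>\n",
"<tr><td></td><th>coef</th><th>std err</th><th>z</th><th>P&gt;|z|</th><th>[0.025</th><th>0.975]</th></tr>\n",
"<tr><th>Smoker</th><td>-1.1611</td><td>0.114</td><td>-10.206</td><td>0.000</td><td>-1.384</td><td>-0.938</td></tr>\n",
"<tr><th>Age</th><td>0.0040</td><td>0.120</td><td>0.034</td><td>0.973</td><td>-0.231</td><td>0.239</td></tr>\n",
"</table>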
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: Status No. Observations: 1314\n", "Model: Logit Df Residuals: 1312\n", "Method: MLE Df Model: 1\n", "Date: Sun, 12 Apr 2020 Pseudo R-squ.: -0.06046\n", "Time: 17:50:18 Log-Likelihood: -827.33\n", "converged: True LL-Null: -780.16\n", " LLR p-value: 1.000\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Smoker -1.1611 0.114 -10.206 0.000 -1.384 -0.938\n", "Age 0.0040 0.120 0.034 0.973 -0.231 0.239\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# MLRavecSM_A.params\n", "MLRavecSM_A.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IV - B/ Operons une Regression Logistique sur le groupe des fumeuses " ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Smoker</th><th>Age</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>582.0</td><td>582.000000</td></tr>\n",
"<tr><th>mean</th><td>1.0</td><td>0.492433</td></tr>\n",
"<tr><th>std</th><td>0.0</td><td>0.180399</td></tr>\n",
"<tr><th>min</th><td>1.0</td><td>0.200222</td></tr>\n",
"<tr><th>25%</th><td>1.0</td><td>0.348165</td></tr>\n",
"<tr><th>50%</th><td>1.0</td><td>0.479422</td></tr>\n",
"<tr><th>75%</th><td>1.0</td><td>0.624861</td></tr>\n",
"<tr><th>max</th><td>1.0</td><td>0.992214</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Smoker Age\n", "count 582.0 582.000000\n", "mean 1.0 0.492433\n", "std 0.0 0.180399\n", "min 1.0 0.200222\n", "25% 1.0 0.348165\n", "50% 1.0 0.479422\n", "75% 1.0 0.624861\n", "max 1.0 0.992214" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Commencons par vérifier le contenu de X_pourEtudeMortaliteFumeuses et Y_pourEtudeMortaliteFumeuses\n", "X_pourEtudeMortaliteFumeuses.describe()\n", "#X_pourEtudeMortaliteFumeuses.head(10)" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Status</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>582.000000</td></tr>\n",
"<tr><th>mean</th><td>0.238832</td></tr>\n",
"<tr><th>std</th><td>0.426736</td></tr>\n",
"<tr><th>min</th><td>0.000000</td></tr>\n",
"<tr><th>25%</th><td>0.000000</td></tr>\n",
"<tr><th>50%</th><td>0.000000</td></tr>\n",
"<tr><th>75%</th><td>0.000000</td></tr>\n",
"<tr><th>max</th><td>1.000000</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Status\n", "count 582.000000\n", "mean 0.238832\n", "std 0.426736\n", "min 0.000000\n", "25% 0.000000\n", "50% 0.000000\n", "75% 0.000000\n", "max 1.000000" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Y_pourEtudeMortaliteFumeuses.describe()\n", "# Y_pourEtudeMortaliteFumeuses.head(10)" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "taille1Colonne= 582\n" ] } ], "source": [ "taille1Colonne = X_pourEtudeMortaliteFumeuses['Age'].shape[0] # devrait etre egal à 582\n", "print (\"taille1Colonne=\",taille1Colonne)" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "verif : Nbre de deces chez les Fumeuses = 139 VS Nb_Fumeuses= 582\n" ] } ], "source": [ "print (\"verif : Nbre de deces chez les Fumeuses = \",NbDecesChezFumeuses,\" VS Nb_Fumeuses=\",Nb_Fumeuses)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### * Appliquons le modele de Regression Logistique de 'scikit learn' sur le groupe des Fumeuses :" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py:578: DataConversionWarning: A column-vector y was passed when a 1d array was expected. 
Please change the shape of y to (n_samples, ), for example using ravel().\n", " y = column_or_1d(y, warn=True)\n" ] } ], "source": [ "if ( abs(NbDecesChezFumeuses) == 0 ): # on verifie que Y_pourEtudeMortaliteFumeuses contient plus de 1 classe\n", " print (\"Probleme : la somme Y_pourEtudeMortaliteFumeuses.sum() devrait être différente de 0 !\")\n", " print (\" les valeurs de Y_pourEtudeMortaliteFumeuses ne composent qu'une seule classe !\")\n", "else : # La regression logistique peut être effectuée :\n", " SKL_MRL_B = LogisticRegression(penalty='l2',solver='newton-cg')\n", " SKL_MRL_B.fit(X_pourEtudeMortaliteFumeuses, Y_pourEtudeMortaliteFumeuses)" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-3.64964135e-15, 5.37484238e+00]])" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Afficher la valeur des coefficients pour ce modele : \n", "SKL_MRL_B.coef_" ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Modele de Regression Logistique sur le groupe des Fumeuses : score = 0.813\n" ] } ], "source": [ "score__SKL_MRL_B = SKL_MRL_B.score(X_pourEtudeMortaliteFumeuses,Y_pourEtudeMortaliteFumeuses)\n", "print (\"Modele de Regression Logistique sur le groupe des Fumeuses : score = %5.3f\" %score__SKL_MRL_B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### * Appliquons le modele de Regression Logistique de 'statsmodels' sur le groupe des Fumeuses :" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.412727\n", " Iterations 7\n" ] } ], "source": [ "MLRavecSM_B = sm.Logit(Y_pourEtudeMortaliteFumeuses, X_pourEtudeMortaliteFumeuses).fit()" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, 
"outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
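<table>\n",
"<caption>Logit Regression Results</caption>\n",
"<tr><th>Dep. Variable:</th><td>Status</td><th>No. Observations:</th><td>582</td></tr>\n",
"<tr><th>Model:</th><td>Logit</td><th>Df Residuals:</th><td>580</td></tr>\n",
"<tr><th>Method:</th><td>MLE</td><th>Df Model:</th><td>1</td></tr>\n",
"<tr><th>Date:</th><td>Sun, 12 Apr 2020</td><th>Pseudo R-squ.:</th><td>0.2492</td></tr>\n",
"<tr><th>Time:</th><td>17:53:54</td><th>Log-Likelihood:</th><td>-240.21</td></tr>\n",
"<tr><th>converged:</th><td>True</td><th>LL-Null:</th><td>-319.94</td></tr>\n",
"<tr><th></th><td></td><th>LLR p-value:</th><td>1.477e-36</td></tr>\n",
"</table>\n",
"<table>\n",
"<tr><td></td><th>coef</th><th>std err</th><th>z</th><th>P&gt;|z|</th><th>[0.025</th><th>0.975]</th></tr>\n",
"<tr><th>Smoker</th><td>-5.5081</td><td>0.466</td><td>-11.814</td><td>0.000</td><td>-6.422</td><td>-4.594</td></tr>\n",
"<tr><th>Age</th><td>7.9990</td><td>0.784</td><td>10.203</td><td>0.000</td><td>6.462</td><td>9.536</td></tr>\n",
"</table>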
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: Status No. Observations: 582\n", "Model: Logit Df Residuals: 580\n", "Method: MLE Df Model: 1\n", "Date: Sun, 12 Apr 2020 Pseudo R-squ.: 0.2492\n", "Time: 17:53:54 Log-Likelihood: -240.21\n", "converged: True LL-Null: -319.94\n", " LLR p-value: 1.477e-36\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Smoker -5.5081 0.466 -11.814 0.000 -6.422 -4.594\n", "Age 7.9990 0.784 10.203 0.000 6.462 9.536\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MLRavecSM_B.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IV - C/ Operons une Regression Logistique sur le groupe des non fumeuses " ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "verif : Nbre de deces chez les Non Fumeuses = 230 VS Nb_NonFumeuses= 732\n" ] } ], "source": [ "print (\"verif : Nbre de deces chez les Non Fumeuses = \",NbDecesChezNonFumeuses,\" VS Nb_NonFumeuses=\",Nb_NonFumeuses)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### * Appliquons le modele de Regression Logistique de 'scikit learn' sur le groupe des Non Fumeuses :" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py:578: DataConversionWarning: A column-vector y was passed when a 1d array was expected. 
Please change the shape of y to (n_samples, ), for example using ravel().\n", " y = column_or_1d(y, warn=True)\n" ] } ], "source": [ "if abs(NbDecesChezNonFumeuses) == 0 : # check that Y_pourEtudeMortaliteNonFumeuses contains at least one death, hence more than one class\n",
"    print (\"Problem: the sum Y_pourEtudeMortaliteNonFumeuses.sum() should be different from 0!\")\n",
"    print (\"         the values of Y_pourEtudeMortaliteNonFumeuses make up a single class!\")\n",
"else : # the logistic regression can be fitted:\n",
"    SKL_MRL_C = LogisticRegression(penalty='l2',solver='newton-cg')\n",
"    SKL_MRL_C.fit(X_pourEtudeMortaliteNonFumeuses,Y_pourEtudeMortaliteNonFumeuses)" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 7.04605784]])" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the coefficients of this model:\n", "SKL_MRL_C.coef_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that each coefficient is the change in the log-odds of death per unit of the corresponding feature: the first applies to 'Smoker' (0 or 1), the second to 'Age' (divided by AgeMax, so that one unit spans the whole age range).\n",
"\n",
"The 'Smoker' coefficient is zero here. This is expected, for a structural reason: within the non-smoker group the 'Smoker' column is constant (always 0), so it carries no information and its coefficient cannot be estimated from the data. The same holds in the smoker group, where the column is always 1 and the fitted coefficient is about -3.6e-15.\n",
"Within this group, the model therefore describes mortality as a function of age alone (no other cause is considered here)."
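, "\n",
"\n",
"To make the role of each coefficient explicit, the fitted model can be written out as follows. This is only a sketch in notation introduced here: the symbols $\\beta_0$, $\\beta_1$, $\\beta_2$, $z$ and $s$ do not appear in the code, and $\\beta_0$ assumes scikit-learn's default `fit_intercept=True`:\n",
"\n",
"$$\\Pr(\\text{Dead}) = \\frac{1}{1 + e^{-z}}, \\qquad z = \\beta_0 + \\beta_1\\,\\text{Smoker} + \\beta_2\\,\\frac{\\text{Age}}{\\text{AgeMax}}$$\n",
"\n",
"Within a single group, 'Smoker' takes one fixed value $s$ (0 or 1), so $z = (\\beta_0 + \\beta_1 s) + \\beta_2\\,\\text{Age}/\\text{AgeMax}$. Only the sum $\\beta_0 + \\beta_1 s$ affects the fit, and the l2 penalty, which scikit-learn applies to the coefficients but not to the intercept, favours $\\beta_1 = 0$."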
] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Modele de Regression Logistique sur le groupe des Non Fumeuses : score = 0.873\n" ] } ], "source": [ "score__SKL_MRL_C = SKL_MRL_C.score(X_pourEtudeMortaliteNonFumeuses,Y_pourEtudeMortaliteNonFumeuses)\n", "print (\"Modele de Regression Logistique sur le groupe des Non Fumeuses : score = %5.3f\" %score__SKL_MRL_C)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### * Appliquons le modele de Regression Logistique de 'statsmodels' sur le groupe des Non Fumeuses :" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.687904\n", " Iterations 4\n" ] }, { "ename": "LinAlgError", "evalue": "Singular matrix", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mLinAlgError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mMLRavecSM_C\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mLogit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mY_pourEtudeMortaliteNonFumeuses\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX_pourEtudeMortaliteNonFumeuses\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;31m# Mais erreur de type 'Singular Matrix Error' !\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;31m# voir https://stackoverflow.com/questions/20703733/logit-regression-and-singular-matrix-error-in-python\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m 
\u001b[0;31m# En fait cette erreur est due au fait que\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;31m# tout se passe comme s'il y a redondance entre les 2 caractéristiques 'Age' et 'Mortalité' .\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/statsmodels/discrete/discrete_model.py\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m 1832\u001b[0m bnryfit = super(Logit, self).fit(start_params=start_params,\n\u001b[1;32m 1833\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1834\u001b[0;31m disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m 1835\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1836\u001b[0m \u001b[0mdiscretefit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLogitResults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbnryfit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/statsmodels/discrete/discrete_model.py\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m 218\u001b[0m mlefit = super(DiscreteModel, self).fit(start_params=start_params,\n\u001b[1;32m 219\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 220\u001b[0;31m disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m 221\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 222\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mmlefit\u001b[0m \u001b[0;31m# up to subclasses to wrap results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/statsmodels/base/model.py\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)\u001b[0m\n\u001b[1;32m 471\u001b[0m \u001b[0mHinv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcov_params_func\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxopt\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mretvals\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 472\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mmethod\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'newton'\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 473\u001b[0;31m \u001b[0mHinv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlinalg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0mretvals\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Hessian'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mnobs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 474\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mskip_hessian\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 475\u001b[0m \u001b[0mH\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m \u001b[0;34m*\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhessian\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mxopt\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/numpy/linalg/linalg.py\u001b[0m in \u001b[0;36minv\u001b[0;34m(a)\u001b[0m\n\u001b[1;32m 530\u001b[0m \u001b[0msignature\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'D->D'\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misComplexType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;34m'd->d'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 531\u001b[0m \u001b[0mextobj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_linalg_error_extobj\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0m_raise_linalgerror_singular\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 532\u001b[0;31m \u001b[0mainv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_umath_linalg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msignature\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msignature\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mextobj\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mextobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 533\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mainv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mastype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresult_t\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcopy\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 534\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/numpy/linalg/linalg.py\u001b[0m in \u001b[0;36m_raise_linalgerror_singular\u001b[0;34m(err, flag)\u001b[0m\n\u001b[1;32m 87\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 88\u001b[0m 
\u001b[0;32mdef\u001b[0m \u001b[0m_raise_linalgerror_singular\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mflag\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m---> 89\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mLinAlgError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Singular matrix\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[1;32m 90\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[1;32m 91\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_raise_linalgerror_nonposdef\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mflag\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mLinAlgError\u001b[0m: Singular matrix" ] } ], "source": [ "MLRavecSM_C = sm.Logit(Y_pourEtudeMortaliteNonFumeuses, X_pourEtudeMortaliteNonFumeuses).fit() \n",
"# This call raises a 'Singular matrix' error (LinAlgError).\n",
"# See https://stackoverflow.com/questions/20703733/logit-regression-and-singular-matrix-error-in-python\n",
"# Cause: in this group the 'Smoker' column of X is identically 0, so the Hessian has a zero row and column\n",
"# and cannot be inverted when statsmodels computes the covariance of the estimates after the optimization."
] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'MLRavecSM_C' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mMLRavecSM_C\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msummary\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'MLRavecSM_C' is not defined" ] } ], "source": [ "MLRavecSM_C.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IV - D/ Summary of the logistic regression trials:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let us build a table of the observations:" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "                  Groupes  Nb_Vivantes  Nb_Mortes  Score_Mod.Regr.Logistique\n",
"0  femmes fumeuses et non          945        369                   0.850837\n",
"1         femmes fumeuses          443        139                   0.812715\n",
"2     femmes non fumeuses          502        230                   0.872951\n" ] } ], "source": [ "tableMRL = {\"Groupes\": ['femmes fumeuses et non', 'femmes fumeuses', 'femmes non fumeuses'],\n",
"            'Nb_Vivantes': [Nb_FumeusesVivantes+Nb_NonFumeusesVivantes, Nb_FumeusesVivantes, Nb_NonFumeusesVivantes],\n",
"            'Nb_Mortes': [Nb_FumeusesMortes+Nb_NonFumeusesMortes,Nb_FumeusesMortes, Nb_NonFumeusesMortes],\n",
"            'Score_Mod.Regr.Logistique':[score__SKL_MRL_A, score__SKL_MRL_B, score__SKL_MRL_C]\n",
"           }\n",
"ResumeMRL = pd.DataFrame(tableMRL, columns=[\"Groupes\", 'Nb_Vivantes','Nb_Mortes','Score_Mod.Regr.Logistique'])\n",
"print (ResumeMRL)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Comments:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that the same basic, classical logistic regression model was applied to all three groups, using two libraries: 'scikit-learn' (which fits an intercept by default and applies an l2 penalty) and 'statsmodels' (which fits no intercept unless a constant column is added to X, which was not done here). Age was used as a continuous variable rather than in age brackets, to avoid the bias that arbitrary, uneven brackets could introduce. The score is the accuracy of the scikit-learn model on the data it was fitted on, that is, the fraction of women whose status (alive or dead) it predicts correctly.\n",
"\n",
"The figures obtained for the two groups treated separately are recalled below (score and 'Age' coefficient from scikit-learn; number of iterations and final value of the objective function from the statsmodels optimization):\n",
"\n",
"* Smokers: score ~0.81, 'Age' coefficient = 5.37; 7 iterations, function value ~0.41\n",
"* Non-smokers: score ~0.87, 'Age' coefficient = 7.05; 4 iterations, function value ~0.69\n",
"\n",
"For the non-smokers, statsmodels stopped with a 'Singular matrix' error after the optimization, so no summary table is available for that group: its 'Smoker' column is identically 0, which makes the Hessian non-invertible. For the smokers, the 'Smoker' column is identically 1 and therefore plays the role of the intercept in the statsmodels fit (coefficient -5.51, with an 'Age' coefficient of 8.00).\n",
"\n",
"(Remark: the 'Age' coefficients are positive, as expected, because death was coded as 1 and mortality increases with age.)\n",
"\n",
"The 'TRAINING r_score' values (see part V), which measure how well the linear regression model fits the same data, are consistent with the observations and interpretations drawn from the logistic regressions.\n",
"\n",
"Age alone predicts mortality less well among smokers than among non-smokers (lower score, smaller 'Age' coefficient). This suggests that, compared with non-smokers, smokers are more likely to die from a factor other than age. Since this study considers only two explanatory factors for mortality, smoking and age, these logistic regressions support the conclusion that smoking is harmful; consistently, the 'Smoker' coefficient of the pooled scikit-learn model is positive (0.14) once age is taken into account."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## V - Extension: a linear regression trial on the 2 groups separately" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [], "source": [ "# Helper function that can be called several times:\n", "def buildLinearRegressionModel(X, Y):\n", "    # Fit a LinearRegression model on the TRAINING data and report its performance score (R^2):\n", "    from sklearn.linear_model import LinearRegression\n", "    # The former normalize=True argument is omitted: it was deprecated and later removed in\n", "    # recent scikit-learn releases, and it does not change the R^2 of an ordinary least-squares fit.\n", "    linearModel = LinearRegression().fit(X, Y)\n", "    linearModel_trainingScore = linearModel.score(X, Y)\n", "    print (\"LinearRegressionModel : TRAINING r_score = \",linearModel_trainingScore)\n", "    #\n", "    return (linearModel_trainingScore)" ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LinearRegressionModel : TRAINING r_score = 0.25916536274977575\n" ] } ], "source": [ "# Test the linear regression model 'Status (alive or dead)' vs 'Age' for the smokers:\n", "linearModel_trainingScore = buildLinearRegressionModel(X_pourEtudeMortaliteFumeuses,Y_pourEtudeMortaliteFumeuses)" ] }, { "cell_type": "code", "execution_count": 134, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LinearRegressionModel : TRAINING r_score = 0.44864670335441903\n" ] } ], "source": [ "# Test the linear regression model 'Status (alive or dead)' vs 'Age' for the non-smokers:\n", "linearModel_trainingScore = buildLinearRegressionModel(X_pourEtudeMortaliteNonFumeuses,Y_pourEtudeMortaliteNonFumeuses)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without going into how this other model works (it is applied identically to the 2 groups of women),\n", "or into the rather modest quality of the scores obtained, \n", "we recover the observation made earlier with the logistic regression: age explains mortality better for the group of non-smoking women (R^2 ~0.45) than for the group of smokers (R^2 ~0.26)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## VI - Extension: computing the correlations on the 2 groups separately" ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Correlations entre les variables pour les données concernant seulement le groupe des Fumeuses :\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Age</th>\n", "      <th>Status</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>Age</th>\n", "      <td>1.000000</td>\n", "      <td>0.509083</td>\n", "    </tr>\n", "    <tr>\n", "      <th>Status</th>\n", "      <td>0.509083</td>\n", "      <td>1.000000</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Age Status\n", "Age 1.000000 0.509083\n", "Status 0.509083 1.000000" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print (\"Correlations entre les variables pour les données concernant seulement le groupe des Fumeuses :\")\n", "DataConcatenee = X_pourEtudeMortaliteFumeuses.drop(['Smoker'], axis=1)\n", "DataConcatenee['Status'] = Y_pourEtudeMortaliteFumeuses # add the 'Status' column\n", "# DataConcatenee # uncomment to check\n", "Fumeuses_Corrs = DataConcatenee.corr()\n", "Fumeuses_Corrs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print (\"Correlations entre les variables pour les données concernant seulement le groupe des Non Fumeuses :\")\n", "DataConcatenee = X_pourEtudeMortaliteNonFumeuses.drop(['Smoker'], axis=1) \n", "DataConcatenee['Status'] = Y_pourEtudeMortaliteNonFumeuses # add the 'Status' column\n", "# DataConcatenee # uncomment to check\n", "NonFumeuses_Corrs = DataConcatenee.corr()\n", "NonFumeuses_Corrs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without going into why the same computation is applied identically to the 2 groups of women, or into the modest size of the correlation coefficients obtained,\n", "we recover the observation made earlier with the regression models: the correlation between the 2 variables [mortality] and [age] is stronger for the group of non-smoking women than for the group of smokers. Since the R^2 of a least-squares fit on a single varying predictor equals the squared correlation, $0.509083^2 \\approx 0.2592$ matches the smokers' score from part V, and the non-smokers' score of about 0.4486 corresponds to a correlation of about 0.67 (the non-smokers cell above has no recorded output in this copy)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }