{ "cells": [ { "cell_type": "markdown", "metadata": { "hideCode": true, "hidePrompt": true }, "source": [ "# Subject 6: Around Simpson's Paradox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Context:" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "In 1972-1974, in Whickham, a town in north-east England located about 6.5 kilometres south-west of Newcastle upon Tyne, a survey of one sixth of the electorate was carried out to inform work on thyroid and heart disease (Tunbridge et al. 1977). A follow-up study was conducted twenty years later (Vanderpump et al. 1995). Some of the results concerned smoking and whether the individuals were still alive at the time of the second study. For simplicity, we restrict ourselves to women, and among them to the 1314 who were categorized as \"currently smoking\" or \"never having smoked\". Relatively few women in the initial survey had smoked and since quit (162), and very few had no information available (18). Twenty-year survival was determined for all women in the first survey." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### The study proceeds in 3 steps:" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "1. Tabulate the total number of women alive and deceased over the period according to their smoking habit. Compute the mortality rate in each group (smokers / non-smokers), i.e. the ratio of the number of women who died in a group to the total number of women in that group. Analyze this result.\n", "\n", "2. Repeat question 1 (counts and mortality rates), adding a new category for age class. Use the following classes: 18-34, 35-54, 55-64, and over 65 years. Analyze the result.\n", "\n", "3. Fit a logistic regression by introducing a variable Death equal to 1 if the person died during the 20 years between the 2 surveys and 0 otherwise. Conclude." ] },
{ "cell_type": "markdown", "metadata": { "hideCode": true, "hidePrompt": true }, "source": [ "First, import the libraries we will need." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Next, load and read the file." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_file = \"Subject6_smoking.csv\"" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Smoker Status Age\n", "0 Yes Alive 21.0\n", "1 Yes Alive 19.3\n", "2 No Dead 57.5\n", "3 No Alive 47.1\n", "4 Yes Alive 81.4\n", "5 No Alive 36.8\n", "6 No Alive 23.8\n", "7 Yes Dead 57.5\n", "8 Yes Alive 24.8\n", "9 Yes Alive 49.5\n", "10 Yes Alive 30.0\n", "11 No Dead 66.0\n", "12 Yes Alive 49.2\n", "13 No Alive 58.4\n", "14 No Dead 60.6\n", "15 No Alive 25.1\n", "16 No Alive 43.5\n", "17 No Alive 27.1\n", "18 No Alive 58.3\n", "19 Yes Alive 65.7\n", "20 No Dead 73.2\n", "21 Yes Alive 38.3\n", "22 No Alive 33.4\n", "23 Yes Dead 62.3\n", "24 No Alive 18.0\n", "25 No Alive 56.2\n", "26 Yes Alive 59.2\n", "27 No Alive 25.8\n", "28 No Dead 36.9\n", "29 No Alive 20.2\n", "... ... ... ...\n", "1284 Yes Dead 36.0\n", "1285 Yes Alive 48.3\n", "1286 No Alive 63.1\n", "1287 No Alive 60.8\n", "1288 Yes Dead 39.3\n", "1289 No Alive 36.7\n", "1290 No Alive 63.8\n", "1291 No Dead 71.3\n", "1292 No Alive 57.7\n", "1293 No Alive 63.2\n", "1294 No Alive 46.6\n", "1295 Yes Dead 82.4\n", "1296 Yes Alive 38.3\n", "1297 Yes Alive 32.7\n", "1298 No Alive 39.7\n", "1299 Yes Dead 60.0\n", "1300 No Dead 71.0\n", "1301 No Alive 20.5\n", "1302 No Alive 44.4\n", "1303 Yes Alive 31.2\n", "1304 Yes Alive 47.8\n", "1305 Yes Alive 60.9\n", "1306 No Dead 61.4\n", "1307 Yes Alive 43.0\n", "1308 No Alive 42.1\n", "1309 Yes Alive 35.9\n", "1310 No Alive 22.3\n", "1311 Yes Dead 62.1\n", "1312 No Dead 88.6\n", "1313 No Alive 39.1\n", "\n", "[1314 rows x 3 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_file)\n", "raw_data" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Create 2 tables from the contents of the CSV file:\n", "*nonFumeuses* contains the rows for women who do not smoke (\"No\" in the \"Smoker\" column),\n", "and *fumeuses* contains the rows for women who smoke (\"Yes\" in the \"Smoker\" column)." ] },
{ "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Boolean masks on the \"Smoker\" column split the data into two groups\n", "masq = raw_data[\"Smoker\"] == \"Yes\"\n", "fumeuses = raw_data.loc[masq]\n", "nonFumeuses = raw_data.loc[raw_data[\"Smoker\"] == \"No\"]" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Smoker Status Age\n", "915 Yes Alive 47.4\n", "1095 Yes Alive 30.2\n", "941 Yes Alive 30.4\n", "1311 Yes Dead 62.1\n", "1091 Yes Alive 60.0\n", "945 Yes Alive 60.2\n", "1148 Yes Alive 50.6\n", "1307 Yes Alive 43.0\n", "913 Yes Dead 84.4\n", "946 Yes Alive 25.0\n", "1276 Yes Alive 58.5\n", "950 Yes Dead 43.3\n", "1309 Yes Alive 35.9\n", "947 Yes Dead 37.1\n", "1125 Yes Alive 57.2\n", "948 Yes Alive 47.7\n", "1093 Yes Dead 84.3\n", "911 Yes Alive 38.6\n", "1273 Yes Alive 55.7\n", "1149 Yes Alive 21.5\n", "1111 Yes Alive 41.9\n", "1127 Yes Alive 32.5\n", "1115 Yes Dead 63.3\n", "1114 Yes Dead 31.3\n", "1143 Yes Alive 26.6\n", "1102 Yes Alive 29.7\n", "1142 Yes Dead 71.0\n", "1140 Yes Alive 42.3\n", "1288 Yes Dead 39.3\n", "1132 Yes Alive 18.0\n", "... ... ... ...\n", "649 Yes Alive 36.9\n", "650 Yes Dead 81.8\n", "611 Yes Alive 43.4\n", "608 Yes Alive 23.5\n", "605 Yes Alive 59.0\n", "604 Yes Alive 43.8\n", "554 Yes Alive 21.3\n", "555 Yes Dead 76.9\n", "558 Yes Dead 75.2\n", "560 Yes Alive 53.0\n", "562 Yes Alive 43.7\n", "563 Yes Alive 50.9\n", "565 Yes Alive 32.8\n", "566 Yes Alive 50.7\n", "567 Yes Dead 66.1\n", "569 Yes Alive 27.2\n", "548 Yes Alive 62.1\n", "571 Yes Alive 38.1\n", "573 Yes Dead 55.2\n", "575 Yes Alive 50.9\n", "580 Yes Alive 42.5\n", "583 Yes Alive 26.6\n", "584 Yes Alive 23.3\n", "587 Yes Alive 34.8\n", "589 Yes Alive 28.2\n", "591 Yes Alive 38.5\n", "592 Yes Alive 41.0\n", "595 Yes Alive 25.7\n", "572 Yes Dead 66.8\n", "656 Yes Alive 43.0\n", "\n", "[582 rows x 3 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fumeuses" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Smoker Status Age\n", "1313 No Alive 39.1\n", "1048 No Alive 28.5\n", "568 No Alive 33.5\n", "1047 No Alive 62.6\n", "570 No Dead 56.2\n", "1046 No Alive 20.3\n", "1045 No Alive 48.5\n", "1044 No Alive 32.2\n", "574 No Alive 51.6\n", "576 No Alive 41.4\n", "577 No Dead 65.4\n", "578 No Dead 67.7\n", "579 No Alive 37.8\n", "1042 No Alive 61.5\n", "581 No Alive 23.9\n", "582 No Alive 60.1\n", "585 No Dead 75.6\n", "586 No Dead 72.1\n", "1039 No Alive 21.7\n", "588 No Dead 55.3\n", "1038 No Dead 81.8\n", "590 No Dead 79.3\n", "564 No Dead 29.8\n", "1051 No Alive 53.8\n", "1052 No Alive 20.7\n", "561 No Alive 62.4\n", "529 No Alive 25.5\n", "1068 No Alive 49.4\n", "533 No Alive 35.1\n", "534 No Alive 38.0\n", "... ... ... ...\n", "1128 No Alive 19.1\n", "396 No Alive 20.4\n", "261 No Alive 49.1\n", "1190 No Alive 38.7\n", "268 No Alive 52.4\n", "256 No Alive 52.6\n", "398 No Alive 46.2\n", "277 No Alive 55.3\n", "1183 No Alive 57.5\n", "278 No Dead 87.7\n", "383 No Dead 74.1\n", "1196 No Dead 76.2\n", "273 No Alive 36.5\n", "252 No Alive 20.1\n", "384 No Alive 37.0\n", "403 No Dead 78.0\n", "250 No Alive 30.8\n", "249 No Dead 84.3\n", "404 No Alive 26.8\n", "1131 No Alive 22.9\n", "1184 No Alive 46.5\n", "282 No Alive 18.5\n", "1194 No Dead 83.3\n", "255 No Alive 19.6\n", "405 No Alive 63.0\n", "276 No Alive 38.4\n", "1124 No Alive 52.0\n", "275 No Alive 38.8\n", "1185 No Dead 73.8\n", "280 No Alive 74.1\n", "\n", "[732 rows x 3 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nonFumeuses" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Compute the **total** number of smokers (*nbTotalF*) and non-smokers (*nbTotalNF*)" ] },
{ "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of smokers: 582\n", "Total number of non-smokers: 732\n" ] } ], "source": [ "nbTotalF = len(fumeuses)\n", "nbTotalNF = len(nonFumeuses)\n", "print(\"Total number of smokers:\", nbTotalF)\n", "print(\"Total number of non-smokers:\", nbTotalNF)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Compute the number of **deceased smokers** (*nbDecedeesF*)" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "139" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nbDecedeesF = len(fumeuses.loc[fumeuses[\"Status\"]==\"Dead\"])\n", "nbDecedeesF" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Compute the number of **deceased non-smokers** (*nbDecedeesNF*)" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "230" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nbDecedeesNF = len(nonFumeuses.loc[nonFumeuses[\"Status\"]==\"Dead\"])\n", "nbDecedeesNF" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Compute the **mortality rate** of smokers (*tauxMortF*) and non-smokers (*tauxMortNF*)" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Over the period, the mortality rate for smokers is: 23.883161512027492 %\n", "and the mortality rate for non-smokers is: 31.420765027322407 %\n" ] } ], "source": [ "# Mortality rate = number of deaths in the group / group size\n", "tauxMortF = nbDecedeesF/nbTotalF\n", "tauxMortNF = nbDecedeesNF/nbTotalNF\n", "print(\"Over the period, the mortality rate for smokers is:\", tauxMortF*100, \"%\")\n", "print(\"and the mortality rate for non-smokers is:\", tauxMortNF*100, \"%\")" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Statut tauxMortalite\n", "0 Smokers 23.883162\n", "1 Non-smokers 31.420765" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Summary table of mortality rates (in %) for both groups\n", "d = {\"Statut\": [\"Smokers\", \"Non-smokers\"], \"tauxMortalite\": [tauxMortF*100, tauxMortNF*100]}\n", "dt = pd.DataFrame(data=d)\n", "dt" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(8, 5))\n", "plt.bar(dt[\"Statut\"], dt[\"tauxMortalite\"], color=['salmon', 'skyblue'])\n", "\n", "# Title and labels\n", "plt.title(\"Mortality rate by smoking status\")\n", "plt.xlabel(\"Status\")\n", "plt.ylabel(\"Mortality rate (%)\")\n", "\n", "# Show the plot\n", "plt.show()\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "These results are rather surprising: since we have so often been told that smoking is bad for health, we expected this study to confirm it. Instead, the mortality rate is lower among smokers (about 23.9%) than among non-smokers (about 31.4%)." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Step 2" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "hide_code_all_hidden": true, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 4 }