{ "cells": [ { "cell_type": "markdown", "metadata": { "hideCode": true, "hidePrompt": true }, "source": [ "# Subject 6: Around Simpson's Paradox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Context:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In 1972-1974, in Whickham, a town in north-east England located about 6.5 kilometres south-west of Newcastle upon Tyne, a survey of one sixth of the electorate was conducted to inform work on thyroid and heart disease (Tunbridge et al. 1977). A follow-up study was carried out twenty years later (Vanderpump et al. 1995). Some of the results concerned smoking and whether the individuals were still alive at the time of the second study. For simplicity, we restrict ourselves to women, and among them to the 1314 who were categorised as \"currently smoking\" or \"never having smoked\". Relatively few women in the initial survey had smoked and since quit (162), and very few had no information available (18). Twenty-year survival was determined for all the women in the first survey." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### This subject is studied in 3 steps:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Tabulate the total number of women alive and dead over the period according to their smoking habit. In each group (smokers / non-smokers), compute the mortality rate (the ratio of the number of women who died in a group to the total number of women in that group). Analyse the result.\n", "\n", "2. Repeat question 1 (counts and mortality rates) adding a new category for the age class. Use the following classes: 18-34, 35-54, 55-64, over 65. Analyse the result.\n", "\n",
"3. Fit a logistic regression by introducing a variable Death equal to 1 if the person died during the 20 years between the 2 surveys and 0 otherwise. Conclude." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1:" ] }, { "cell_type": "markdown", "metadata": { "hideCode": true, "hidePrompt": true }, "source": [ "First, import the libraries we will need." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, load and read the file." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data_file = \"Subject6_smoking.csv\"" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Smoker Status Age\n", "0 Yes Alive 21.0\n", "1 Yes Alive 19.3\n", "2 No Dead 57.5\n", "3 No Alive 47.1\n", "4 Yes Alive 81.4\n", "5 No Alive 36.8\n", "6 No Alive 23.8\n", "7 Yes Dead 57.5\n", "8 Yes Alive 24.8\n", "9 Yes Alive 49.5\n", "10 Yes Alive 30.0\n", "11 No Dead 66.0\n", "12 Yes Alive 49.2\n", "13 No Alive 58.4\n", "14 No Dead 60.6\n", "15 No Alive 25.1\n", "16 No Alive 43.5\n", "17 No Alive 27.1\n", "18 No Alive 58.3\n", "19 Yes Alive 65.7\n", "20 No Dead 73.2\n", "21 Yes Alive 38.3\n", "22 No Alive 33.4\n", "23 Yes Dead 62.3\n", "24 No Alive 18.0\n", "25 No Alive 56.2\n", "26 Yes Alive 59.2\n", "27 No Alive 25.8\n", "28 No Dead 36.9\n", "29 No Alive 20.2\n", "... ... ... ...\n", "1284 Yes Dead 36.0\n", "1285 Yes Alive 48.3\n", "1286 No Alive 63.1\n", "1287 No Alive 60.8\n", "1288 Yes Dead 39.3\n", "1289 No Alive 36.7\n", "1290 No Alive 63.8\n", "1291 No Dead 71.3\n", "1292 No Alive 57.7\n", "1293 No Alive 63.2\n", "1294 No Alive 46.6\n", "1295 Yes Dead 82.4\n", "1296 Yes Alive 38.3\n", "1297 Yes Alive 32.7\n", "1298 No Alive 39.7\n", "1299 Yes Dead 60.0\n", "1300 No Dead 71.0\n", "1301 No Alive 20.5\n", "1302 No Alive 44.4\n", "1303 Yes Alive 31.2\n", "1304 Yes Alive 47.8\n", "1305 Yes Alive 60.9\n", "1306 No Dead 61.4\n", "1307 Yes Alive 43.0\n", "1308 No Alive 42.1\n", "1309 Yes Alive 35.9\n", "1310 No Alive 22.3\n", "1311 Yes Dead 62.1\n", "1312 No Dead 88.6\n", "1313 No Alive 39.1\n", "\n", "[1314 rows x 3 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_file)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create 2 \"tables\" from the contents of the CSV file:\n", "*nonFumeuses* holds the rows of the women who do not smoke (\"No\" in the \"Smoker\" column)\n", "and *fumeuses* holds the rows of the women who smoke (\"Yes\" in the \"Smoker\" column)." ] }, { "cell_type": 
"code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Sorted copy of the data; the non-smoker table is built from it\n", "trier = raw_data.sort_values(by=[\"Smoker\"])\n", "# Boolean mask selecting the smokers\n", "masq = raw_data[\"Smoker\"] == \"Yes\"\n", "fumeuses = raw_data.loc[masq]\n", "nonFumeuses = trier.loc[trier[\"Smoker\"] == \"No\"]\n", "\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Smoker Status Age\n", "0 Yes Alive 21.0\n", "1 Yes Alive 19.3\n", "4 Yes Alive 81.4\n", "7 Yes Dead 57.5\n", "8 Yes Alive 24.8\n", "9 Yes Alive 49.5\n", "10 Yes Alive 30.0\n", "12 Yes Alive 49.2\n", "19 Yes Alive 65.7\n", "21 Yes Alive 38.3\n", "23 Yes Dead 62.3\n", "26 Yes Alive 59.2\n", "30 Yes Alive 34.6\n", "31 Yes Alive 51.9\n", "32 Yes Alive 49.9\n", "35 Yes Alive 46.7\n", "36 Yes Alive 44.4\n", "37 Yes Alive 29.5\n", "38 Yes Dead 33.0\n", "39 Yes Alive 35.6\n", "40 Yes Alive 39.1\n", "42 Yes Alive 35.7\n", "46 Yes Dead 44.3\n", "48 Yes Alive 37.5\n", "49 Yes Alive 22.1\n", "53 Yes Alive 39.0\n", "56 Yes Alive 40.1\n", "60 Yes Alive 58.1\n", "61 Yes Alive 37.3\n", "63 Yes Dead 36.3\n", "... ... ... ...\n", "1240 Yes Alive 29.7\n", "1243 Yes Alive 40.1\n", "1251 Yes Alive 27.8\n", "1252 Yes Alive 52.4\n", "1253 Yes Alive 27.8\n", "1254 Yes Alive 41.0\n", "1259 Yes Alive 40.8\n", "1260 Yes Alive 20.4\n", "1263 Yes Alive 20.9\n", "1264 Yes Alive 45.5\n", "1269 Yes Alive 38.8\n", "1270 Yes Alive 55.5\n", "1271 Yes Alive 24.9\n", "1273 Yes Alive 55.7\n", "1276 Yes Alive 58.5\n", "1278 Yes Alive 43.7\n", "1282 Yes Alive 51.2\n", "1284 Yes Dead 36.0\n", "1285 Yes Alive 48.3\n", "1288 Yes Dead 39.3\n", "1295 Yes Dead 82.4\n", "1296 Yes Alive 38.3\n", "1297 Yes Alive 32.7\n", "1299 Yes Dead 60.0\n", "1303 Yes Alive 31.2\n", "1304 Yes Alive 47.8\n", "1305 Yes Alive 60.9\n", "1307 Yes Alive 43.0\n", "1309 Yes Alive 35.9\n", "1311 Yes Dead 62.1\n", "\n", "[582 rows x 3 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the smokers table\n", "fumeuses" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Smoker Status Age\n", "1313 No Alive 39.1\n", "1048 No Alive 28.5\n", "568 No Alive 33.5\n", "1047 No Alive 62.6\n", "570 No Dead 56.2\n", "1046 No Alive 20.3\n", "1045 No Alive 48.5\n", "1044 No Alive 32.2\n", "574 No Alive 51.6\n", "576 No Alive 41.4\n", "577 No Dead 65.4\n", "578 No Dead 67.7\n", "579 No Alive 37.8\n", "1042 No Alive 61.5\n", "581 No Alive 23.9\n", "582 No Alive 60.1\n", "585 No Dead 75.6\n", "586 No Dead 72.1\n", "1039 No Alive 21.7\n", "588 No Dead 55.3\n", "1038 No Dead 81.8\n", "590 No Dead 79.3\n", "564 No Dead 29.8\n", "1051 No Alive 53.8\n", "1052 No Alive 20.7\n", "561 No Alive 62.4\n", "529 No Alive 25.5\n", "1068 No Alive 49.4\n", "533 No Alive 35.1\n", "534 No Alive 38.0\n", "... ... ... ...\n", "1128 No Alive 19.1\n", "396 No Alive 20.4\n", "261 No Alive 49.1\n", "1190 No Alive 38.7\n", "268 No Alive 52.4\n", "256 No Alive 52.6\n", "398 No Alive 46.2\n", "277 No Alive 55.3\n", "1183 No Alive 57.5\n", "278 No Dead 87.7\n", "383 No Dead 74.1\n", "1196 No Dead 76.2\n", "273 No Alive 36.5\n", "252 No Alive 20.1\n", "384 No Alive 37.0\n", "403 No Dead 78.0\n", "250 No Alive 30.8\n", "249 No Dead 84.3\n", "404 No Alive 26.8\n", "1131 No Alive 22.9\n", "1184 No Alive 46.5\n", "282 No Alive 18.5\n", "1194 No Dead 83.3\n", "255 No Alive 19.6\n", "405 No Alive 63.0\n", "276 No Alive 38.4\n", "1124 No Alive 52.0\n", "275 No Alive 38.8\n", "1185 No Dead 73.8\n", "280 No Alive 74.1\n", "\n", "[732 rows x 3 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the non-smokers table\n", "nonFumeuses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the **total** number of smokers (*nbTotalF*) and of non-smokers (*nbTotalNF*)." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Le nombre total de fumeuses est de : 582\n", "Le nombre total de non fumeuses est de : 732\n" ] } ], "source": 
[ "# The number of rows of each table is the group size\n", "nbTotalF = len(fumeuses)\n", "nbTotalNF = len(nonFumeuses)\n", "print(\"Le nombre total de fumeuses est de :\", nbTotalF)\n", "print(\"Le nombre total de non fumeuses est de :\", nbTotalNF)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the number of **smokers who died** (*nbDecedeesF*)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "139" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nbDecedeesF = len(fumeuses.loc[fumeuses[\"Status\"]==\"Dead\"])\n", "nbDecedeesF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the number of **non-smokers who died** (*nbDecedeesNF*)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "230" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nbDecedeesNF = len(nonFumeuses.loc[nonFumeuses[\"Status\"]==\"Dead\"])\n", "nbDecedeesNF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the **mortality rate** of the smokers (*tauxMortF*) and of the non-smokers (*tauxMortNF*)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sur la période donnée, il y a pour les fumeuses un taux de mortalité de : 23.883161512027492 %\n", "et il y a pour les non fumeuses un taux de mortalité de : 31.420765027322407 %\n" ] } ], "source": [ "# Mortality rate = deaths in the group / size of the group\n", "tauxMortF = nbDecedeesF/nbTotalF\n", "tauxMortNF = nbDecedeesNF/nbTotalNF\n", "print(\"Sur la période donnée, il y a pour les fumeuses un taux de mortalité de : \", tauxMortF*100, \"%\")\n", "print(\"et il y a pour les non fumeuses un taux de mortalité de : \", tauxMortNF*100, \"%\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a new pandas DataFrame (*dt*) holding the mortality rates by status (smoker or not), to be used to build a chart from these data." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Statut tauxMortalite\n", "0 Fumeuses 23.883162\n", "1 nonFumeuses 31.420765" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = {\"tauxMortalite\" : [tauxMortF*100, tauxMortNF*100], \"Statut\" : [\"Fumeuses\", \"nonFumeuses\"]}\n", "dt = pd.DataFrame(data = d)\n", "dt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Draw a bar chart to illustrate the preceding calculations." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "plt.figure(figsize=(8, 5))\n", "plt.bar(dt[\"Statut\"], dt[\"tauxMortalite\"], color=['salmon', 'skyblue'])\n", "\n", "plt.title(\"Taux de mortalité par statut de tabagisme\")\n", "plt.xlabel(\"Statut\")\n", "plt.ylabel(\"Taux de mortalité (%)\")\n", "\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results are rather surprising: having often been told that smoking is bad for health, we expected this study to confirm it.\n", "Instead, the computed rates show the opposite: the group of women who did not smoke has a higher mortality rate (about 31.4 %) than the group of women who smoked (about 23.9 %)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Women with 18 <= Age <= 34: count(Age <= 34) minus count(Age < 18)\n", "nb18_34F = len(fumeuses.loc[fumeuses[\"Age\"]<=34]) - len(fumeuses.loc[fumeuses[\"Age\"]<18])\n", "nb18_34NF = len(nonFumeuses.loc[nonFumeuses[\"Age\"]<=34]) - len(nonFumeuses.loc[nonFumeuses[\"Age\"]<18])" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 fumeuses ayant entre 18 et 34 ans lors du premier sondage sont décédées durant la période de 20 ans\n" ] } ], "source": [ "# Smokers with 18 < Age <= 34 (strict lower bound)\n", "test = fumeuses.loc[fumeuses[\"Age\"]<=34]\n", "t2 = test.loc[test[\"Age\"]>18]\n", "\n", "# Deaths among them during the 20-year period\n", "nbDecedees18_34F = len(t2.loc[t2[\"Status\"]==\"Dead\"])\n", "print(nbDecedees18_34F, \"fumeuses ayant entre 18 et 34 ans lors du premier sondage sont décédées durant la période de 20 ans\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "hide_code_all_hidden": true, "kernelspec": { "display_name": 
"Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 4 }