diff --git a/module3/exo3/exercice.ipynb b/module3/exo3/exercice.ipynb
index 0bbbe371b01e359e381e43239412d77bf53fb1fb..c634483ebbbeaf6da436a43ccc7bf87cca2c0305 100644
--- a/module3/exo3/exercice.ipynb
+++ b/module3/exo3/exercice.ipynb
@@ -1,5 +1,654 @@
 {
- "cells": [],
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Exercise on Simpson's paradox"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Context"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In 1972-1974, in Whickham, a town in north-east England about 6.5 kilometres south-west of Newcastle upon Tyne, a survey of one sixth of the electorate was carried out to inform research on thyroid and heart disease (Tunbridge et al. 1977). A follow-up study was conducted twenty years later (Vanderpump et al. 1995). Some of the results concerned smoking and whether the individuals were still alive at the time of the second study. For simplicity, we restrict ourselves to women, and among them to the 1314 who were classified as \"currently smoking\" or \"never smoked\". Relatively few women in the initial survey had smoked and since quit (162), and very few had missing information (18). Twenty-year survival was determined for all women in the first survey.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Importing the libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "import matplotlib.pyplot as plt\n",
+    "import pandas as pd\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Importing the dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "
|      | Smoker | Status | Age  |
|------|--------|--------|------|
| 0    | Yes    | Alive  | 21.0 |
| 1    | Yes    | Alive  | 19.3 |
| 2    | No     | Dead   | 57.5 |
| 3    | No     | Alive  | 47.1 |
| 4    | Yes    | Alive  | 81.4 |
| 5    | No     | Alive  | 36.8 |
| 6    | No     | Alive  | 23.8 |
| 7    | Yes    | Dead   | 57.5 |
| 8    | Yes    | Alive  | 24.8 |
| 9    | Yes    | Alive  | 49.5 |
| 10   | Yes    | Alive  | 30.0 |
| 11   | No     | Dead   | 66.0 |
| 12   | Yes    | Alive  | 49.2 |
| 13   | No     | Alive  | 58.4 |
| 14   | No     | Dead   | 60.6 |
| 15   | No     | Alive  | 25.1 |
| 16   | No     | Alive  | 43.5 |
| 17   | No     | Alive  | 27.1 |
| 18   | No     | Alive  | 58.3 |
| 19   | Yes    | Alive  | 65.7 |
| 20   | No     | Dead   | 73.2 |
| 21   | Yes    | Alive  | 38.3 |
| 22   | No     | Alive  | 33.4 |
| 23   | Yes    | Dead   | 62.3 |
| 24   | No     | Alive  | 18.0 |
| 25   | No     | Alive  | 56.2 |
| 26   | Yes    | Alive  | 59.2 |
| 27   | No     | Alive  | 25.8 |
| 28   | No     | Dead   | 36.9 |
| 29   | No     | Alive  | 20.2 |
| ...  | ...    | ...    | ...  |
| 1284 | Yes    | Dead   | 36.0 |
| 1285 | Yes    | Alive  | 48.3 |
| 1286 | No     | Alive  | 63.1 |
| 1287 | No     | Alive  | 60.8 |
| 1288 | Yes    | Dead   | 39.3 |
| 1289 | No     | Alive  | 36.7 |
| 1290 | No     | Alive  | 63.8 |
| 1291 | No     | Dead   | 71.3 |
| 1292 | No     | Alive  | 57.7 |
| 1293 | No     | Alive  | 63.2 |
| 1294 | No     | Alive  | 46.6 |
| 1295 | Yes    | Dead   | 82.4 |
| 1296 | Yes    | Alive  | 38.3 |
| 1297 | Yes    | Alive  | 32.7 |
| 1298 | No     | Alive  | 39.7 |
| 1299 | Yes    | Dead   | 60.0 |
| 1300 | No     | Dead   | 71.0 |
| 1301 | No     | Alive  | 20.5 |
| 1302 | No     | Alive  | 44.4 |
| 1303 | Yes    | Alive  | 31.2 |
| 1304 | Yes    | Alive  | 47.8 |
| 1305 | Yes    | Alive  | 60.9 |
| 1306 | No     | Dead   | 61.4 |
| 1307 | Yes    | Alive  | 43.0 |
| 1308 | No     | Alive  | 42.1 |
| 1309 | Yes    | Alive  | 35.9 |
| 1310 | No     | Alive  | 22.3 |
| 1311 | Yes    | Dead   | 62.1 |
| 1312 | No     | Dead   | 88.6 |
| 1313 | No     | Alive  | 39.1 |

1314 rows × 3 columns
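The context paragraph sets up a comparison of twenty-year survival between smokers and non-smokers. Because the exercise is about Simpson's paradox, the same comparison is also worth making within age classes. What follows is a minimal pandas sketch of both views, with two assumptions. First, it uses only the first ten rows displayed above so that it runs offline; in the notebook, `df` would come from `pd.read_csv(data_url)`. Second, the age-class edges in `bins` are a hypothetical choice and not part of the exercise. Ten rows are far too few to support any conclusion. The sketch only shows the shape of the computation.

```python
import pandas as pd

# Assumption: only the first ten rows displayed above, so the sketch runs offline.
# In the notebook itself this would be: df = pd.read_csv(data_url)
df = pd.DataFrame(
    {
        "Smoker": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes"],
        "Status": ["Alive", "Alive", "Dead", "Alive", "Alive",
                   "Alive", "Alive", "Dead", "Alive", "Alive"],
        "Age": [21.0, 19.3, 57.5, 47.1, 81.4, 36.8, 23.8, 57.5, 24.8, 49.5],
    }
)

# Contingency table: number of women per (Smoker, Status) pair.
counts = pd.crosstab(df["Smoker"], df["Status"])

# Share of deaths within each smoking group, ignoring age.
death_rate = counts["Dead"] / counts.sum(axis=1)
print(counts)
print(death_rate)

# Same death rate computed within age classes (the stratified view in which
# Simpson's paradox can show up). Hypothetical bin edges: [18,35), [35,55), [55,65), [65,90).
df["AgeGroup"] = pd.cut(df["Age"], bins=[18, 35, 55, 65, 90], right=False)
df["Dead"] = df["Status"].eq("Dead")
# observed=True keeps only the (age class, smoker) pairs present in the data.
by_age = df.groupby(["AgeGroup", "Smoker"], observed=True)["Dead"].mean()
print(by_age)
```

With the full dataset, setting `death_rate` (aggregated) against `by_age` (stratified) is the kind of comparison in which a Simpson's paradox reversal would become visible. With real data, check that every age falls inside the chosen bin edges, because `pd.cut` assigns NaN to values outside them.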
\n", + "\n", + " | Smoker | \n", + "Status | \n", + "Age | \n", + "
---|