{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Subject 6: Around Simpson's Paradox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In 1972-1974, in Whickham, a town in the north-east of England located about 6.5 kilometres south-west of Newcastle upon Tyne, a survey of one sixth of the electorate was carried out to inform research on thyroid and heart disease (Tunbridge et al. 1977). A follow-up study was conducted twenty years later (Vanderpump et al. 1995). Some of the results concerned smoking and whether the individuals were still alive at the time of the second study. For simplicity, we restrict ourselves to women, and among them to the 1314 who were classified as \"currently smoking\" or \"never smoked\". Relatively few women in the initial survey had smoked and since quit (162), and very few had no information available (18). Twenty-year survival was determined for all the women in the first survey." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The downloaded file is in CSV format. Each row contains the following information: whether the person smokes, whether she was alive at the time of the second study, and her age at the time of the first survey." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "data_url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv?inline=false\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first line of the CSV file is a comment, which we skip by passing `skiprows=1`."
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | Smoker | Status | Age |
|---|---|---|---|
| 0 | Yes | Alive | 21.0 |
| 1 | Yes | Alive | 19.3 |
| 2 | No | Dead | 57.5 |
| 3 | No | Alive | 47.1 |
| 4 | Yes | Alive | 81.4 |
| 5 | No | Alive | 36.8 |
| 6 | No | Alive | 23.8 |
| 7 | Yes | Dead | 57.5 |
| 8 | Yes | Alive | 24.8 |
| 9 | Yes | Alive | 49.5 |
| 10 | Yes | Alive | 30.0 |
| 11 | No | Dead | 66.0 |
| 12 | Yes | Alive | 49.2 |
| 13 | No | Alive | 58.4 |
| 14 | No | Dead | 60.6 |
| 15 | No | Alive | 25.1 |
| 16 | No | Alive | 43.5 |
| 17 | No | Alive | 27.1 |
| 18 | No | Alive | 58.3 |
| 19 | Yes | Alive | 65.7 |
| 20 | No | Dead | 73.2 |
| 21 | Yes | Alive | 38.3 |
| 22 | No | Alive | 33.4 |
| 23 | Yes | Dead | 62.3 |
| 24 | No | Alive | 18.0 |
| 25 | No | Alive | 56.2 |
| 26 | Yes | Alive | 59.2 |
| 27 | No | Alive | 25.8 |
| 28 | No | Dead | 36.9 |
| 29 | No | Alive | 20.2 |
| ... | ... | ... | ... |
| 1284 | Yes | Dead | 36.0 |
| 1285 | Yes | Alive | 48.3 |
| 1286 | No | Alive | 63.1 |
| 1287 | No | Alive | 60.8 |
| 1288 | Yes | Dead | 39.3 |
| 1289 | No | Alive | 36.7 |
| 1290 | No | Alive | 63.8 |
| 1291 | No | Dead | 71.3 |
| 1292 | No | Alive | 57.7 |
| 1293 | No | Alive | 63.2 |
| 1294 | No | Alive | 46.6 |
| 1295 | Yes | Dead | 82.4 |
| 1296 | Yes | Alive | 38.3 |
| 1297 | Yes | Alive | 32.7 |
| 1298 | No | Alive | 39.7 |
| 1299 | Yes | Dead | 60.0 |
| 1300 | No | Dead | 71.0 |
| 1301 | No | Alive | 20.5 |
| 1302 | No | Alive | 44.4 |
| 1303 | Yes | Alive | 31.2 |
| 1304 | Yes | Alive | 47.8 |
| 1305 | Yes | Alive | 60.9 |
| 1306 | No | Dead | 61.4 |
| 1307 | Yes | Alive | 43.0 |
| 1308 | No | Alive | 42.1 |
| 1309 | Yes | Alive | 35.9 |
| 1310 | No | Alive | 22.3 |
| 1311 | Yes | Dead | 62.1 |
| 1312 | No | Dead | 88.6 |
| 1313 | No | Alive | 39.1 |
1314 rows × 3 columns
\n", "