From 2f88cb48b9ddcb1411f092842cad02aeb8cb48bc Mon Sep 17 00:00:00 2001 From: 2dc3b8211fc4587130f57bad19caf5a3 <2dc3b8211fc4587130f57bad19caf5a3@app-learninglab.inria.fr> Date: Mon, 4 Dec 2023 15:23:01 +0000 Subject: [PATCH] no commit message --- ... \303\251valu\303\251 par les pairs.ipynb" | 22 +- module3/exo3/exercice.ipynb | 841 +++++++++++++++++- 2 files changed, 842 insertions(+), 21 deletions(-) diff --git "a/SUJET 6, Exercice \303\251valu\303\251 par les pairs.ipynb" "b/SUJET 6, Exercice \303\251valu\303\251 par les pairs.ipynb" index ecdc0e2..6eb4a3e 100644 --- "a/SUJET 6, Exercice \303\251valu\303\251 par les pairs.ipynb" +++ "b/SUJET 6, Exercice \303\251valu\303\251 par les pairs.ipynb" @@ -826,25 +826,11 @@ ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], - "source": [] + "source": [ + "I was not able to understand what logistic regression is or how to apply it." + ] } ], "metadata": { diff --git a/module3/exo3/exercice.ipynb b/module3/exo3/exercice.ipynb index 0bbbe37..6915811 100644 --- a/module3/exo3/exercice.ipynb +++ b/module3/exo3/exercice.ipynb @@ -1,5 +1,841 @@ { - "cells": [], + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Around Simpson's Paradox" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Problem statement" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In 1972-1974, in Whickham, a town in the north-east of England located about 6.5 kilometres south-west of Newcastle upon Tyne, a survey of one sixth of the electorate was carried out to inform research on thyroid and heart disease (Tunbridge et al. 1977). 
A follow-up study was conducted twenty years later (Vanderpump et al. 1995). Some of the results concerned smoking and whether the individuals were still alive at the time of the second study. For simplicity, we restrict ourselves to women, and among them to the 1314 who were categorised as \"current smoker\" or \"never smoked\". Relatively few women in the initial survey had smoked and since quit (162), and very few had no information available (18). Twenty-year survival was determined for all women in the first survey." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Tabulate the total number of women alive and dead over the period according to their smoking habits. In each group (smokers / non-smokers), compute the mortality rate (the ratio of the number of women who died in a group to the total number of women in that group). You may provide a graphical representation of these data and compute confidence intervals if you wish. Why is this result surprising?" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Smoker Status Age\n", + "0 Yes Alive 21.0\n", + "1 Yes Alive 19.3\n", + "2 No Dead 57.5\n", + "3 No Alive 47.1\n", + "4 Yes Alive 81.4\n", + "5 No Alive 36.8\n", + "6 No Alive 23.8\n", + "7 Yes Dead 57.5\n", + "8 Yes Alive 24.8\n", + "9 Yes Alive 49.5\n", + "10 Yes Alive 30.0\n", + "11 No Dead 66.0\n", + "12 Yes Alive 49.2\n", + "13 No Alive 58.4\n", + "14 No Dead 60.6\n", + "15 No Alive 25.1\n", + "16 No Alive 43.5\n", + "17 No Alive 27.1\n", + "18 No Alive 58.3\n", + "19 Yes Alive 65.7\n", + "20 No Dead 73.2\n", + "21 Yes Alive 38.3\n", + "22 No Alive 33.4\n", + "23 Yes Dead 62.3\n", + "24 No Alive 18.0\n", + "25 No Alive 56.2\n", + "26 Yes Alive 59.2\n", + "27 No Alive 25.8\n", + "28 No Dead 36.9\n", + "29 No Alive 20.2\n", + "... ... ... ...\n", + "1284 Yes Dead 36.0\n", + "1285 Yes Alive 48.3\n", + "1286 No Alive 63.1\n", + "1287 No Alive 60.8\n", + "1288 Yes Dead 39.3\n", + "1289 No Alive 36.7\n", + "1290 No Alive 63.8\n", + "1291 No Dead 71.3\n", + "1292 No Alive 57.7\n", + "1293 No Alive 63.2\n", + "1294 No Alive 46.6\n", + "1295 Yes Dead 82.4\n", + "1296 Yes Alive 38.3\n", + "1297 Yes Alive 32.7\n", + "1298 No Alive 39.7\n", + "1299 Yes Dead 60.0\n", + "1300 No Dead 71.0\n", + "1301 No Alive 20.5\n", + "1302 No Alive 44.4\n", + "1303 Yes Alive 31.2\n", + "1304 Yes Alive 47.8\n", + "1305 Yes Alive 60.9\n", + "1306 No Dead 61.4\n", + "1307 Yes Alive 43.0\n", + "1308 No Alive 42.1\n", + "1309 Yes Alive 35.9\n", + "1310 No Alive 22.3\n", + "1311 Yes Dead 62.1\n", + "1312 No Dead 88.6\n", + "1313 No Alive 39.1\n", + "\n", + "[1314 rows x 3 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "url=\"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv?inline=false\"\n", + "doc = pd.read_csv(url, encoding = 'iso-8859-1')\n", + "doc" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "On vérifie l'existence de données nulles." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "Empty DataFrame\n", + "Columns: [Smoker, Status, Age]\n", + "Index: []" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "doc[doc.isnull().any(axis=1)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since there are none, we can move on to the analysis." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Tabulation of the number of women alive and dead according to their smoking habits." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[230, 502, 139, 443]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df=doc.copy()\n", + "Data=[0,0,0,0]\n", + "def compte_stats(tab, ligne):\n", + "    index=0\n", + "    # Check whether the person smokes.\n", + "    if ligne.Smoker!=\"No\":\n", + "        index=2\n", + "    # Check whether the person is alive.\n", + "    if ligne.Status!=\"Dead\":\n", + "        index+=1\n", + "    # tab = [non-smoker dead, non-smoker alive, smoker dead, smoker alive]\n", + "    tab[index]+=1\n", + "for ligne in df.itertuples():\n", + "    compte_stats(Data, ligne)\n", + "Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The list named Data therefore holds, in order, the number of non-smokers who died and who are alive, followed by the number of smokers who died and who are alive. We can now move on to the analysis." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In each group (smokers / non-smokers), compute the mortality rate (the ratio of the number of women who died in a group to the total number of women in that group). You may provide a graphical representation of these data and compute confidence intervals if you wish. Why is this result surprising?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def diviser_data_fumeur(Data):\n", + " return {\"Alive\":Data[1],\"Dead\":Data[0]},{\"Alive\":Data[3],\"Dead\":Data[2]}\n", + "Non_fumeur,Fumeur = diviser_data_fumeur(Data)\n", + "def taux_mortalite(data):\n", + " return data[\"Dead\"]/(data[\"Dead\"]+data[\"Alive\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAMIAAADFCAYAAAAG5C2JAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAACvFJREFUeJzt3X+MHGUdx/H3h9YqVkTjnUgQOCIQPGPblAM0rUBFGg+M5ZfRSkAQrBoUf4REEg1B0UgVohIUvBAkGBE08UwpWIooUSg1vULTlgqGlCOQIm2BiFUsLf36x8zaYd29nb2b2dmFzytpbnfmeZ59Znufm7nbme8oIjB7rdun6gmYdQMHwQwHwQxwEMwAB8EMcBDMAAfBDHAQzAAHwQyA6VVPoJG+vr4YGBioehr2KrB27drtEdHfql1XBmFgYICxsbGqp2GvApKeyNPOh0ZmOAhmgINgBjgIZoCDYAZ06V+Nmhm49I6qp9BR41eeWvUUXjO8RzDDQTADHAQzIGcQJH1Y0qOSHpN0aYP1iyStl7RO0pik+Xn7mnWDlkGQNA34MTAMDAKLJQ3WNbsHmB0Rc4BPAze00descnn2CMcCj0XE5oh4CbgVWJRtEBE7Ym9dmJlA5O1r1g3yBOEg4MnM86fSZa8g6XRJjwB3kOwVcvc1q1qeIKjBsv+rChYRoxFxFHAacEU7fQEkLUl/vxjbtm1bjmmZFSdPEJ4CDs48fyewpVnjiPgT8C5Jfe30jYiRiBiKiKH+/panj5sVKk8Q1gBHSDpM0gzgE8CybANJh0tS+nguMAN4Nk9fs27Q8hSLiNgt6QvAXcA04MaIeFjS59L11wNnAudK2gW8CHw8/eW5Yd+StsVs0nKdaxQRdwJ31i27PvN4KbA0b1+zbuNPls1wEMwAB8EMcBDMAAfBDHAQzAAHwQxwEMwAB8EMcBDMAAfBDHAQzAAHwQxwEMwAB8EMcBDMgOIKfJ2dFvhaL2mVpNmZdeOSNtSKfxU5ebOitLxCLVOk62SSi/HXSFoWEZsyzR4HToiI5yUNAyPAcZn1CyJie4HzNitUUQW+VkXE8+nT1STVKsx6RmEFvjIuAH6XeR7ASklrJS1p1sl1jaxKeS7eb6dI1wKSIMzPLJ4XEVskvR24W9Ijae2jVw4YMUJySMXQ0FDD8c3KUliBL0mzSIr/LoqIZ2vLI2JL+nUrMEpyqGXWVYoq8HUI8BvgnIj4W2b5TEn71R4DC4GNRU3erChFFfi6DHgb8JO04N3uiBgCDgBG02XTgVsiYkUpW2I2BUUV+LoQuLBBv83A7PrlZt2mp+6qafn5DqTt8SkWZjgIZoCDYAY4CGaA
g2AGOAhmgINgBjgIZoCDYAY4CGaAg2AGOAhmgINgBjgIZoCDYAZ0psDXhH3NukHLIGQKfA0Dg8BiSYN1zWoFvmYBV5BWo8jZ16xyZRf4atnXrBuUXeArd18X+LIq5QnCZAp8fa3dvhExEhFDETHU39+fY1pmxclz8X67Bb6GMwW+cvU1q1qpBb7y9DXrBqUW+GrWt6RtMZu0Ugt8Netr1m38ybIZDoIZ4CCYAQ6CGeAgmAEOghngIJgBDoIZ4CCYAQ6CGeAgmAEOghngIJgBDoIZ4CCYAQ6CGVBcga+jJD0gaaekS+rWjUvaIGmdpLGiJm5WpJZXqGWKdJ1McjH+GknLImJTptlzwMXAaU2GWRAR26c6WbOyFFXga2tErAF2lTBHs9KVUeCrXgArJa2VtKRZIxf4sioVWuCriXkRMZek/ulFko5v1MgFvqxKeYIwpSJdEbEl/boVGCU51DLrKoUU+GpG0kxJ+9UeAwuBjZOdrFlZCinwJekdwBjwZmCPpC+TlIHvA0bTol/TgVsiYkU5m2I2eUUV+Po7e0vBZ70AzG6w3Kyr+JNlMxwEM8BBMAMcBDPAQTADHAQzwEEwAxwEM8BBMAMcBDPAQTADHAQzwEEwAxwEM8BBMAM6U9dowr5m3aBlEDJ1jYZJrjpbLGmwrlmtrtFVk+hrVrmy6xq17GvWDcqua5S7r+saWZXKrmuUu6/rGlmVyq5rNKWaSGadUmpdoyn2NeuYUusaRcQLjfqWtTFmk1V2XaOGfc26jT9ZNsNBMAMcBDPAQTADHAQzwEEwAxwEM8BBMAMcBDPAQTADHAQzwEEwAxwEM8BBMAMcBDPAQTADiivwJUnXpOvXS5qbWTcuaYOkdZLGipy8WVFaXqGWKdJ1MsnF+GskLYuITZlmw8AR6b/jgOvSrzULImJ7YbM2K1ghBb7S5zdHYjXwFkkHFjxXs9IUVeBrojYBrJS0VtKSZi/iAl9WpaIKfE3UZl5EzCU5fLpI0vGNXsQFvqxKRRX4atomImpftwKjJIdaZl2lqAJfy4Bz078evQ/4R0Q8LWmmpP0AJM0EFgIbC5y/WSEKKfBFUrfoFOAx4N/A+Wn3A4BRSbXXuiUiVhS+FWZTVFSBrwAuatBvMzB7inM0K50/WTbDQTADHAQzwEEwAxwEM8BBMAMcBDPAQTADHAQzwEEwAxwEM8BBMAMcBDPAQTADHAQzwEEwAzpT4GvCvmbdoGUQMgW+hoFBYLGkwbpm2QJfS0gKfOXta1a5sgt85elrVrk81yw3Kt51XI42B+XsCyQFvkj2JgA7JD2aY26d0gd0vGSllnb6FQvRbe/VoXn65wnCVAp85embLIwYAUZyzKfjJI1FxFDV8+gFvfpe5QnCVAp8zcjR16xypRb4ytnXrHKlFvhq1reULSlXVx6ydamefK+U1OYye23zJ8tmOAhmQA8GQVJIujrz/BJJlxc0dr+kv0h6SNIHihizV0h6Ob3PXe3fQNVz6qRcRYC7zE7gDEnfLeG+bCcBj0TEpwoet22SpkfE7g6+5IsRMaeDr9eSpGkR8XInXqvn9gjAbpK/THylfoWkQyXdk574d4+kQ9LlN6UnBa6StFnSWQ36zgG+B5yS/kTcV9KOzPqzJN2UGe86SX9MxztB0o2S/lprk7ZbKOkBSQ9K+rWkN6XLxyX1pY+HJN2bPr5c0oiklcDNRb1hkyXpPEnXZp4vl3Ri+niHpKXpLcF+L+lYSfem78dH0zbTJH1f0pr0/+Sz6fITJS3PjHutpPPSx+OSLpN0H/CxTm1rLwYBkhP5zpa0f93ya0nOeZoF/AK4JrPuQGA+8BHgyvoBI2IdcBlwW0TMiYgXW8zhrcAHSQJ5O/AD4D3AeyXNSb/RvwF8KL111hjw1RzbdjSwKCI+maNtkfbNHBaN5mg/E7g3Io4G/gl8m+TOq6cD30rbXEDymdIxwDHAZyQdlmPs/0TE/Ii4tf3NmJxePDQiIl6QdDNwMZD9hn0/
cEb6+OckP+FrfhsRe4BNkg4oYBq3R0RI2gA8ExEbACQ9DAyQfIo+CNyf3ihlBvBAjnGX5QhhGdo9NHoJqN30ZQOwMyJ2pe/HQLp8ITArswfen+QM5ZdajH1bG/MoRE8GIfVD4EHgZxO0yX5IsjPzWACSvgOcCtDkmyDb/w1162rj7akbew/J+/oycHdELG4w7m727o3rx/1Xg/ZVyc4TXjnXXbH3Q6j/vQcRsUdS7ftKwBcj4q7soJLmTzAuVPAe9OqhERHxHPArkt1vzSqS0zgAzgbuazHG19PDoGY/CZ+R9G5J+5Ds8tuxGpgn6XAASW+UdGS6bpzkEAjgzDbH7aRxYI6kfSQdTPs3grwL+Lyk1wFIOlLJvfSeAAYlvT49vD2pyElPRs8GIXU1yWm/NRcD50taD5wDfGmK418KLAf+ADzdTseI2AacB/wync9q4Kh09TeBH0n6M8meo1vdDzxOcuhzFckeuB03AJuAByVtBH4KTI+IJ0l+iK0n+V3uocJmPEk+xcKM3t8jmBXCQTDDQTADHAQzwEEwAxwEM8BBMAPgvxZLhGCTUaYoAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "names = ['Non-fumeur', 'Fumeur']\n", + "values = [taux_mortalite(Non_fumeur),taux_mortalite(Fumeur)]\n", + "#print(values)\n", + "\n", + "plt.figure(figsize=(9, 3))\n", + "\n", + "plt.subplot(131)\n", + "plt.bar(names, values)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Ce résultat est surprenant car il a été prouvé maintes fois que fumer nuit à la santé. Ce résultat est donc contre-intuitif." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "2.Reprenez la question 1 (effectifs et taux de mortalité) en rajoutant une nouvelle catégorie liée à la classe d'âge. On considérera par exemple les classes suivantes : 18-34 ans, 34-54 ans, 55-64 ans, plus de 65 ans. En quoi ce résultat est-il surprenant ? Arrivez-vous à expliquer ce paradoxe ? De même, vous pourrez proposer une représentation graphique de ces données pour étayer vos explications." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def Separer_par_age(data):\n", + " dic={\"18-34\":[],\"35-54\":[],\"55-64\":[],\"65+\":[]}\n", + " for ligne in df.itertuples():\n", + " age=ligne.Age\n", + " if age>=65:\n", + " dic[\"65+\"].append(ligne)\n", + " else:\n", + " if age>=55 :\n", + " dic[\"55-64\"].append(ligne)\n", + " else:\n", + " if age>=35:\n", + " dic[\"35-54\"].append(ligne)\n", + " else:\n", + " dic[\"18-34\"].append(ligne)\n", + " return dic " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Pour chaque ligne, on vient l'ajouter à un tableau dans un dictionnaire. On viens ensuite faire la manipulation précédente sur chacune des clés dans le dictionnaire." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "x=Separer_par_age(df)\n", + "def recuperer_stats_age(Data):\n", + "    for i in Data.keys():\n", + "        temp=[0,0,0,0]\n", + "        for ligne in Data[i]:\n", + "            compte_stats(temp, ligne)\n", + "        Data[i]=diviser_data_fumeur(temp)\n", + "recuperer_stats_age(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We define a function to automate splitting the data and collecting the counts. We then examine each age group and plot the results." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "def taux_mortalite_par_age(data):\n", + "    tabnf=[]\n", + "    tabf=[]\n", + "    for i in data.keys():\n", + "        nf, f=data[i]\n", + "        tabnf.append(taux_mortalite(nf))\n", + "        tabf.append(taux_mortalite(f))\n", + "    return tabnf,tabf" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "tab_n_fum,tab_fum=taux_mortalite_par_age(x)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "name = [\"18-34\",\"35-54\",\"55-64\",\"65+\"]\n", + "\n", + "plt.plot(name, tab_fum, label = \"Fumeurs\")\n", + "plt.plot(name, tab_n_fum, label = \"Non-fumeurs\")\n", + "plt.xlabel(\"Tranches d'âge\")\n", + "plt.ylabel(\"Taux de mortalité\")\n", + "plt.grid()\n", + "plt.legend(loc='lower right')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This result is surprising because it contradicts the previous one and shows smokers as having the higher mortality rate. When the group is taken as a whole, the result differs from what we obtain after splitting the group into several subgroups. I do not really know how to explain this paradox. More experimentation with the data would be needed." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "3. To avoid a bias induced by grouping into arbitrary and irregular age brackets, one can try fitting a logistic regression. If we introduce a variable Death equal to 1 or 0 to indicate whether the individual died during the 20-year period, we can study the model Death ~ Age to examine the probability of death as a function of age, depending on whether we consider the group of smokers or of non-smokers. Do these regressions allow you to conclude on the harmfulness of smoking? You may provide a graphical representation of these regressions (without omitting the confidence regions)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I was not able to understand what logistic regression is or how to apply it." 
+ ] + } + ], "metadata": { "kernelspec": { "display_name": "Python 3", @@ -16,10 +852,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.3" + "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 } - -- 2.18.1
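Question 1 of the notebook suggests computing confidence intervals, which the patch does not do. The following is a minimal sketch, not part of the patch, that recomputes the two mortality rates from the counts printed by the notebook (`[230, 502, 139, 443]`) and attaches an interval to each. The names `COUNTS`, `mortality_rate` and `wilson_interval` are hypothetical, and the choice of the Wilson score interval at an approximate 95% level is an assumption, since the exercise does not prescribe a method.

```python
import math

# Counts taken from the notebook output [230, 502, 139, 443], i.e.
# [non-smoker dead, non-smoker alive, smoker dead, smoker alive].
COUNTS = {
    "Non-smoker": {"Dead": 230, "Alive": 502},
    "Smoker": {"Dead": 139, "Alive": 443},
}


def mortality_rate(group):
    """Share of deceased women in a group."""
    return group["Dead"] / (group["Dead"] + group["Alive"])


def wilson_interval(deaths, total, z=1.96):
    """Wilson score interval for a proportion (hypothetical helper).

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    p = deaths / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


for name, group in COUNTS.items():
    total = group["Dead"] + group["Alive"]
    low, high = wilson_interval(group["Dead"], total)
    rate = mortality_rate(group)
    print(f"{name}: rate={rate:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The Wilson interval is used here rather than the plain normal approximation because it stays inside [0, 1]; with groups of several hundred women either choice would likely give similar bounds.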