again

parent 765b008d
......@@ -48,7 +48,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"source": [
"Before we begin, let us import the modules needed for our analysis: "
]
......@@ -56,7 +59,10 @@
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
......@@ -78,7 +84,9 @@
"cell_type": "code",
"execution_count": 6,
"metadata": {
"hideOutput": true
"hideCode": true,
"hideOutput": true,
"hidePrompt": true
},
"outputs": [],
"source": [
......@@ -87,7 +95,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"source": [
"Let us turn this dataset into a pandas DataFrame so that we can analyze it properly. "
]
......@@ -107,8 +118,11 @@
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"execution_count": 9,
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"outputs": [
{
"data": {
......@@ -167,8 +181,345 @@
" <td>Alive</td>\n",
" <td>81.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>23.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>24.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>66.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>60.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>43.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>27.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>65.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>73.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>33.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>18.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>56.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>59.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>36.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1284</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>36.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1285</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>48.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1286</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1287</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>60.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1288</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>39.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1289</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1290</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1291</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1292</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>57.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1293</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1294</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>46.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1295</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>82.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1296</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1297</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>32.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1298</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1299</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>60.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1300</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1301</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1302</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>44.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1303</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>31.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1304</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>47.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1305</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>60.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1306</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>61.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1307</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>43.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1308</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>42.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1309</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>35.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1310</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>22.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1311</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1312</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>88.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1313</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1314 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
......@@ -177,29 +528,717 @@
"1 Yes Alive 19.3\n",
"2 No Dead 57.5\n",
"3 No Alive 47.1\n",
"4 Yes Alive 81.4"
"4 Yes Alive 81.4\n",
"5 No Alive 36.8\n",
"6 No Alive 23.8\n",
"7 Yes Dead 57.5\n",
"8 Yes Alive 24.8\n",
"9 Yes Alive 49.5\n",
"10 Yes Alive 30.0\n",
"11 No Dead 66.0\n",
"12 Yes Alive 49.2\n",
"13 No Alive 58.4\n",
"14 No Dead 60.6\n",
"15 No Alive 25.1\n",
"16 No Alive 43.5\n",
"17 No Alive 27.1\n",
"18 No Alive 58.3\n",
"19 Yes Alive 65.7\n",
"20 No Dead 73.2\n",
"21 Yes Alive 38.3\n",
"22 No Alive 33.4\n",
"23 Yes Dead 62.3\n",
"24 No Alive 18.0\n",
"25 No Alive 56.2\n",
"26 Yes Alive 59.2\n",
"27 No Alive 25.8\n",
"28 No Dead 36.9\n",
"29 No Alive 20.2\n",
"... ... ... ...\n",
"1284 Yes Dead 36.0\n",
"1285 Yes Alive 48.3\n",
"1286 No Alive 63.1\n",
"1287 No Alive 60.8\n",
"1288 Yes Dead 39.3\n",
"1289 No Alive 36.7\n",
"1290 No Alive 63.8\n",
"1291 No Dead 71.3\n",
"1292 No Alive 57.7\n",
"1293 No Alive 63.2\n",
"1294 No Alive 46.6\n",
"1295 Yes Dead 82.4\n",
"1296 Yes Alive 38.3\n",
"1297 Yes Alive 32.7\n",
"1298 No Alive 39.7\n",
"1299 Yes Dead 60.0\n",
"1300 No Dead 71.0\n",
"1301 No Alive 20.5\n",
"1302 No Alive 44.4\n",
"1303 Yes Alive 31.2\n",
"1304 Yes Alive 47.8\n",
"1305 Yes Alive 60.9\n",
"1306 No Dead 61.4\n",
"1307 Yes Alive 43.0\n",
"1308 No Alive 42.1\n",
"1309 Yes Alive 35.9\n",
"1310 No Alive 22.3\n",
"1311 Yes Dead 62.1\n",
"1312 No Dead 88.6\n",
"1313 No Alive 39.1\n",
"\n",
"[1314 rows x 3 columns]"
]
},
"execution_count": 8,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_data.head()"
"raw_data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
"cell_type": "markdown",
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"source": [
"This study therefore covers 1314 women for whom we know the age and smoking status at the time of the first survey, as well as whether they were still alive 20 years later. \n",
"=> Let us study the mortality rate among smokers and non-smokers by dividing the number of deceased women by the total number of women in each subgroup. "
]
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"source": [
"To do so, let us create two categories of women: smokers (smokers) and non-smokers (non_smokers)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Smoker</th>\n",
" <th>Status</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>21.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>19.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>81.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>24.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>65.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>59.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>34.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>51.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>46.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>44.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>29.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>33.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>35.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>39.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>35.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>44.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>37.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>22.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>39.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>40.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>58.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>37.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>63</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>36.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1240</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>29.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1243</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>40.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1251</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>27.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1252</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>52.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1253</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>27.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1254</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>41.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1259</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>40.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1260</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>20.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1263</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>20.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1264</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>45.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1269</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1270</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>55.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1271</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>24.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1273</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>55.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1276</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>58.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1278</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>43.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1282</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>51.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1284</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>36.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1285</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>48.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1288</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>39.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1295</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>82.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1296</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1297</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>32.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1299</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>60.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1303</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>31.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1304</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>47.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1305</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>60.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1307</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>43.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1309</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>35.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1311</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>582 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" Smoker Status Age\n",
"0 Yes Alive 21.0\n",
"1 Yes Alive 19.3\n",
"4 Yes Alive 81.4\n",
"7 Yes Dead 57.5\n",
"8 Yes Alive 24.8\n",
"9 Yes Alive 49.5\n",
"10 Yes Alive 30.0\n",
"12 Yes Alive 49.2\n",
"19 Yes Alive 65.7\n",
"21 Yes Alive 38.3\n",
"23 Yes Dead 62.3\n",
"26 Yes Alive 59.2\n",
"30 Yes Alive 34.6\n",
"31 Yes Alive 51.9\n",
"32 Yes Alive 49.9\n",
"35 Yes Alive 46.7\n",
"36 Yes Alive 44.4\n",
"37 Yes Alive 29.5\n",
"38 Yes Dead 33.0\n",
"39 Yes Alive 35.6\n",
"40 Yes Alive 39.1\n",
"42 Yes Alive 35.7\n",
"46 Yes Dead 44.3\n",
"48 Yes Alive 37.5\n",
"49 Yes Alive 22.1\n",
"53 Yes Alive 39.0\n",
"56 Yes Alive 40.1\n",
"60 Yes Alive 58.1\n",
"61 Yes Alive 37.3\n",
"63 Yes Dead 36.3\n",
"... ... ... ...\n",
"1240 Yes Alive 29.7\n",
"1243 Yes Alive 40.1\n",
"1251 Yes Alive 27.8\n",
"1252 Yes Alive 52.4\n",
"1253 Yes Alive 27.8\n",
"1254 Yes Alive 41.0\n",
"1259 Yes Alive 40.8\n",
"1260 Yes Alive 20.4\n",
"1263 Yes Alive 20.9\n",
"1264 Yes Alive 45.5\n",
"1269 Yes Alive 38.8\n",
"1270 Yes Alive 55.5\n",
"1271 Yes Alive 24.9\n",
"1273 Yes Alive 55.7\n",
"1276 Yes Alive 58.5\n",
"1278 Yes Alive 43.7\n",
"1282 Yes Alive 51.2\n",
"1284 Yes Dead 36.0\n",
"1285 Yes Alive 48.3\n",
"1288 Yes Dead 39.3\n",
"1295 Yes Dead 82.4\n",
"1296 Yes Alive 38.3\n",
"1297 Yes Alive 32.7\n",
"1299 Yes Dead 60.0\n",
"1303 Yes Alive 31.2\n",
"1304 Yes Alive 47.8\n",
"1305 Yes Alive 60.9\n",
"1307 Yes Alive 43.0\n",
"1309 Yes Alive 35.9\n",
"1311 Yes Dead 62.1\n",
"\n",
"[582 rows x 3 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"smokers = raw_data[raw_data['Smoker'] == 'Yes']\n",
"non_smokers = raw_data[raw_data['Smoker'] == 'No']"
]
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"source": [
"Within these two categories of women, we now need subgroups that distinguish whether each person is alive or not after 20 years."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"outputs": [],
"source": [
"Alive_smokers = smokers[smokers['Status'] == 'Alive']\n",
"Dead_smokers = smokers[smokers['Status'] == 'Dead']\n",
"Alive_non_smokers = non_smokers[non_smokers['Status'] == 'Alive']\n",
"Dead_non_smokers = non_smokers[non_smokers['Status'] == 'Dead']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"hideCode": true,
"hidePrompt": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"source": [
"\n",
"\n",
"len(smokers) + len(non_smokers)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"outputs": [
{
"data": {
"text/plain": [
"(230, 502)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(Dead_non_smokers), len(Alive_non_smokers)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"outputs": [
{
"data": {
"text/plain": [
"(139, 443)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(Dead_smokers), len(Alive_smokers)"
]
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"source": [
"To present the data as a table, we need to build a list of lists, or alternatively a dictionary. "
]
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"source": [
"Present in a table the total number of women alive and deceased over the period according to their smoking habit. For each group (smokers / non-smokers), compute the mortality rate (the ratio between the number of deceased women in a group and the total number of women in that group). You may propose a graphical representation of these data and compute confidence intervals if you wish. Why is this result surprising?\n",
"\n",
"Repeat question 1 (counts and mortality rates), adding a new category based on age class. For example, consider the following classes: 18-34 years, 34-54 years, 55-64 years, over 65 years. Why is this result surprising? Can you explain this paradox? Here too, you may propose a graphical representation of these data to support your explanations.\n",
"\n",
"To avoid the bias induced by grouping into arbitrary and irregular age brackets, one option is to fit a logistic regression. If we introduce a variable Death equal to 1 or 0 to indicate whether the individual died during the 20-year period, we can study the model Death ~ Age to examine the probability of death as a function of age, for the group of smokers and for the group of non-smokers. Do these regressions allow you to conclude on the harmfulness of smoking? You may propose a graphical representation of these regressions (without omitting the confidence regions).\n",
"\n",
"Submit your study on FUN"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hideCode": true,
"hidePrompt": true
},
"outputs": [],
"source": []
}
......
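The notebook stops before the table and mortality rates requested in the assignment are computed. As a minimal sketch only, and not the author's method, the four counts printed in the cell outputs above (139 dead and 443 alive smokers, 230 dead and 502 alive non-smokers) can be assembled into a pandas table from a dictionary. The column names `Dead`, `Alive`, `Total`, `MortalityRate` and `CI95`, and the use of a normal-approximation 95% interval, are assumptions made for illustration.

```python
import pandas as pd

# Counts copied from the notebook outputs:
# len(Dead_smokers), len(Alive_smokers) == (139, 443)
# len(Dead_non_smokers), len(Alive_non_smokers) == (230, 502)
counts = pd.DataFrame(
    {"Dead": [139, 230], "Alive": [443, 502]},
    index=["Smoker", "Non-smoker"],
)

# Group size and mortality rate (deceased / total) per group.
counts["Total"] = counts["Dead"] + counts["Alive"]
counts["MortalityRate"] = counts["Dead"] / counts["Total"]

# Half-width of a 95% confidence interval under the normal approximation
# (an assumption; other interval constructions are possible).
p = counts["MortalityRate"]
counts["CI95"] = 1.96 * (p * (1 - p) / counts["Total"]) ** 0.5

print(counts.round(3))
```

With these counts the crude mortality rate comes out higher for non-smokers than for smokers, which is the surprising result the first question of the assignment refers to.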