Commit of my work on Simpson's paradox, mostly a pretext for exploring pandas; I did not attempt the regressions.
parent 9f6e799b
{ {
"cells": [], "cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analysis of chickenpox epidemics"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from matplotlib import pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>week</th>\n",
" <th>indicator</th>\n",
" <th>inc</th>\n",
" <th>inc_low</th>\n",
" <th>inc_up</th>\n",
" <th>inc100</th>\n",
" <th>inc100_low</th>\n",
" <th>inc100_up</th>\n",
" <th>geo_insee</th>\n",
" <th>geo_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>202450</td>\n",
" <td>7</td>\n",
" <td>7532</td>\n",
" <td>4384</td>\n",
" <td>10680</td>\n",
" <td>11</td>\n",
" <td>6</td>\n",
" <td>16</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>202449</td>\n",
" <td>7</td>\n",
" <td>6015</td>\n",
" <td>3576</td>\n",
" <td>8454</td>\n",
" <td>9</td>\n",
" <td>5</td>\n",
" <td>13</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>202448</td>\n",
" <td>7</td>\n",
" <td>4189</td>\n",
" <td>1454</td>\n",
" <td>6924</td>\n",
" <td>6</td>\n",
" <td>2</td>\n",
" <td>10</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>202447</td>\n",
" <td>7</td>\n",
" <td>1931</td>\n",
" <td>726</td>\n",
" <td>3136</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>202446</td>\n",
" <td>7</td>\n",
" <td>2260</td>\n",
" <td>863</td>\n",
" <td>3657</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>202445</td>\n",
" <td>7</td>\n",
" <td>2713</td>\n",
" <td>1216</td>\n",
" <td>4210</td>\n",
" <td>4</td>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>202444</td>\n",
" <td>7</td>\n",
" <td>2135</td>\n",
" <td>676</td>\n",
" <td>3594</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>202443</td>\n",
" <td>7</td>\n",
" <td>2124</td>\n",
" <td>641</td>\n",
" <td>3607</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>202442</td>\n",
" <td>7</td>\n",
" <td>2621</td>\n",
" <td>1246</td>\n",
" <td>3996</td>\n",
" <td>4</td>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>202441</td>\n",
" <td>7</td>\n",
" <td>2035</td>\n",
" <td>381</td>\n",
" <td>3689</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>202440</td>\n",
" <td>7</td>\n",
" <td>2125</td>\n",
" <td>725</td>\n",
" <td>3525</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>202439</td>\n",
" <td>7</td>\n",
" <td>2898</td>\n",
" <td>1333</td>\n",
" <td>4463</td>\n",
" <td>4</td>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>202438</td>\n",
" <td>7</td>\n",
" <td>751</td>\n",
" <td>0</td>\n",
" <td>1513</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>202437</td>\n",
" <td>7</td>\n",
" <td>916</td>\n",
" <td>28</td>\n",
" <td>1804</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>202436</td>\n",
" <td>7</td>\n",
" <td>2235</td>\n",
" <td>870</td>\n",
" <td>3600</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>202435</td>\n",
" <td>7</td>\n",
" <td>1620</td>\n",
" <td>285</td>\n",
" <td>2955</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>202434</td>\n",
" <td>7</td>\n",
" <td>2560</td>\n",
" <td>622</td>\n",
" <td>4498</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>7</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>202433</td>\n",
" <td>7</td>\n",
" <td>1971</td>\n",
" <td>536</td>\n",
" <td>3406</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>202432</td>\n",
" <td>7</td>\n",
" <td>4399</td>\n",
" <td>1944</td>\n",
" <td>6854</td>\n",
" <td>7</td>\n",
" <td>3</td>\n",
" <td>11</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>202431</td>\n",
" <td>7</td>\n",
" <td>4500</td>\n",
" <td>2213</td>\n",
" <td>6787</td>\n",
" <td>7</td>\n",
" <td>4</td>\n",
" <td>10</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>202430</td>\n",
" <td>7</td>\n",
" <td>7004</td>\n",
" <td>4278</td>\n",
" <td>9730</td>\n",
" <td>11</td>\n",
" <td>7</td>\n",
" <td>15</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>202429</td>\n",
" <td>7</td>\n",
" <td>9270</td>\n",
" <td>6303</td>\n",
" <td>12237</td>\n",
" <td>14</td>\n",
" <td>10</td>\n",
" <td>18</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>202428</td>\n",
" <td>7</td>\n",
" <td>9364</td>\n",
" <td>6498</td>\n",
" <td>12230</td>\n",
" <td>14</td>\n",
" <td>10</td>\n",
" <td>18</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>202427</td>\n",
" <td>7</td>\n",
" <td>10247</td>\n",
" <td>7090</td>\n",
" <td>13404</td>\n",
" <td>15</td>\n",
" <td>10</td>\n",
" <td>20</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>202426</td>\n",
" <td>7</td>\n",
" <td>14368</td>\n",
" <td>10399</td>\n",
" <td>18337</td>\n",
" <td>22</td>\n",
" <td>16</td>\n",
" <td>28</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>202425</td>\n",
" <td>7</td>\n",
" <td>11174</td>\n",
" <td>8039</td>\n",
" <td>14309</td>\n",
" <td>17</td>\n",
" <td>12</td>\n",
" <td>22</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>202424</td>\n",
" <td>7</td>\n",
" <td>12621</td>\n",
" <td>9357</td>\n",
" <td>15885</td>\n",
" <td>19</td>\n",
" <td>14</td>\n",
" <td>24</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>202423</td>\n",
" <td>7</td>\n",
" <td>14657</td>\n",
" <td>11339</td>\n",
" <td>17975</td>\n",
" <td>22</td>\n",
" <td>17</td>\n",
" <td>27</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>202422</td>\n",
" <td>7</td>\n",
" <td>11628</td>\n",
" <td>8361</td>\n",
" <td>14895</td>\n",
" <td>17</td>\n",
" <td>12</td>\n",
" <td>22</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>202421</td>\n",
" <td>7</td>\n",
" <td>9701</td>\n",
" <td>6851</td>\n",
" <td>12551</td>\n",
" <td>15</td>\n",
" <td>11</td>\n",
" <td>19</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1746</th>\n",
" <td>199126</td>\n",
" <td>7</td>\n",
" <td>17608</td>\n",
" <td>11304</td>\n",
" <td>23912</td>\n",
" <td>31</td>\n",
" <td>20</td>\n",
" <td>42</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1747</th>\n",
" <td>199125</td>\n",
" <td>7</td>\n",
" <td>16169</td>\n",
" <td>10700</td>\n",
" <td>21638</td>\n",
" <td>28</td>\n",
" <td>18</td>\n",
" <td>38</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1748</th>\n",
" <td>199124</td>\n",
" <td>7</td>\n",
" <td>16171</td>\n",
" <td>10071</td>\n",
" <td>22271</td>\n",
" <td>28</td>\n",
" <td>17</td>\n",
" <td>39</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1749</th>\n",
" <td>199123</td>\n",
" <td>7</td>\n",
" <td>11947</td>\n",
" <td>7671</td>\n",
" <td>16223</td>\n",
" <td>21</td>\n",
" <td>13</td>\n",
" <td>29</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1750</th>\n",
" <td>199122</td>\n",
" <td>7</td>\n",
" <td>15452</td>\n",
" <td>9953</td>\n",
" <td>20951</td>\n",
" <td>27</td>\n",
" <td>17</td>\n",
" <td>37</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1751</th>\n",
" <td>199121</td>\n",
" <td>7</td>\n",
" <td>14903</td>\n",
" <td>8975</td>\n",
" <td>20831</td>\n",
" <td>26</td>\n",
" <td>16</td>\n",
" <td>36</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1752</th>\n",
" <td>199120</td>\n",
" <td>7</td>\n",
" <td>19053</td>\n",
" <td>12742</td>\n",
" <td>25364</td>\n",
" <td>34</td>\n",
" <td>23</td>\n",
" <td>45</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1753</th>\n",
" <td>199119</td>\n",
" <td>7</td>\n",
" <td>16739</td>\n",
" <td>11246</td>\n",
" <td>22232</td>\n",
" <td>29</td>\n",
" <td>19</td>\n",
" <td>39</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1754</th>\n",
" <td>199118</td>\n",
" <td>7</td>\n",
" <td>21385</td>\n",
" <td>13882</td>\n",
" <td>28888</td>\n",
" <td>38</td>\n",
" <td>25</td>\n",
" <td>51</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1755</th>\n",
" <td>199117</td>\n",
" <td>7</td>\n",
" <td>13462</td>\n",
" <td>8877</td>\n",
" <td>18047</td>\n",
" <td>24</td>\n",
" <td>16</td>\n",
" <td>32</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1756</th>\n",
" <td>199116</td>\n",
" <td>7</td>\n",
" <td>14857</td>\n",
" <td>10068</td>\n",
" <td>19646</td>\n",
" <td>26</td>\n",
" <td>18</td>\n",
" <td>34</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1757</th>\n",
" <td>199115</td>\n",
" <td>7</td>\n",
" <td>13975</td>\n",
" <td>9781</td>\n",
" <td>18169</td>\n",
" <td>25</td>\n",
" <td>18</td>\n",
" <td>32</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1758</th>\n",
" <td>199114</td>\n",
" <td>7</td>\n",
" <td>12265</td>\n",
" <td>7684</td>\n",
" <td>16846</td>\n",
" <td>22</td>\n",
" <td>14</td>\n",
" <td>30</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1759</th>\n",
" <td>199113</td>\n",
" <td>7</td>\n",
" <td>9567</td>\n",
" <td>6041</td>\n",
" <td>13093</td>\n",
" <td>17</td>\n",
" <td>11</td>\n",
" <td>23</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1760</th>\n",
" <td>199112</td>\n",
" <td>7</td>\n",
" <td>10864</td>\n",
" <td>7331</td>\n",
" <td>14397</td>\n",
" <td>19</td>\n",
" <td>13</td>\n",
" <td>25</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1761</th>\n",
" <td>199111</td>\n",
" <td>7</td>\n",
" <td>15574</td>\n",
" <td>11184</td>\n",
" <td>19964</td>\n",
" <td>27</td>\n",
" <td>19</td>\n",
" <td>35</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1762</th>\n",
" <td>199110</td>\n",
" <td>7</td>\n",
" <td>16643</td>\n",
" <td>11372</td>\n",
" <td>21914</td>\n",
" <td>29</td>\n",
" <td>20</td>\n",
" <td>38</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1763</th>\n",
" <td>199109</td>\n",
" <td>7</td>\n",
" <td>13741</td>\n",
" <td>8780</td>\n",
" <td>18702</td>\n",
" <td>24</td>\n",
" <td>15</td>\n",
" <td>33</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1764</th>\n",
" <td>199108</td>\n",
" <td>7</td>\n",
" <td>13289</td>\n",
" <td>8813</td>\n",
" <td>17765</td>\n",
" <td>23</td>\n",
" <td>15</td>\n",
" <td>31</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1765</th>\n",
" <td>199107</td>\n",
" <td>7</td>\n",
" <td>12337</td>\n",
" <td>8077</td>\n",
" <td>16597</td>\n",
" <td>22</td>\n",
" <td>15</td>\n",
" <td>29</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1766</th>\n",
" <td>199106</td>\n",
" <td>7</td>\n",
" <td>10877</td>\n",
" <td>7013</td>\n",
" <td>14741</td>\n",
" <td>19</td>\n",
" <td>12</td>\n",
" <td>26</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1767</th>\n",
" <td>199105</td>\n",
" <td>7</td>\n",
" <td>10442</td>\n",
" <td>6544</td>\n",
" <td>14340</td>\n",
" <td>18</td>\n",
" <td>11</td>\n",
" <td>25</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1768</th>\n",
" <td>199104</td>\n",
" <td>7</td>\n",
" <td>7913</td>\n",
" <td>4563</td>\n",
" <td>11263</td>\n",
" <td>14</td>\n",
" <td>8</td>\n",
" <td>20</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1769</th>\n",
" <td>199103</td>\n",
" <td>7</td>\n",
" <td>15387</td>\n",
" <td>10484</td>\n",
" <td>20290</td>\n",
" <td>27</td>\n",
" <td>18</td>\n",
" <td>36</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1770</th>\n",
" <td>199102</td>\n",
" <td>7</td>\n",
" <td>16277</td>\n",
" <td>11046</td>\n",
" <td>21508</td>\n",
" <td>29</td>\n",
" <td>20</td>\n",
" <td>38</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1771</th>\n",
" <td>199101</td>\n",
" <td>7</td>\n",
" <td>15565</td>\n",
" <td>10271</td>\n",
" <td>20859</td>\n",
" <td>27</td>\n",
" <td>18</td>\n",
" <td>36</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1772</th>\n",
" <td>199052</td>\n",
" <td>7</td>\n",
" <td>19375</td>\n",
" <td>13295</td>\n",
" <td>25455</td>\n",
" <td>34</td>\n",
" <td>23</td>\n",
" <td>45</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1773</th>\n",
" <td>199051</td>\n",
" <td>7</td>\n",
" <td>19080</td>\n",
" <td>13807</td>\n",
" <td>24353</td>\n",
" <td>34</td>\n",
" <td>25</td>\n",
" <td>43</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1774</th>\n",
" <td>199050</td>\n",
" <td>7</td>\n",
" <td>11079</td>\n",
" <td>6660</td>\n",
" <td>15498</td>\n",
" <td>20</td>\n",
" <td>12</td>\n",
" <td>28</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1775</th>\n",
" <td>199049</td>\n",
" <td>7</td>\n",
" <td>1143</td>\n",
" <td>0</td>\n",
" <td>2610</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>FR</td>\n",
" <td>France</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1776 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" week indicator inc inc_low inc_up inc100 inc100_low \\\n",
"0 202450 7 7532 4384 10680 11 6 \n",
"1 202449 7 6015 3576 8454 9 5 \n",
"2 202448 7 4189 1454 6924 6 2 \n",
"3 202447 7 1931 726 3136 3 1 \n",
"4 202446 7 2260 863 3657 3 1 \n",
"5 202445 7 2713 1216 4210 4 2 \n",
"6 202444 7 2135 676 3594 3 1 \n",
"7 202443 7 2124 641 3607 3 1 \n",
"8 202442 7 2621 1246 3996 4 2 \n",
"9 202441 7 2035 381 3689 3 1 \n",
"10 202440 7 2125 725 3525 3 1 \n",
"11 202439 7 2898 1333 4463 4 2 \n",
"12 202438 7 751 0 1513 1 0 \n",
"13 202437 7 916 28 1804 1 0 \n",
"14 202436 7 2235 870 3600 3 1 \n",
"15 202435 7 1620 285 2955 2 0 \n",
"16 202434 7 2560 622 4498 4 1 \n",
"17 202433 7 1971 536 3406 3 1 \n",
"18 202432 7 4399 1944 6854 7 3 \n",
"19 202431 7 4500 2213 6787 7 4 \n",
"20 202430 7 7004 4278 9730 11 7 \n",
"21 202429 7 9270 6303 12237 14 10 \n",
"22 202428 7 9364 6498 12230 14 10 \n",
"23 202427 7 10247 7090 13404 15 10 \n",
"24 202426 7 14368 10399 18337 22 16 \n",
"25 202425 7 11174 8039 14309 17 12 \n",
"26 202424 7 12621 9357 15885 19 14 \n",
"27 202423 7 14657 11339 17975 22 17 \n",
"28 202422 7 11628 8361 14895 17 12 \n",
"29 202421 7 9701 6851 12551 15 11 \n",
"... ... ... ... ... ... ... ... \n",
"1746 199126 7 17608 11304 23912 31 20 \n",
"1747 199125 7 16169 10700 21638 28 18 \n",
"1748 199124 7 16171 10071 22271 28 17 \n",
"1749 199123 7 11947 7671 16223 21 13 \n",
"1750 199122 7 15452 9953 20951 27 17 \n",
"1751 199121 7 14903 8975 20831 26 16 \n",
"1752 199120 7 19053 12742 25364 34 23 \n",
"1753 199119 7 16739 11246 22232 29 19 \n",
"1754 199118 7 21385 13882 28888 38 25 \n",
"1755 199117 7 13462 8877 18047 24 16 \n",
"1756 199116 7 14857 10068 19646 26 18 \n",
"1757 199115 7 13975 9781 18169 25 18 \n",
"1758 199114 7 12265 7684 16846 22 14 \n",
"1759 199113 7 9567 6041 13093 17 11 \n",
"1760 199112 7 10864 7331 14397 19 13 \n",
"1761 199111 7 15574 11184 19964 27 19 \n",
"1762 199110 7 16643 11372 21914 29 20 \n",
"1763 199109 7 13741 8780 18702 24 15 \n",
"1764 199108 7 13289 8813 17765 23 15 \n",
"1765 199107 7 12337 8077 16597 22 15 \n",
"1766 199106 7 10877 7013 14741 19 12 \n",
"1767 199105 7 10442 6544 14340 18 11 \n",
"1768 199104 7 7913 4563 11263 14 8 \n",
"1769 199103 7 15387 10484 20290 27 18 \n",
"1770 199102 7 16277 11046 21508 29 20 \n",
"1771 199101 7 15565 10271 20859 27 18 \n",
"1772 199052 7 19375 13295 25455 34 23 \n",
"1773 199051 7 19080 13807 24353 34 25 \n",
"1774 199050 7 11079 6660 15498 20 12 \n",
"1775 199049 7 1143 0 2610 2 0 \n",
"\n",
" inc100_up geo_insee geo_name \n",
"0 16 FR France \n",
"1 13 FR France \n",
"2 10 FR France \n",
"3 5 FR France \n",
"4 5 FR France \n",
"5 6 FR France \n",
"6 5 FR France \n",
"7 5 FR France \n",
"8 6 FR France \n",
"9 5 FR France \n",
"10 5 FR France \n",
"11 6 FR France \n",
"12 2 FR France \n",
"13 2 FR France \n",
"14 5 FR France \n",
"15 4 FR France \n",
"16 7 FR France \n",
"17 5 FR France \n",
"18 11 FR France \n",
"19 10 FR France \n",
"20 15 FR France \n",
"21 18 FR France \n",
"22 18 FR France \n",
"23 20 FR France \n",
"24 28 FR France \n",
"25 22 FR France \n",
"26 24 FR France \n",
"27 27 FR France \n",
"28 22 FR France \n",
"29 19 FR France \n",
"... ... ... ... \n",
"1746 42 FR France \n",
"1747 38 FR France \n",
"1748 39 FR France \n",
"1749 29 FR France \n",
"1750 37 FR France \n",
"1751 36 FR France \n",
"1752 45 FR France \n",
"1753 39 FR France \n",
"1754 51 FR France \n",
"1755 32 FR France \n",
"1756 34 FR France \n",
"1757 32 FR France \n",
"1758 30 FR France \n",
"1759 23 FR France \n",
"1760 25 FR France \n",
"1761 35 FR France \n",
"1762 38 FR France \n",
"1763 33 FR France \n",
"1764 31 FR France \n",
"1765 29 FR France \n",
"1766 26 FR France \n",
"1767 25 FR France \n",
"1768 20 FR France \n",
"1769 36 FR France \n",
"1770 38 FR France \n",
"1771 36 FR France \n",
"1772 45 FR France \n",
"1773 43 FR France \n",
"1774 28 FR France \n",
"1775 5 FR France \n",
"\n",
"[1776 rows x 10 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_data=pd.read_csv('inc-7-PAY.csv',skiprows=1)\n",
"raw_data"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1776 entries, 0 to 1775\n",
"Data columns (total 10 columns):\n",
"week 1776 non-null int64\n",
"indicator 1776 non-null int64\n",
"inc 1776 non-null int64\n",
"inc_low 1776 non-null int64\n",
"inc_up 1776 non-null int64\n",
"inc100 1776 non-null int64\n",
"inc100_low 1776 non-null int64\n",
"inc100_up 1776 non-null int64\n",
"geo_insee 1776 non-null object\n",
"geo_name 1776 non-null object\n",
"dtypes: int64(8), object(2)\n",
"memory usage: 138.8+ KB\n"
]
}
],
"source": [
"raw_data.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
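"The output shows that none of the 1776 rows contains a missing value (the data run from week 49 of 1990 to week 50 of 2024), and that every numeric column has dtype `int64`. Note that `info()` only counts non-null cells: it does not tell us whether some weeks are absent from the series."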
]
},
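A minimal sketch of how gaps in the weekly series could be checked, assuming the `week` column always holds integers of the form `YYYYWW` (ISO year and week), as in the table above. The helper names `iso_week_start` and `find_week_gaps` are hypothetical and the sample values are hand-made; this is not part of the original notebook.

```python
import datetime as dt

import pandas as pd


def iso_week_start(yyyyww):
    """Return the Monday of the ISO week encoded as the integer YYYYWW."""
    year, week = divmod(int(yyyyww), 100)
    # %G = ISO year, %V = ISO week number, %u = ISO weekday (1 = Monday).
    return dt.datetime.strptime(f"{year}-W{week:02d}-1", "%G-W%V-%u").date()


def find_week_gaps(weeks):
    """List (previous, next) week pairs that are not exactly 7 days apart."""
    ordered = sorted(int(w) for w in weeks)
    starts = [iso_week_start(w) for w in ordered]
    return [
        (ordered[i], ordered[i + 1])
        for i in range(len(ordered) - 1)
        if (starts[i + 1] - starts[i]).days != 7
    ]


# Hand-made sample in the same format as the `week` column;
# week 202451 is deliberately left out.
sample = pd.DataFrame({"week": [202452, 202450, 202449, 202501]})
gaps = find_week_gaps(sample["week"])
print(gaps)
```

On the notebook's data the same check would be `find_week_gaps(raw_data['week'])`; an empty list would mean that consecutive rows are always one week apart.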
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
...@@ -16,10 +1056,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3" "version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
} }
{ {
"cells": [], "cells": [
{
"cell_type": "markdown",
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"source": [
"# On Simpson's paradox"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Smoker</th>\n",
" <th>Status</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>21.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>19.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>47.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>81.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>23.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>24.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>66.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>60.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>43.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>27.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>65.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>73.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>33.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>18.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>56.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>59.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>36.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1284</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>36.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1285</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>48.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1286</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1287</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>60.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1288</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>39.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1289</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1290</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1291</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1292</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>57.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1293</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1294</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>46.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1295</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>82.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1296</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1297</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>32.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1298</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1299</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>60.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1300</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1301</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1302</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>44.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1303</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>31.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1304</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>47.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1305</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>60.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1306</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>61.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1307</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>43.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1308</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>42.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1309</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>35.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1310</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>22.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1311</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1312</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>88.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1313</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1314 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" Smoker Status Age\n",
"0 Yes Alive 21.0\n",
"1 Yes Alive 19.3\n",
"2 No Dead 57.5\n",
"3 No Alive 47.1\n",
"4 Yes Alive 81.4\n",
"5 No Alive 36.8\n",
"6 No Alive 23.8\n",
"7 Yes Dead 57.5\n",
"8 Yes Alive 24.8\n",
"9 Yes Alive 49.5\n",
"10 Yes Alive 30.0\n",
"11 No Dead 66.0\n",
"12 Yes Alive 49.2\n",
"13 No Alive 58.4\n",
"14 No Dead 60.6\n",
"15 No Alive 25.1\n",
"16 No Alive 43.5\n",
"17 No Alive 27.1\n",
"18 No Alive 58.3\n",
"19 Yes Alive 65.7\n",
"20 No Dead 73.2\n",
"21 Yes Alive 38.3\n",
"22 No Alive 33.4\n",
"23 Yes Dead 62.3\n",
"24 No Alive 18.0\n",
"25 No Alive 56.2\n",
"26 Yes Alive 59.2\n",
"27 No Alive 25.8\n",
"28 No Dead 36.9\n",
"29 No Alive 20.2\n",
"... ... ... ...\n",
"1284 Yes Dead 36.0\n",
"1285 Yes Alive 48.3\n",
"1286 No Alive 63.1\n",
"1287 No Alive 60.8\n",
"1288 Yes Dead 39.3\n",
"1289 No Alive 36.7\n",
"1290 No Alive 63.8\n",
"1291 No Dead 71.3\n",
"1292 No Alive 57.7\n",
"1293 No Alive 63.2\n",
"1294 No Alive 46.6\n",
"1295 Yes Dead 82.4\n",
"1296 Yes Alive 38.3\n",
"1297 Yes Alive 32.7\n",
"1298 No Alive 39.7\n",
"1299 Yes Dead 60.0\n",
"1300 No Dead 71.0\n",
"1301 No Alive 20.5\n",
"1302 No Alive 44.4\n",
"1303 Yes Alive 31.2\n",
"1304 Yes Alive 47.8\n",
"1305 Yes Alive 60.9\n",
"1306 No Dead 61.4\n",
"1307 Yes Alive 43.0\n",
"1308 No Alive 42.1\n",
"1309 Yes Alive 35.9\n",
"1310 No Alive 22.3\n",
"1311 Yes Dead 62.1\n",
"1312 No Dead 88.6\n",
"1313 No Alive 39.1\n",
"\n",
"[1314 rows x 3 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"donnees = pd.read_csv('Subject6_smoking.csv')\n",
"donnees"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1314 entries, 0 to 1313\n",
"Data columns (total 3 columns):\n",
"Smoker 1314 non-null object\n",
"Status 1314 non-null object\n",
"Age 1314 non-null float64\n",
"dtypes: float64(1), object(2)\n",
"memory usage: 30.9+ KB\n"
]
}
],
"source": [
"donnees.info()"
]
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"source": [
"To compute some statistics on these data, one can encode `Yes` as `1` and `No` as `0`, and `Alive` as `1` and `Dead` as `0`. We will actually proceed differently, but this was my first, naive approach, so I am keeping my initial code and the results it gave. Instead of the `apply` method, I could also have used the `replace` method."
]
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"source": [
"```\n",
"def convert(x):\n",
"    # Map the positive categories to 1 and the negative ones to 0.\n",
"    if x in ('Yes', 'Alive'):\n",
"        return 1\n",
"    elif x in ('No', 'Dead'):\n",
"        return 0\n",
"    # Any other value falls through and returns None.\n",
"\n",
"donnees['Smoker'] = donnees['Smoker'].apply(convert)\n",
"donnees['Status'] = donnees['Status'].apply(convert)\n",
"donnees[['Smoker','Status']].sum()\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"source": [
"Among the 1314 women surveyed, there are thus 582 smokers, and 945 of the women (smokers and non-smokers alike) are still alive 20 years later."
]
},
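The alternative to `apply` mentioned earlier can be sketched as follows. This version uses `Series.map` with explicit dictionaries, a close relative of `replace`, on a small hand-made sample that only borrows the column names and categories of `donnees`; the values are illustrative, not taken from `Subject6_smoking.csv`.

```python
import pandas as pd

# Hand-made sample with the same column names and categories
# as the notebook's `donnees` DataFrame (values are illustrative).
sample = pd.DataFrame({
    "Smoker": ["Yes", "No", "No", "Yes"],
    "Status": ["Alive", "Dead", "Alive", "Alive"],
})

# One explicit mapping per column. With `map`, a value that is missing
# from the dictionary becomes NaN, so unexpected categories are easy to spot.
encoded = pd.DataFrame({
    "Smoker": sample["Smoker"].map({"Yes": 1, "No": 0}),
    "Status": sample["Status"].map({"Alive": 1, "Dead": 0}),
})

# Summing the 0/1 columns counts the smokers and the survivors.
totals = encoded.sum()
print(totals)
```

Keeping one dictionary per column avoids the single `convert` function having to know about both sets of categories at once.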
{
"cell_type": "markdown",
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"source": [
"Let us recover these figures using the grouping methods."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"outputs": [
{
"data": {
"text/plain": [
"No 732\n",
"Yes 582\n",
"Name: Smoker, dtype: int64"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"donnees['Smoker'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"hideCode": false,
"hidePrompt": true,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Alive 945\n",
"Dead 369\n",
"Name: Status, dtype: int64"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"donnees['Status'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"source": [
"Now let us look at the joint counts, to try to understand the dependencies. For that, we could write\n",
"```\n",
"donnees[['Smoker', 'Status']].value_counts()\n",
"```\n",
"to get the desired table, but this Jupyter notebook runs an outdated version of pandas in which `value_counts` cannot be called on a `DataFrame`, so we work around it."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Total</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th>Status</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">No</th>\n",
" <th>Alive</th>\n",
" <td>502</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dead</th>\n",
" <td>230</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">Yes</th>\n",
" <th>Alive</th>\n",
" <td>443</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dead</th>\n",
" <td>139</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"               Total\n",
"Smoker Status       \n",
"No     Alive     502\n",
"       Dead      230\n",
"Yes    Alive     443\n",
"       Dead      139"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tableau = donnees.groupby(['Smoker','Status']).count()\n",
"tableau = tableau.rename(columns={'Age':'Total'})\n",
"tableau"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"MultiIndex: 4 entries, (No, Alive) to (Yes, Dead)\n",
"Data columns (total 1 columns):\n",
"Total 4 non-null int64\n",
"dtypes: int64(1)\n",
"memory usage: 238.0+ bytes\n"
]
}
],
"source": [
"tableau.info()"
]
},
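{
"cell_type": "markdown",
"metadata": {},
"source": [
"On a pandas version without `DataFrame.value_counts`, one possible shortcut (a sketch, not what this notebook ran) is `groupby(...).size()`. It counts rows per group, so the result never borrows the name of the `Age` column. The frame `exemple` below is a hypothetical stand-in for `donnees`.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical rows; only the column names mirror the notebook's data.\n",
"exemple = pd.DataFrame({'Smoker': ['Yes', 'No', 'Yes', 'No'],\n",
"                        'Status': ['Alive', 'Dead', 'Dead', 'Dead'],\n",
"                        'Age': [21.0, 57.5, 62.3, 71.3]})\n",
"\n",
"# size() returns a Series of group sizes; rename() gives it a meaningful name.\n",
"effectifs = exemple.groupby(['Smoker', 'Status']).size().rename('Total')\n",
"print(effectifs)\n",
"```\n",
"\n",
"Unlike `count()`, `size()` also counts rows that contain missing values, which makes no difference here since every column is fully populated."
]
},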
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another way to obtain the same table (with the same naming issue: the counts end up in a column named `Age`)."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Age</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th>Status</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">No</th>\n",
" <th>Alive</th>\n",
" <td>502</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dead</th>\n",
" <td>230</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">Yes</th>\n",
" <th>Alive</th>\n",
" <td>443</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dead</th>\n",
" <td>139</td>\n",
" </tr>\n",
" <tr>\n",
" <th>All</th>\n",
" <th></th>\n",
" <td>1314</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"                Age\n",
"Smoker Status      \n",
"No     Alive    502\n",
"       Dead     230\n",
"Yes    Alive    443\n",
"       Dead     139\n",
"All            1314"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"donnees.pivot_table(index = ['Smoker','Status'], aggfunc='count', margins=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A visually nicer way to get the same data, which also removes the naming issue:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Status</th>\n",
" <th>Alive</th>\n",
" <th>Dead</th>\n",
" <th>All</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>No</th>\n",
" <td>502</td>\n",
" <td>230</td>\n",
" <td>732</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Yes</th>\n",
" <td>443</td>\n",
" <td>139</td>\n",
" <td>582</td>\n",
" </tr>\n",
" <tr>\n",
" <th>All</th>\n",
" <td>945</td>\n",
" <td>369</td>\n",
" <td>1314</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Status Alive Dead All\n",
"Smoker \n",
"No 502 230 732\n",
"Yes 443 139 582\n",
"All 945 369 1314"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table2 = donnees.pivot_table(index = 'Smoker',values='Age', columns='Status', aggfunc='count', margins=True) \n",
"table2"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Index: 3 entries, No to All\n",
"Data columns (total 3 columns):\n",
"Alive 3 non-null int64\n",
"Dead 3 non-null int64\n",
"All 3 non-null int64\n",
"dtypes: int64(3)\n",
"memory usage: 96.0+ bytes\n"
]
}
],
"source": [
"table2.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that if we do not specify `values='Age'`, we get something similar, but the naming issue returns: the columns carry an extra `Age` level."
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe thead tr:last-of-type th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th colspan=\"3\" halign=\"left\">Age</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Status</th>\n",
" <th>Alive</th>\n",
" <th>Dead</th>\n",
" <th>All</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>No</th>\n",
" <td>502</td>\n",
" <td>230</td>\n",
" <td>732</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Yes</th>\n",
" <td>443</td>\n",
" <td>139</td>\n",
" <td>582</td>\n",
" </tr>\n",
" <tr>\n",
" <th>All</th>\n",
" <td>945</td>\n",
" <td>369</td>\n",
" <td>1314</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Age \n",
"Status Alive Dead All\n",
"Smoker \n",
"No 502 230 732\n",
"Yes 443 139 582\n",
"All 945 369 1314"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table3 = donnees.pivot_table(index = 'Smoker', columns='Status', aggfunc='count', margins=True) \n",
"table3"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Index: 3 entries, No to All\n",
"Data columns (total 3 columns):\n",
"(Age, Alive) 3 non-null int64\n",
"(Age, Dead) 3 non-null int64\n",
"(Age, All) 3 non-null int64\n",
"dtypes: int64(3)\n",
"memory usage: 96.0+ bytes\n"
]
}
],
"source": [
"table3.info()"
]
},
{
"cell_type": "markdown",
"metadata": {
"hideCode": false,
"hidePrompt": true
},
"source": [
"Let us now evaluate the death rate according to smoking status."
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Status</th>\n",
" <th>Alive</th>\n",
" <th>Dead</th>\n",
" <th>All</th>\n",
" <th>ratio_deces</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>No</th>\n",
" <td>502</td>\n",
" <td>230</td>\n",
" <td>732</td>\n",
" <td>0.314208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Yes</th>\n",
" <td>443</td>\n",
" <td>139</td>\n",
" <td>582</td>\n",
" <td>0.238832</td>\n",
" </tr>\n",
" <tr>\n",
" <th>All</th>\n",
" <td>945</td>\n",
" <td>369</td>\n",
" <td>1314</td>\n",
" <td>0.280822</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Status Alive Dead All ratio_deces\n",
"Smoker \n",
"No 502 230 732 0.314208\n",
"Yes 443 139 582 0.238832\n",
"All 945 369 1314 0.280822"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table2['ratio_deces'] = table2['Dead']/table2['All']\n",
"table2"
]
},
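{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same ratio can also be obtained in one step with `pd.crosstab` and `normalize='index'`, which divides each row by its row total. This is a minimal sketch on made-up rows (the frame `exemple` is hypothetical), not the computation used in this notebook.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical rows; only the column names and labels mirror the data set.\n",
"exemple = pd.DataFrame({'Smoker': ['Yes', 'Yes', 'No', 'No', 'No'],\n",
"                        'Status': ['Alive', 'Dead', 'Dead', 'Dead', 'Alive']})\n",
"\n",
"# Each row of the result sums to 1: the shares of Alive and Dead per smoking status.\n",
"taux = pd.crosstab(exemple['Smoker'], exemple['Status'], normalize='index')\n",
"\n",
"# The 'Dead' column is the death rate for each group.\n",
"print(taux['Dead'])\n",
"```"
]
},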
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that the smokers had a lower death rate (about 24%, versus about 31% for the non-smokers)!\n",
"\n",
"The problem with this analysis is that the participants' age (which is clearly a factor in mortality) was not taken into account, yet whether a woman smoked in the 1970s is correlated with her age, and our two groups are not equivalent in terms of age. This is easy to check here:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Status</th>\n",
" <th>Alive</th>\n",
" <th>Dead</th>\n",
" <th>All</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>No</th>\n",
" <td>40.347410</td>\n",
" <td>70.481739</td>\n",
" <td>49.815847</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Yes</th>\n",
" <td>39.648984</td>\n",
" <td>58.996403</td>\n",
" <td>44.269759</td>\n",
" </tr>\n",
" <tr>\n",
" <th>All</th>\n",
" <td>40.020000</td>\n",
" <td>66.155285</td>\n",
" <td>47.359361</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Status Alive Dead All\n",
"Smoker \n",
"No 40.347410 70.481739 49.815847\n",
"Yes 39.648984 58.996403 44.269759\n",
"All 40.020000 66.155285 47.359361"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"donnees.pivot_table(index = 'Smoker',values='Age', columns='Status', aggfunc='mean', margins=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us add age brackets to take this component into account."
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"def classifie(x):\n",
"    if x <= 34:\n",
"        return '18-34 ans'\n",
"    elif x <= 54:\n",
"        return '35-54 ans'\n",
"    elif x <= 65:\n",
"        return '55-65 ans'\n",
"    else:\n",
"        return '>65 ans'\n",
"\n",
"donnees['tranche'] = donnees['Age'].apply(classifie)"
]
},
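{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible vectorised alternative to the `classifie` function is `pd.cut`, sketched below under the assumption that the same upper bounds (34, 54, 65) are wanted. The names `ages`, `bornes` and `etiquettes` are introduced here for illustration only.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical ages, including values close to the bracket boundaries.\n",
"ages = pd.Series([18.0, 34.0, 34.5, 54.0, 65.0, 65.7, 88.6])\n",
"\n",
"# By default pd.cut uses right-closed intervals: (0, 34], (34, 54], (54, 65], (65, inf).\n",
"bornes = [0, 34, 54, 65, float('inf')]\n",
"etiquettes = ['18-34 ans', '35-54 ans', '55-65 ans', '>65 ans']\n",
"\n",
"tranches = pd.cut(ages, bins=bornes, labels=etiquettes)\n",
"print(pd.DataFrame({'Age': ages, 'tranche': tranches}))\n",
"```\n",
"\n",
"Because the intervals are closed on the right, an age such as 34.5 lands in `35-54 ans`, exactly as with the chain of `<=` tests in `classifie`."
]
},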
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Smoker</th>\n",
" <th>Status</th>\n",
" <th>Age</th>\n",
" <th>tranche</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>21.0</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>19.3</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>47.1</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>81.4</td>\n",
" <td>&gt;65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.8</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>23.8</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>57.5</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>24.8</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.5</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>30.0</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>66.0</td>\n",
" <td>&gt;65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>49.2</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.4</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>60.6</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.1</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>43.5</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>27.1</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>58.3</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>65.7</td>\n",
" <td>&gt;65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>73.2</td>\n",
" <td>&gt;65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>33.4</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.3</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>18.0</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>56.2</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>59.2</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>25.8</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>36.9</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.2</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1284</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>36.0</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1285</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>48.3</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1286</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.1</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1287</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>60.8</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1288</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>39.3</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1289</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>36.7</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1290</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.8</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1291</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.3</td>\n",
" <td>&gt;65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1292</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>57.7</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1293</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>63.2</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1294</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>46.6</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1295</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>82.4</td>\n",
" <td>&gt;65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1296</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>38.3</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1297</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>32.7</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1298</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.7</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1299</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>60.0</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1300</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>71.0</td>\n",
" <td>&gt;65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1301</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>20.5</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1302</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>44.4</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1303</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>31.2</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1304</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>47.8</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1305</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>60.9</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1306</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>61.4</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1307</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>43.0</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1308</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>42.1</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1309</th>\n",
" <td>Yes</td>\n",
" <td>Alive</td>\n",
" <td>35.9</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1310</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>22.3</td>\n",
" <td>18-34 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1311</th>\n",
" <td>Yes</td>\n",
" <td>Dead</td>\n",
" <td>62.1</td>\n",
" <td>55-65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1312</th>\n",
" <td>No</td>\n",
" <td>Dead</td>\n",
" <td>88.6</td>\n",
" <td>&gt;65 ans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1313</th>\n",
" <td>No</td>\n",
" <td>Alive</td>\n",
" <td>39.1</td>\n",
" <td>35-54 ans</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1314 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" Smoker Status Age tranche\n",
"0 Yes Alive 21.0 18-34 ans\n",
"1 Yes Alive 19.3 18-34 ans\n",
"2 No Dead 57.5 55-65 ans\n",
"3 No Alive 47.1 35-54 ans\n",
"4 Yes Alive 81.4 >65 ans\n",
"5 No Alive 36.8 35-54 ans\n",
"6 No Alive 23.8 18-34 ans\n",
"7 Yes Dead 57.5 55-65 ans\n",
"8 Yes Alive 24.8 18-34 ans\n",
"9 Yes Alive 49.5 35-54 ans\n",
"10 Yes Alive 30.0 18-34 ans\n",
"11 No Dead 66.0 >65 ans\n",
"12 Yes Alive 49.2 35-54 ans\n",
"13 No Alive 58.4 55-65 ans\n",
"14 No Dead 60.6 55-65 ans\n",
"15 No Alive 25.1 18-34 ans\n",
"16 No Alive 43.5 35-54 ans\n",
"17 No Alive 27.1 18-34 ans\n",
"18 No Alive 58.3 55-65 ans\n",
"19 Yes Alive 65.7 >65 ans\n",
"20 No Dead 73.2 >65 ans\n",
"21 Yes Alive 38.3 35-54 ans\n",
"22 No Alive 33.4 18-34 ans\n",
"23 Yes Dead 62.3 55-65 ans\n",
"24 No Alive 18.0 18-34 ans\n",
"25 No Alive 56.2 55-65 ans\n",
"26 Yes Alive 59.2 55-65 ans\n",
"27 No Alive 25.8 18-34 ans\n",
"28 No Dead 36.9 35-54 ans\n",
"29 No Alive 20.2 18-34 ans\n",
"... ... ... ... ...\n",
"1284 Yes Dead 36.0 35-54 ans\n",
"1285 Yes Alive 48.3 35-54 ans\n",
"1286 No Alive 63.1 55-65 ans\n",
"1287 No Alive 60.8 55-65 ans\n",
"1288 Yes Dead 39.3 35-54 ans\n",
"1289 No Alive 36.7 35-54 ans\n",
"1290 No Alive 63.8 55-65 ans\n",
"1291 No Dead 71.3 >65 ans\n",
"1292 No Alive 57.7 55-65 ans\n",
"1293 No Alive 63.2 55-65 ans\n",
"1294 No Alive 46.6 35-54 ans\n",
"1295 Yes Dead 82.4 >65 ans\n",
"1296 Yes Alive 38.3 35-54 ans\n",
"1297 Yes Alive 32.7 18-34 ans\n",
"1298 No Alive 39.7 35-54 ans\n",
"1299 Yes Dead 60.0 55-65 ans\n",
"1300 No Dead 71.0 >65 ans\n",
"1301 No Alive 20.5 18-34 ans\n",
"1302 No Alive 44.4 35-54 ans\n",
"1303 Yes Alive 31.2 18-34 ans\n",
"1304 Yes Alive 47.8 35-54 ans\n",
"1305 Yes Alive 60.9 55-65 ans\n",
"1306 No Dead 61.4 55-65 ans\n",
"1307 Yes Alive 43.0 35-54 ans\n",
"1308 No Alive 42.1 35-54 ans\n",
"1309 Yes Alive 35.9 35-54 ans\n",
"1310 No Alive 22.3 18-34 ans\n",
"1311 Yes Dead 62.1 55-65 ans\n",
"1312 No Dead 88.6 >65 ans\n",
"1313 No Alive 39.1 35-54 ans\n",
"\n",
"[1314 rows x 4 columns]"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"donnees"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Status</th>\n",
" <th>Alive</th>\n",
" <th>Dead</th>\n",
" <th>All</th>\n",
" <th>ratio_deces</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>No</th>\n",
" <td>221</td>\n",
" <td>6</td>\n",
" <td>227</td>\n",
" <td>0.026432</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Yes</th>\n",
" <td>182</td>\n",
" <td>7</td>\n",
" <td>189</td>\n",
" <td>0.037037</td>\n",
" </tr>\n",
" <tr>\n",
" <th>All</th>\n",
" <td>403</td>\n",
" <td>13</td>\n",
" <td>416</td>\n",
" <td>0.031250</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Status Alive Dead All ratio_deces\n",
"Smoker \n",
"No 221 6 227 0.026432\n",
"Yes 182 7 189 0.037037\n",
"All 403 13 416 0.031250"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table18_34 = pd.pivot_table(donnees[donnees['Age']<35],index = 'Smoker',values='Age', columns='Status', aggfunc='count', margins=True)\n",
"table18_34['ratio_deces'] = table18_34['Dead']/table18_34['All']\n",
"table18_34"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Status</th>\n",
" <th>Alive</th>\n",
" <th>Dead</th>\n",
" <th>All</th>\n",
" <th>ratio_deces</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>No</th>\n",
" <td>172</td>\n",
" <td>19</td>\n",
" <td>191</td>\n",
" <td>0.099476</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Yes</th>\n",
" <td>190</td>\n",
" <td>39</td>\n",
" <td>229</td>\n",
" <td>0.170306</td>\n",
" </tr>\n",
" <tr>\n",
" <th>All</th>\n",
" <td>362</td>\n",
" <td>58</td>\n",
" <td>420</td>\n",
" <td>0.138095</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Status Alive Dead All ratio_deces\n",
"Smoker \n",
"No 172 19 191 0.099476\n",
"Yes 190 39 229 0.170306\n",
"All 362 58 420 0.138095"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table35_54 = pd.pivot_table(donnees[(donnees['Age']>=35) & (donnees['Age']<55)],index = 'Smoker',values='Age', columns='Status', aggfunc='count', margins=True)\n",
"table35_54['ratio_deces'] = table35_54['Dead']/table35_54['All']\n",
"table35_54"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Status</th>\n",
" <th>Alive</th>\n",
" <th>Dead</th>\n",
" <th>All</th>\n",
" <th>ratio_deces</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>No</th>\n",
" <td>82</td>\n",
" <td>49</td>\n",
" <td>131</td>\n",
" <td>0.374046</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Yes</th>\n",
" <td>65</td>\n",
" <td>53</td>\n",
" <td>118</td>\n",
" <td>0.449153</td>\n",
" </tr>\n",
" <tr>\n",
" <th>All</th>\n",
" <td>147</td>\n",
" <td>102</td>\n",
" <td>249</td>\n",
" <td>0.409639</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Status Alive Dead All ratio_deces\n",
"Smoker \n",
"No 82 49 131 0.374046\n",
"Yes 65 53 118 0.449153\n",
"All 147 102 249 0.409639"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table55_65 = pd.pivot_table(donnees[(donnees['Age']>=55) & (donnees['Age']<66)],index = 'Smoker',values='Age', columns='Status', aggfunc='count', margins=True)\n",
"table55_65['ratio_deces'] = table55_65['Dead']/table55_65['All']\n",
"table55_65"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Status</th>\n",
" <th>Alive</th>\n",
" <th>Dead</th>\n",
" <th>All</th>\n",
" <th>ratio_deces</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Smoker</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>No</th>\n",
" <td>27</td>\n",
" <td>165</td>\n",
" <td>192</td>\n",
" <td>0.859375</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Yes</th>\n",
" <td>7</td>\n",
" <td>42</td>\n",
" <td>49</td>\n",
" <td>0.857143</td>\n",
" </tr>\n",
" <tr>\n",
" <th>All</th>\n",
" <td>34</td>\n",
" <td>207</td>\n",
" <td>241</td>\n",
" <td>0.858921</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Status Alive Dead All ratio_deces\n",
"Smoker \n",
"No 27 165 192 0.859375\n",
"Yes 7 42 49 0.857143\n",
"All 34 207 241 0.858921"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table65plus = pd.pivot_table(donnees[(donnees['Age']>65)],index = 'Smoker',values='Age', columns='Status', aggfunc='count', margins=True)\n",
"table65plus['ratio_deces'] = table65plus['Dead']/table65plus['All']\n",
"table65plus"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
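"We see that in every age bracket up to 65, the smokers have a higher death rate; above 65 the two rates are nearly identical (about 86% in both groups). The trend reverses when age is ignored because, in our data, the smokers tend to be younger than the non-smokers. Note that ages strictly between 65 and 66 are counted in both of the last two tables, since the filters `< 66` and `> 65` overlap."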
]
},
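{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four per-bracket tables can also be condensed into a single grouped computation. The sketch below assumes a `tranche` column like the one created earlier and runs on made-up rows (the frame `exemple` and the helper column `deces` are hypothetical). On the real data, the figures could differ slightly from the tables above, because the `tranche` boundaries (`<= 34`, `<= 54`, `<= 65`) are not exactly the same as the filters used for those tables.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical rows; only the column names mirror the notebook's data.\n",
"exemple = pd.DataFrame({\n",
"    'Smoker': ['Yes', 'No', 'Yes', 'No', 'Yes', 'No'],\n",
"    'Status': ['Alive', 'Alive', 'Dead', 'Alive', 'Dead', 'Dead'],\n",
"    'tranche': ['18-34 ans', '18-34 ans', '35-54 ans', '35-54 ans', '>65 ans', '>65 ans'],\n",
"})\n",
"\n",
"# The mean of a boolean column is the share of True values, i.e. the death rate.\n",
"taux = (exemple.assign(deces=exemple['Status'].eq('Dead'))\n",
"               .groupby(['tranche', 'Smoker'])['deces']\n",
"               .mean()\n",
"               .unstack('Smoker'))\n",
"print(taux)\n",
"```\n",
"\n",
"Putting the brackets in rows and the smoking status in columns makes the within-bracket comparison readable at a glance."
]
},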
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f7ef506e5c0>"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x1080 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"copy = donnees.copy()\n",
"copy['Smoker']=donnees['Smoker'].replace({'Yes':1, 'No':0})\n",
"copy.plot.scatter(x='Smoker', y='Age', figsize=(10,15),s=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plot above is not very telling; alternatively, we can simply compute the proportion of smokers in each age bracket and see that it collapses for the women over 65."
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tranche\n",
"18-34 ans 0.452500\n",
"35-54 ans 0.543578\n",
"55-65 ans 0.485232\n",
">65 ans 0.203320\n",
"Name: Smoker, dtype: float64"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"copy.groupby('tranche')['Smoker'].mean()"
]
},
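{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a possible alternative to the scatter plot, a box plot of age by smoking status shows the difference in age distribution more directly. This is only a sketch on made-up rows (the frame `exemple` is hypothetical); it was not run in this notebook.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from matplotlib import pyplot as plt\n",
"\n",
"# Hypothetical rows; only the column names mirror the notebook's data.\n",
"exemple = pd.DataFrame({'Smoker': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No'],\n",
"                        'Age': [21.0, 38.3, 57.5, 47.1, 66.0, 88.6]})\n",
"\n",
"# One box per smoking status, summarising the age distribution of each group.\n",
"exemple.boxplot(column='Age', by='Smoker')\n",
"plt.show()\n",
"```"
]
},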
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_code_all_hidden": true,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
...@@ -16,10 +2252,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}