sujut6

parent f983203d
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Document title"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2+2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10\n"
]
}
],
"source": [
"x=10\n",
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"20\n"
]
}
],
"source": [
"x=x+10\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A short completion example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.4.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Subject 6: Around Simpson's Paradox"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mortality rate"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"data_url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Smoker Status Age\n",
"0 Yes Alive 21.0\n",
"1 Yes Alive 19.3\n",
"2 No Dead 57.5\n",
"3 No Alive 47.1\n",
"4 Yes Alive 81.4\n"
]
}
],
"source": [
"df = pd.read_csv(data_url)\n",
"print(df.head())"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Status Alive Dead Total Mortality Rate\n",
"Smoker \n",
"No 502 230 732 0.314208\n",
"Yes 443 139 582 0.238832\n"
]
}
],
"source": [
"summary_table = df.groupby(['Smoker', 'Status']).size().unstack(fill_value=0)\n",
"\n",
"summary_table['Total'] = summary_table.sum(axis=1)\n",
"summary_table['Mortality Rate'] = summary_table['Dead'] / summary_table['Total']\n",
"\n",
"print(summary_table)\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"summary_table.reset_index(inplace=True)\n",
"\n",
"sns.barplot(x='Smoker', y='Mortality Rate', data=summary_table)\n",
"plt.title('Mortality rates for smokers and non-smokers')\n",
"plt.ylabel('Mortality Rate')\n",
"plt.xlabel('Smoking Status')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The graph shows the overall mortality rate for smokers and non-smokers. Mortality is higher among non-smokers (31.4%) than among smokers (23.9%). This runs against the well-documented association between smoking and increased mortality."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mortality rate by age group"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Status Alive Dead Total Mortality Rate\n",
"Age Group Smoker \n",
"18-34 No 213 6 219 0.027397\n",
" Yes 174 5 179 0.027933\n",
"35-54 No 180 19 199 0.095477\n",
" Yes 198 41 239 0.171548\n",
"55-64 No 80 39 119 0.327731\n",
" Yes 64 51 115 0.443478\n",
"65+ No 29 166 195 0.851282\n",
" Yes 7 42 49 0.857143\n"
]
}
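],
"source": [
"\n",
"age_bins = [18, 34, 54, 64, 100] \n",
"age_labels = ['18-34', '35-54', '55-64', '65+']\n",
"# Caveat: with right=False the intervals are [18, 34), [34, 54), [54, 64), [64, 100), so the labels are approximate.\n",
"df['Age Group'] = pd.cut(df['Age'], bins=age_bins, labels=age_labels, right=False)\n",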
"\n",
"\n",
"age_smoking_table = df.groupby(['Age Group', 'Smoker', 'Status']).size().unstack(fill_value=0)\n",
"\n",
"age_smoking_table['Total'] = age_smoking_table.sum(axis=1)\n",
"age_smoking_table['Mortality Rate'] = age_smoking_table['Dead'] / age_smoking_table['Total']\n",
"\n",
"\n",
"print(age_smoking_table)\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"age_smoking_table.reset_index(inplace=True)\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"sns.barplot(x='Age Group', y='Mortality Rate', hue='Smoker', data=age_smoking_table)\n",
"plt.title('Mortality rates by age group and smoking status')\n",
"plt.ylabel('Mortality Rate')\n",
"plt.xlabel('Age Group')\n",
"plt.legend(title='Smoking Status')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This graph shows that mortality rates increase with age for both smokers and non-smokers. In the intermediate age groups, smokers have clearly higher mortality rates than non-smokers (17.2% versus 9.5% for 35-54, and 44.3% versus 32.8% for 55-64). In the 18-34 and 65+ groups the rates are almost identical; for the oldest group this may be explained by other causes of death. The impact of smoking on mortality therefore appears most marked in middle age and weaker among the elderly. Non-smokers in this sample are far more often in the 65+ group (195 of 732, against 49 of 582 smokers), which is why the overall rates in the previous section are reversed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Logistic regression"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.381244\n",
" Iterations 7\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Death No. Observations: 1314\n",
"Model: Logit Df Residuals: 1311\n",
"Method: MLE Df Model: 2\n",
"Date: Sat, 12 Oct 2024 Pseudo R-squ.: 0.3579\n",
"Time: 13:47:48 Log-Likelihood: -500.95\n",
"converged: True LL-Null: -780.16\n",
" LLR p-value: 5.534e-122\n",
"==============================================================================\n",
" coef std err z P>|z| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"const -6.3519 0.360 -17.637 0.000 -7.058 -5.646\n",
"Age 0.0998 0.006 17.290 0.000 0.089 0.111\n",
"Smoking 0.2787 0.165 1.689 0.091 -0.045 0.602\n",
"==============================================================================\n"
]
}
],
"source": [
"import pandas as pd\n",
"import statsmodels.api as sm\n",
"\n",
"url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv\"\n",
"df = pd.read_csv(url)\n",
"\n",
"df['Death'] = df['Status'].apply(lambda x: 1 if x == 'Dead' else 0) \n",
"df['Smoking'] = df['Smoker'].apply(lambda x: 1 if x == 'Yes' else 0) \n",
"\n",
"X = sm.add_constant(df[['Age', 'Smoking']]) \n",
"y = df['Death'] \n",
"\n",
"model = sm.Logit(y, X)\n",
"result = model.fit()\n",
"\n",
"print(result.summary())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
...@@ -16,10 +288,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
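The reversal between the overall table and the age-stratified table in this notebook can be checked by hand. The sketch below hard-codes the counts printed by the notebook instead of downloading the CSV; the names `counts`, `mortality_rate`, and `crude_rate` are illustrative and do not appear in the notebook.

```python
# (alive, dead) counts copied from the age-stratified table printed by the notebook.
counts = {
    "18-34": {"No": (213, 6), "Yes": (174, 5)},
    "35-54": {"No": (180, 19), "Yes": (198, 41)},
    "55-64": {"No": (80, 39), "Yes": (64, 51)},
    "65+": {"No": (29, 166), "Yes": (7, 42)},
}


def mortality_rate(alive, dead):
    """Share of deaths among all subjects in a group."""
    return dead / (alive + dead)


def crude_rate(smoker):
    """Mortality rate for one smoking status, pooling all age groups."""
    alive = sum(group[smoker][0] for group in counts.values())
    dead = sum(group[smoker][1] for group in counts.values())
    return mortality_rate(alive, dead)


# Pooled rates: non-smokers look worse off.
crude = {s: crude_rate(s) for s in ("No", "Yes")}

# Rates within each age group: smokers are never better off.
stratified = {
    age: {s: mortality_rate(*group[s]) for s in ("No", "Yes")}
    for age, group in counts.items()
}

# Share of each smoking status that falls in the oldest group.
share_65_plus = {
    s: sum(counts["65+"][s]) / sum(sum(group[s]) for group in counts.values())
    for s in ("No", "Yes")
}

print("crude:", crude)
for age, rates in stratified.items():
    print(age, rates)
print("share aged 65+:", share_65_plus)
```

Under these counts the pooled comparison and every within-group comparison point in opposite directions, and the share of subjects in the oldest group differs strongly between smokers and non-smokers, which is the usual mechanism behind Simpson's paradox.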
{
"cells": [],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Subject 6: Around Simpson's Paradox"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mortality rate"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"data_url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Smoker Status Age\n",
"0 Yes Alive 21.0\n",
"1 Yes Alive 19.3\n",
"2 No Dead 57.5\n",
"3 No Alive 47.1\n",
"4 Yes Alive 81.4\n"
]
}
],
"source": [
"df = pd.read_csv(data_url)\n",
"print(df.head())"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Status Alive Dead Total Mortality Rate\n",
"Smoker \n",
"No 502 230 732 0.314208\n",
"Yes 443 139 582 0.238832\n"
]
}
],
"source": [
"summary_table = df.groupby(['Smoker', 'Status']).size().unstack(fill_value=0)\n",
"\n",
"summary_table['Total'] = summary_table.sum(axis=1)\n",
"summary_table['Mortality Rate'] = summary_table['Dead'] / summary_table['Total']\n",
"\n",
"print(summary_table)\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"summary_table.reset_index(inplace=True)\n",
"\n",
"sns.barplot(x='Smoker', y='Mortality Rate', data=summary_table)\n",
"plt.title('Mortality rates for smokers and non-smokers')\n",
"plt.ylabel('Mortality Rate')\n",
"plt.xlabel('Smoking Status')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The graph shows the overall mortality rate for smokers and non-smokers. Mortality is higher among non-smokers (31.4%) than among smokers (23.9%). This runs against the well-documented association between smoking and increased mortality."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mortality rate by age group"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Status Alive Dead Total Mortality Rate\n",
"Age Group Smoker \n",
"18-34 No 213 6 219 0.027397\n",
" Yes 174 5 179 0.027933\n",
"35-54 No 180 19 199 0.095477\n",
" Yes 198 41 239 0.171548\n",
"55-64 No 80 39 119 0.327731\n",
" Yes 64 51 115 0.443478\n",
"65+ No 29 166 195 0.851282\n",
" Yes 7 42 49 0.857143\n"
]
}
],
"source": [
"\n",
"age_bins = [18, 34, 54, 64, 100] \n",
"age_labels = ['18-34', '35-54', '55-64', '65+']\n",
"# Caveat: with right=False the intervals are [18, 34), [34, 54), [54, 64), [64, 100), so the labels are approximate.\n",
"df['Age Group'] = pd.cut(df['Age'], bins=age_bins, labels=age_labels, right=False)\n",
"\n",
"\n",
"age_smoking_table = df.groupby(['Age Group', 'Smoker', 'Status']).size().unstack(fill_value=0)\n",
"\n",
"age_smoking_table['Total'] = age_smoking_table.sum(axis=1)\n",
"age_smoking_table['Mortality Rate'] = age_smoking_table['Dead'] / age_smoking_table['Total']\n",
"\n",
"\n",
"print(age_smoking_table)\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"age_smoking_table.reset_index(inplace=True)\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"sns.barplot(x='Age Group', y='Mortality Rate', hue='Smoker', data=age_smoking_table)\n",
"plt.title('Mortality rates by age group and smoking status')\n",
"plt.ylabel('Mortality Rate')\n",
"plt.xlabel('Age Group')\n",
"plt.legend(title='Smoking Status')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This graph shows that mortality rates increase with age for both smokers and non-smokers. In the intermediate age groups, smokers have clearly higher mortality rates than non-smokers (17.2% versus 9.5% for 35-54, and 44.3% versus 32.8% for 55-64). In the 18-34 and 65+ groups the rates are almost identical; for the oldest group this may be explained by other causes of death. The impact of smoking on mortality therefore appears most marked in middle age and weaker among the elderly. Non-smokers in this sample are far more often in the 65+ group (195 of 732, against 49 of 582 smokers), which is why the overall rates in the previous section are reversed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Logistic regression"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.381244\n",
" Iterations 7\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Death No. Observations: 1314\n",
"Model: Logit Df Residuals: 1311\n",
"Method: MLE Df Model: 2\n",
"Date: Sat, 12 Oct 2024 Pseudo R-squ.: 0.3579\n",
"Time: 09:18:51 Log-Likelihood: -500.95\n",
"converged: True LL-Null: -780.16\n",
" LLR p-value: 5.534e-122\n",
"==============================================================================\n",
" coef std err z P>|z| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"const -6.3519 0.360 -17.637 0.000 -7.058 -5.646\n",
"Age 0.0998 0.006 17.290 0.000 0.089 0.111\n",
"Smoking 0.2787 0.165 1.689 0.091 -0.045 0.602\n",
"==============================================================================\n"
]
}
],
"source": [
"import pandas as pd\n",
"import statsmodels.api as sm\n",
"\n",
"url = \"https://gitlab.inria.fr/learninglab/mooc-rr/mooc-rr-ressources/-/raw/master/module3/Practical_session/Subject6_smoking.csv\"\n",
"df = pd.read_csv(url)\n",
"\n",
"df['Death'] = df['Status'].apply(lambda x: 1 if x == 'Dead' else 0) \n",
"df['Smoking'] = df['Smoker'].apply(lambda x: 1 if x == 'Yes' else 0) \n",
"\n",
"X = sm.add_constant(df[['Age', 'Smoking']]) \n",
"y = df['Death'] \n",
"\n",
"model = sm.Logit(y, X)\n",
"result = model.fit()\n",
"\n",
"print(result.summary())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
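One way to read the Logit summary printed by the notebook is to convert the coefficients into odds ratios by exponentiating them. The sketch below copies the `Smoking` and `Age` estimates from that printed summary rather than refitting the model; the variable names are illustrative, and the ten-year age step is an arbitrary choice made for readability.

```python
import math

# Estimate and 95% confidence bounds for "Smoking", copied from the printed Logit summary.
smoking_coef, smoking_low, smoking_high = 0.2787, -0.045, 0.602
# Estimate for "Age" (per year), copied from the same summary.
age_coef = 0.0998

# exp(coefficient) is the multiplicative change in the odds of death.
smoking_or = math.exp(smoking_coef)
smoking_or_low = math.exp(smoking_low)
smoking_or_high = math.exp(smoking_high)

# Odds multiplier for a ten-year increase in age (illustrative step size).
age_or_per_decade = math.exp(10 * age_coef)

print(f"Smoking odds ratio: {smoking_or:.2f} "
      f"(95% CI {smoking_or_low:.2f} to {smoking_or_high:.2f})")
print(f"Age odds ratio per 10 years: {age_or_per_decade:.2f}")
```

Once age is in the model, the estimated odds ratio for smoking is above 1, which is the opposite direction from the crude comparison. The interval still includes 1, consistent with the p-value of 0.091 in the summary, so this model alone does not establish the effect at the 5% level.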