Commit 758aa33b authored Dec 19, 2025 by 7f9d4a2f9f536fc2da1beb7df3382bb3
Add computational document on Simpson paradox
parent e391755a
Showing 1 changed file with 314 additions and 3 deletions
module3/exo3/exercice_en.ipynb
{
"cells": [],
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# On Simpson's paradox\n",
"\n",
"## Objective\n",
"Simpson's paradox describes a statistical situation in which a trend observed\n",
"in several subgroups disappears or reverses when the data are aggregated.\n",
"\n",
"This document aims to:\n",
"- illustrate the paradox on a simple dataset,\n",
"- visualize the trends within subgroups and overall,\n",
"- discuss the implications for data analysis and reproducibility.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Groupe</th>\n",
" <th>Succès</th>\n",
" <th>Total</th>\n",
" <th>Traitement</th>\n",
" <th>Taux_succès</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Jeunes</td>\n",
" <td>90</td>\n",
" <td>100</td>\n",
" <td>A</td>\n",
" <td>0.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Âgés</td>\n",
" <td>10</td>\n",
" <td>100</td>\n",
" <td>A</td>\n",
" <td>0.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Jeunes</td>\n",
" <td>80</td>\n",
" <td>100</td>\n",
" <td>B</td>\n",
" <td>0.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Âgés</td>\n",
" <td>20</td>\n",
" <td>100</td>\n",
" <td>B</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Groupe Succès Total Traitement Taux_succès\n",
"0 Jeunes 90 100 A 0.9\n",
"1 Âgés 10 100 A 0.1\n",
"2 Jeunes 80 100 B 0.8\n",
"3 Âgés 20 100 B 0.2"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Simple example dataset for the Simpson's paradox discussion\n",
"data = pd.DataFrame({\n",
" \"Traitement\": [\"A\", \"A\", \"B\", \"B\"],\n",
" \"Groupe\": [\"Jeunes\", \"Âgés\", \"Jeunes\", \"Âgés\"],\n",
" \"Succès\": [90, 10, 80, 20],\n",
" \"Total\": [100, 100, 100, 100]\n",
"})\n",
"\n",
"data[\"Taux_succès\"] = data[\"Succès\"] / data[\"Total\"]\n",
"data\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Traitement</th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Groupe</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Jeunes</th>\n",
" <td>0.9</td>\n",
" <td>0.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Âgés</th>\n",
" <td>0.1</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Traitement A B\n",
"Groupe \n",
"Jeunes 0.9 0.8\n",
"Âgés 0.1 0.2"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pivot = data.pivot(index=\"Groupe\", columns=\"Traitement\", values=\"Taux_succès\")\n",
"pivot\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Succès</th>\n",
" <th>Total</th>\n",
" <th>Taux_succès</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Traitement</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>A</th>\n",
" <td>100</td>\n",
" <td>200</td>\n",
" <td>0.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>100</td>\n",
" <td>200</td>\n",
" <td>0.5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Succès Total Taux_succès\n",
"Traitement \n",
"A 100 200 0.5\n",
"B 100 200 0.5"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"global_data = data.groupby(\"Traitement\")[[\"Succès\", \"Total\"]].sum()\n",
"global_data[\"Taux_succès\"] = global_data[\"Succès\"] / global_data[\"Total\"]\n",
"global_data\n"
]
},
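{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Discussion\n",
"\n",
"The paradox arises when the subgroups are not evenly distributed across the\n",
"treatments: the relative weight of each subgroup then strongly influences the\n",
"aggregated result.\n",
"\n",
"In the table above every subgroup has 100 cases under each treatment, so the\n",
"opposite subgroup differences (A ahead among Jeunes, B ahead among Âgés) cancel\n",
"out and both treatments aggregate to 0.5. A full reversal requires unequal\n",
"subgroup sizes.\n",
"\n",
"This phenomenon underlines the importance of:\n",
"- stratifying the data before analysis,\n",
"- understanding confounding variables,\n",
"- not relying solely on aggregate statistics.\n"
]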
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reproducibility\n",
"\n",
"- The data are embedded directly in the code.\n",
"- The computations are deterministic (no randomness).\n",
"- The libraries used are standard (pandas, matplotlib).\n",
"- The document can be re-executed in full on another machine.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"Simpson's paradox shows that opposite conclusions can be drawn depending on\n",
"the level at which the data are aggregated.\n",
"\n",
"It is a reminder that data analysis requires:\n",
"- a detailed understanding of the context,\n",
"- exploration at multiple levels,\n",
"- great caution when interpreting results.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
...
...
@@ -16,10 +328,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
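
The notebook's discussion attributes the paradox to subgroups that are unevenly distributed across treatments. A minimal sketch of that case follows, reusing the notebook's column names but with hypothetical counts that are not part of the commit: subgroup sizes are deliberately unbalanced so that treatment A has the higher success rate in each subgroup while treatment B has the higher rate after aggregation.

```python
import pandas as pd

# Hypothetical counts (an assumption for illustration, not the notebook's data):
# treatment A is mostly applied to "Âgés", treatment B mostly to "Jeunes".
data = pd.DataFrame({
    "Traitement": ["A", "A", "B", "B"],
    "Groupe": ["Jeunes", "Âgés", "Jeunes", "Âgés"],
    "Succès": [18, 54, 144, 4],
    "Total": [20, 180, 180, 20],
})
data["Taux_succès"] = data["Succès"] / data["Total"]

# Success rate per subgroup, one column per treatment.
by_group = data.pivot(index="Groupe", columns="Traitement", values="Taux_succès")

# Aggregated success rate per treatment: sum the counts first, then divide.
overall = data.groupby("Traitement")[["Succès", "Total"]].sum()
overall["Taux_succès"] = overall["Succès"] / overall["Total"]

# Compare the two levels of aggregation.
a_wins_each_group = bool((by_group["A"] > by_group["B"]).all())
b_wins_overall = bool(
    overall.loc["B", "Taux_succès"] > overall.loc["A", "Taux_succès"]
)

print(by_group)
print(overall)
print("A better in every subgroup:", a_wins_each_group)
print("B better overall:", b_wins_overall)
```

With these counts, A's rates are 0.9 and 0.3 against B's 0.8 and 0.2 within the subgroups, while the aggregated rates are 0.36 for A and 0.74 for B. The reversal comes entirely from how the 200 cases per treatment are split between the two subgroups.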