From 758aa33b59651b170581f204067e54a744bff83b Mon Sep 17 00:00:00 2001
From: 7f9d4a2f9f536fc2da1beb7df3382bb3 <7f9d4a2f9f536fc2da1beb7df3382bb3@app-learninglab.inria.fr>
Date: Fri, 19 Dec 2025 19:42:13 +0000
Subject: [PATCH] Add computational document on Simpson paradox
---
module3/exo3/exercice_en.ipynb | 317 ++++++++++++++++++++++++++++++++-
1 file changed, 314 insertions(+), 3 deletions(-)
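
Note on the example data: with 100 observations in every (treatment, group)
cell, the aggregated rates in the notebook tie at 0.5, so the subgroup
differences cancel out but do not reverse. A minimal sketch of a dataset that
does reverse is given here, assuming hypothetical counts (they are not from
the notebook) with unequal group sizes. It uses plain Python rather than
pandas so that it runs without dependencies:

```python
# Hypothetical counts chosen for illustration only; they are not from the notebook.
# (treatment, group) -> (successes, total). Group sizes differ across treatments.
counts = {
    ("A", "Jeunes"): (9, 10),
    ("A", "Âgés"): (30, 100),
    ("B", "Jeunes"): (80, 100),
    ("B", "Âgés"): (2, 10),
}


def rate(successes, total):
    """Return the success rate as a float."""
    return successes / total


# Success rate inside each (treatment, group) cell.
group_rates = {key: rate(*value) for key, value in counts.items()}

# Aggregated rate per treatment: sum the counts first, then divide.
overall_rates = {}
for treatment in ("A", "B"):
    successes = sum(s for (t, _g), (s, _n) in counts.items() if t == treatment)
    total = sum(n for (t, _g), (_s, n) in counts.items() if t == treatment)
    overall_rates[treatment] = rate(successes, total)

# A is ahead inside every group, yet B is ahead once the groups are pooled.
a_wins_in_every_group = all(
    group_rates[("A", g)] > group_rates[("B", g)] for g in ("Jeunes", "Âgés")
)
b_wins_overall = overall_rates["B"] > overall_rates["A"]

print(group_rates)
print(overall_rates)
print(a_wins_in_every_group, b_wins_overall)
```

The reversal comes from the weights: in this sketch treatment A is applied
mostly to the low-success group and treatment B mostly to the high-success
group, so pooling favours B even though A does better within each group.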
diff --git a/module3/exo3/exercice_en.ipynb b/module3/exo3/exercice_en.ipynb
index 0bbbe37..924d6ae 100644
--- a/module3/exo3/exercice_en.ipynb
+++ b/module3/exo3/exercice_en.ipynb
@@ -1,5 +1,317 @@
{
- "cells": [],
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Around Simpson's paradox\n",
+ "\n",
+ "## Objective\n",
+ "Simpson's paradox describes a statistical situation in which a trend observed\n",
+ "in several subgroups disappears or reverses when the data are aggregated.\n",
+ "\n",
+ "This document aims to:\n",
+ "- illustrate the paradox on a simple dataset,\n",
+ "- compare the trends within subgroups and overall,\n",
+ "- discuss the implications for data analysis and reproducibility.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
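+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>Groupe</th>\n",
+ "      <th>Succès</th>\n",
+ "      <th>Total</th>\n",
+ "      <th>Traitement</th>\n",
+ "      <th>Taux_succès</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>Jeunes</td>\n",
+ "      <td>90</td>\n",
+ "      <td>100</td>\n",
+ "      <td>A</td>\n",
+ "      <td>0.9</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>Âgés</td>\n",
+ "      <td>10</td>\n",
+ "      <td>100</td>\n",
+ "      <td>A</td>\n",
+ "      <td>0.1</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>Jeunes</td>\n",
+ "      <td>80</td>\n",
+ "      <td>100</td>\n",
+ "      <td>B</td>\n",
+ "      <td>0.8</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>Âgés</td>\n",
+ "      <td>20</td>\n",
+ "      <td>100</td>\n",
+ "      <td>B</td>\n",
+ "      <td>0.2</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"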
+ ],
+ "text/plain": [
+ " Groupe Succès Total Traitement Taux_succès\n",
+ "0 Jeunes 90 100 A 0.9\n",
+ "1 Âgés 10 100 A 0.1\n",
+ "2 Jeunes 80 100 B 0.8\n",
+ "3 Âgés 20 100 B 0.2"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Toy dataset: successes and totals by treatment and age group\n",
+ "data = pd.DataFrame({\n",
+ " \"Traitement\": [\"A\", \"A\", \"B\", \"B\"],\n",
+ " \"Groupe\": [\"Jeunes\", \"Âgés\", \"Jeunes\", \"Âgés\"],\n",
+ " \"Succès\": [90, 10, 80, 20],\n",
+ " \"Total\": [100, 100, 100, 100]\n",
+ "})\n",
+ "\n",
+ "data[\"Taux_succès\"] = data[\"Succès\"] / data[\"Total\"]\n",
+ "data\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
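+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th>Traitement</th>\n",
+ "      <th>A</th>\n",
+ "      <th>B</th>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>Groupe</th>\n",
+ "      <th></th>\n",
+ "      <th></th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>Jeunes</th>\n",
+ "      <td>0.9</td>\n",
+ "      <td>0.8</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>Âgés</th>\n",
+ "      <td>0.1</td>\n",
+ "      <td>0.2</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"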
+ ],
+ "text/plain": [
+ "Traitement A B\n",
+ "Groupe \n",
+ "Jeunes 0.9 0.8\n",
+ "Âgés 0.1 0.2"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pivot = data.pivot(index=\"Groupe\", columns=\"Traitement\", values=\"Taux_succès\")\n",
+ "pivot\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
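+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>Succès</th>\n",
+ "      <th>Total</th>\n",
+ "      <th>Taux_succès</th>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>Traitement</th>\n",
+ "      <th></th>\n",
+ "      <th></th>\n",
+ "      <th></th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>A</th>\n",
+ "      <td>100</td>\n",
+ "      <td>200</td>\n",
+ "      <td>0.5</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>B</th>\n",
+ "      <td>100</td>\n",
+ "      <td>200</td>\n",
+ "      <td>0.5</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"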
+ ],
+ "text/plain": [
+ " Succès Total Taux_succès\n",
+ "Traitement \n",
+ "A 100 200 0.5\n",
+ "B 100 200 0.5"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "global_data = data.groupby(\"Traitement\")[[\"Succès\", \"Total\"]].sum()\n",
+ "global_data[\"Taux_succès\"] = global_data[\"Succès\"] / global_data[\"Total\"]\n",
+ "global_data\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
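+ "## Discussion\n",
+ "\n",
+ "The paradox arises when the subgroups are not evenly distributed across the\n",
+ "treatments: the relative weight of each subgroup then strongly influences the\n",
+ "aggregated result.\n",
+ "\n",
+ "In the dataset above, every treatment has 100 observations in each group, so the\n",
+ "subgroup differences (A ahead among Jeunes, B ahead among Âgés) cancel out and\n",
+ "both aggregated rates equal 0.5. A full reversal would require unequal group sizes.\n",
+ "\n",
+ "This phenomenon underlines the importance of:\n",
+ "- stratifying the data before analysis,\n",
+ "- understanding confounding variables,\n",
+ "- not relying solely on aggregate statistics.\n"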
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Reproducibility\n",
+ "\n",
+ "- The data are embedded directly in the code.\n",
+ "- The computations are deterministic (no randomness).\n",
+ "- The libraries used are widely available (pandas, matplotlib).\n",
+ "- The document can be re-executed in full on another machine.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "Simpson's paradox shows that opposite conclusions can be drawn\n",
+ "depending on the level at which the data are aggregated.\n",
+ "\n",
+ "It is a reminder that data analysis requires:\n",
+ "- a detailed understanding of the context,\n",
+ "- exploration at several levels of aggregation,\n",
+ "- great caution when interpreting results.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
@@ -16,10 +328,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.3"
+ "version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
-
--
2.18.1