diff --git a/module3/exo3/exercice.ipynb b/module3/exo3/exercice.ipynb
index f3d9246d23866386e735390bdaa871ad0e1328f5..c0147b27790e605c3dc84131af62465930489cc4 100644
--- a/module3/exo3/exercice.ipynb
+++ b/module3/exo3/exercice.ipynb
@@ -967,19 +967,315 @@
"## Analyser les données"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let us rank the characters by how much they speak, using a syntactic analysis of the text (scenes / lines / words). In particular, which character speaks the most? Which one does not speak at all?"
+ ]
+ },
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 98,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "HARPAGON 22\n",
+ "FROSINE 14\n",
+ "CLEANTE 14\n",
+ "ELISE 13\n",
+ "MARIANE 11\n",
+ "MAITRE JACQUES 8\n",
+ "VALERE 8\n",
+ "LA FLECHE 5\n",
+ "LE COMMISSAIRE 5\n",
+ "SON CLERC 5\n",
+ "BRINDAVOINE 2\n",
+ "LA MERLUCHE 2\n",
+ "DAME CLAUDE 1\n",
+ "MAITRE SIMON 1\n",
+ "ANSELME 1\n",
+ "Name: personnage, dtype: int64"
+ ]
+ },
+ "execution_count": 98,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.personnage.value_counts()"
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 103,
"metadata": {},
- "outputs": [],
- "source": []
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>nombre_de_mots</th>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>personnage</th>\n",
+ "      <th></th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>DAME CLAUDE</th>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>MAITRE JACQUES</th>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>MAITRE SIMON</th>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>SON CLERC</th>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>BRINDAVOINE</th>\n",
+ "      <td>38</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>LA MERLUCHE</th>\n",
+ "      <td>49</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>LE COMMISSAIRE</th>\n",
+ "      <td>258</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>ANSELME</th>\n",
+ "      <td>383</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>MARIANE</th>\n",
+ "      <td>819</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>ELISE</th>\n",
+ "      <td>893</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>LA FLECHE</th>\n",
+ "      <td>1419</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>FROSINE</th>\n",
+ "      <td>2033</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>VALERE</th>\n",
+ "      <td>2532</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>CLEANTE</th>\n",
+ "      <td>3046</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>HARPAGON</th>\n",
+ "      <td>5092</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " nombre_de_mots\n",
+ "personnage \n",
+ "DAME CLAUDE 0\n",
+ "MAITRE JACQUES 0\n",
+ "MAITRE SIMON 0\n",
+ "SON CLERC 0\n",
+ "BRINDAVOINE 38\n",
+ "LA MERLUCHE 49\n",
+ "LE COMMISSAIRE 258\n",
+ "ANSELME 383\n",
+ "MARIANE 819\n",
+ "ELISE 893\n",
+ "LA FLECHE 1419\n",
+ "FROSINE 2033\n",
+ "VALERE 2532\n",
+ "CLEANTE 3046\n",
+ "HARPAGON 5092"
+ ]
+ },
+ "execution_count": 103,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df[['nombre_de_mots','personnage']].groupby('personnage').sum().sort_values(by=['nombre_de_mots'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 104,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>nombre_de_repliques</th>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>personnage</th>\n",
+ "      <th></th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>DAME CLAUDE</th>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>MAITRE JACQUES</th>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>MAITRE SIMON</th>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>SON CLERC</th>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>BRINDAVOINE</th>\n",
+ "      <td>3</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>LA MERLUCHE</th>\n",
+ "      <td>5</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>ANSELME</th>\n",
+ "      <td>14</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>LE COMMISSAIRE</th>\n",
+ "      <td>15</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>MARIANE</th>\n",
+ "      <td>26</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>ELISE</th>\n",
+ "      <td>50</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>FROSINE</th>\n",
+ "      <td>59</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>LA FLECHE</th>\n",
+ "      <td>64</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>VALERE</th>\n",
+ "      <td>99</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>CLEANTE</th>\n",
+ "      <td>156</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>HARPAGON</th>\n",
+ "      <td>334</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " nombre_de_repliques\n",
+ "personnage \n",
+ "DAME CLAUDE 0\n",
+ "MAITRE JACQUES 0\n",
+ "MAITRE SIMON 0\n",
+ "SON CLERC 0\n",
+ "BRINDAVOINE 3\n",
+ "LA MERLUCHE 5\n",
+ "ANSELME 14\n",
+ "LE COMMISSAIRE 15\n",
+ "MARIANE 26\n",
+ "ELISE 50\n",
+ "FROSINE 59\n",
+ "LA FLECHE 64\n",
+ "VALERE 99\n",
+ "CLEANTE 156\n",
+ "HARPAGON 334"
+ ]
+ },
+ "execution_count": 104,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df[['nombre_de_repliques','personnage']].groupby('personnage').sum().sort_values(by=['nombre_de_repliques'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "These analyses show that Harpagon appears in the most scenes (22 of 31). He also speaks the most words (5092), whereas Dame Claude, Maitre Jacques, Maitre Simon and the clerk are counted with no words at all. Harpagon also has the largest number of lines (334)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Produce a chart showing the number of words each actor speaks in each scene. For this, you may draw on the OBVIL study of Molière's L'Avare (left-hand graph). In that chart, the rows are of equal length and the height represents the total number of words spoken in the scene. The width of each rectangle indicates the percentage of the scene that an actor occupies."
+ ]
}
],
"metadata": {