diff --git a/module3/exo3/exercice.ipynb b/module3/exo3/exercice.ipynb index f3d9246d23866386e735390bdaa871ad0e1328f5..c0147b27790e605c3dc84131af62465930489cc4 100644 --- a/module3/exo3/exercice.ipynb +++ b/module3/exo3/exercice.ipynb @@ -967,19 +967,315 @@ "## Analyze the data" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let us rank the characters by how much they speak, using a syntactic analysis of the text (scenes / lines / words). In particular, which character speaks the most? Which one does not speak at all?" + ] + }, { "cell_type": "code", - "execution_count": null, + "execution_count": 98, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "HARPAGON 22\n", + "FROSINE 14\n", + "CLEANTE 14\n", + "ELISE 13\n", + "MARIANE 11\n", + "MAITRE JACQUES 8\n", + "VALERE 8\n", + "LA FLECHE 5\n", + "LE COMMISSAIRE 5\n", + "SON CLERC 5\n", + "BRINDAVOINE 2\n", + "LA MERLUCHE 2\n", + "DAME CLAUDE 1\n", + "MAITRE SIMON 1\n", + "ANSELME 1\n", + "Name: personnage, dtype: int64" + ] + }, + "execution_count": 98, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.personnage.value_counts()" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 103, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table>\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>nombre_de_mots</th></tr>\n",
+ "    <tr><th>personnage</th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>DAME CLAUDE</th><td>0</td></tr>\n",
+ "    <tr><th>MAITRE JACQUES</th><td>0</td></tr>\n",
+ "    <tr><th>MAITRE SIMON</th><td>0</td></tr>\n",
+ "    <tr><th>SON CLERC</th><td>0</td></tr>\n",
+ "    <tr><th>BRINDAVOINE</th><td>38</td></tr>\n",
+ "    <tr><th>LA MERLUCHE</th><td>49</td></tr>\n",
+ "    <tr><th>LE COMMISSAIRE</th><td>258</td></tr>\n",
+ "    <tr><th>ANSELME</th><td>383</td></tr>\n",
+ "    <tr><th>MARIANE</th><td>819</td></tr>\n",
+ "    <tr><th>ELISE</th><td>893</td></tr>\n",
+ "    <tr><th>LA FLECHE</th><td>1419</td></tr>\n",
+ "    <tr><th>FROSINE</th><td>2033</td></tr>\n",
+ "    <tr><th>VALERE</th><td>2532</td></tr>\n",
+ "    <tr><th>CLEANTE</th><td>3046</td></tr>\n",
+ "    <tr><th>HARPAGON</th><td>5092</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " nombre_de_mots\n", + "personnage \n", + "DAME CLAUDE 0\n", + "MAITRE JACQUES 0\n", + "MAITRE SIMON 0\n", + "SON CLERC 0\n", + "BRINDAVOINE 38\n", + "LA MERLUCHE 49\n", + "LE COMMISSAIRE 258\n", + "ANSELME 383\n", + "MARIANE 819\n", + "ELISE 893\n", + "LA FLECHE 1419\n", + "FROSINE 2033\n", + "VALERE 2532\n", + "CLEANTE 3046\n", + "HARPAGON 5092" + ] + }, + "execution_count": 103, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[['nombre_de_mots','personnage']].groupby('personnage').sum().sort_values(by=['nombre_de_mots'])" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table>\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>nombre_de_repliques</th></tr>\n",
+ "    <tr><th>personnage</th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>DAME CLAUDE</th><td>0</td></tr>\n",
+ "    <tr><th>MAITRE JACQUES</th><td>0</td></tr>\n",
+ "    <tr><th>MAITRE SIMON</th><td>0</td></tr>\n",
+ "    <tr><th>SON CLERC</th><td>0</td></tr>\n",
+ "    <tr><th>BRINDAVOINE</th><td>3</td></tr>\n",
+ "    <tr><th>LA MERLUCHE</th><td>5</td></tr>\n",
+ "    <tr><th>ANSELME</th><td>14</td></tr>\n",
+ "    <tr><th>LE COMMISSAIRE</th><td>15</td></tr>\n",
+ "    <tr><th>MARIANE</th><td>26</td></tr>\n",
+ "    <tr><th>ELISE</th><td>50</td></tr>\n",
+ "    <tr><th>FROSINE</th><td>59</td></tr>\n",
+ "    <tr><th>LA FLECHE</th><td>64</td></tr>\n",
+ "    <tr><th>VALERE</th><td>99</td></tr>\n",
+ "    <tr><th>CLEANTE</th><td>156</td></tr>\n",
+ "    <tr><th>HARPAGON</th><td>334</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " nombre_de_repliques\n", + "personnage \n", + "DAME CLAUDE 0\n", + "MAITRE JACQUES 0\n", + "MAITRE SIMON 0\n", + "SON CLERC 0\n", + "BRINDAVOINE 3\n", + "LA MERLUCHE 5\n", + "ANSELME 14\n", + "LE COMMISSAIRE 15\n", + "MARIANE 26\n", + "ELISE 50\n", + "FROSINE 59\n", + "LA FLECHE 64\n", + "VALERE 99\n", + "CLEANTE 156\n", + "HARPAGON 334" + ] + }, + "execution_count": 104, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[['nombre_de_repliques','personnage']].groupby('personnage').sum().sort_values(by=['nombre_de_repliques'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These analyses show that Harpagon appears in the most scenes (22 out of 31). In terms of words spoken, Harpagon also speaks the most (5092 words), whereas Dame Claude, Maitre Jacques, Maitre Simon and the clerk have no words counted at all. Harpagon also has the largest number of lines (334)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Produce a chart showing the number of words each actor speaks in each scene. For this, you can draw on the study of Molière's L'Avare carried out by OBVIL (left-hand graph). In that chart, the rows are of equal length and the height represents the total number of words spoken in the scene. The width of each rectangle indicates the percentage of the scene that an actor occupies." + ] } ], "metadata": {