question 1 complete

parent 05a1911e
......@@ -967,19 +967,315 @@
"## Analyzing the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's rank the characters by how much they speak, by parsing the structure of the text (scenes / lines / words). In particular, which character speaks the most? Which one does not speak at all?"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 98,
"metadata": {},
"outputs": [],
"source": []
"outputs": [
{
"data": {
"text/plain": [
"HARPAGON 22\n",
"FROSINE 14\n",
"CLEANTE 14\n",
"ELISE 13\n",
"MARIANE 11\n",
"MAITRE JACQUES 8\n",
"VALERE 8\n",
"LA FLECHE 5\n",
"LE COMMISSAIRE 5\n",
"SON CLERC 5\n",
"BRINDAVOINE 2\n",
"LA MERLUCHE 2\n",
"DAME CLAUDE 1\n",
"MAITRE SIMON 1\n",
"ANSELME 1\n",
"Name: personnage, dtype: int64"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.personnage.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 103,
"metadata": {},
"outputs": [],
"source": []
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>nombre_de_mots</th>\n",
" </tr>\n",
" <tr>\n",
" <th>personnage</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>DAME CLAUDE</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MAITRE JACQUES</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MAITRE SIMON</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SON CLERC</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>BRINDAVOINE</th>\n",
" <td>38</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LA MERLUCHE</th>\n",
" <td>49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LE COMMISSAIRE</th>\n",
" <td>258</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ANSELME</th>\n",
" <td>383</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MARIANE</th>\n",
" <td>819</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ELISE</th>\n",
" <td>893</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LA FLECHE</th>\n",
" <td>1419</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FROSINE</th>\n",
" <td>2033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>VALERE</th>\n",
" <td>2532</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CLEANTE</th>\n",
" <td>3046</td>\n",
" </tr>\n",
" <tr>\n",
" <th>HARPAGON</th>\n",
" <td>5092</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" nombre_de_mots\n",
"personnage \n",
"DAME CLAUDE 0\n",
"MAITRE JACQUES 0\n",
"MAITRE SIMON 0\n",
"SON CLERC 0\n",
"BRINDAVOINE 38\n",
"LA MERLUCHE 49\n",
"LE COMMISSAIRE 258\n",
"ANSELME 383\n",
"MARIANE 819\n",
"ELISE 893\n",
"LA FLECHE 1419\n",
"FROSINE 2033\n",
"VALERE 2532\n",
"CLEANTE 3046\n",
"HARPAGON 5092"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['nombre_de_mots','personnage']].groupby('personnage').sum().sort_values(by=['nombre_de_mots'])"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>nombre_de_repliques</th>\n",
" </tr>\n",
" <tr>\n",
" <th>personnage</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>DAME CLAUDE</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MAITRE JACQUES</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MAITRE SIMON</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SON CLERC</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>BRINDAVOINE</th>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LA MERLUCHE</th>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ANSELME</th>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LE COMMISSAIRE</th>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MARIANE</th>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ELISE</th>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FROSINE</th>\n",
" <td>59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LA FLECHE</th>\n",
" <td>64</td>\n",
" </tr>\n",
" <tr>\n",
" <th>VALERE</th>\n",
" <td>99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CLEANTE</th>\n",
" <td>156</td>\n",
" </tr>\n",
" <tr>\n",
" <th>HARPAGON</th>\n",
" <td>334</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" nombre_de_repliques\n",
"personnage \n",
"DAME CLAUDE 0\n",
"MAITRE JACQUES 0\n",
"MAITRE SIMON 0\n",
"SON CLERC 0\n",
"BRINDAVOINE 3\n",
"LA MERLUCHE 5\n",
"ANSELME 14\n",
"LE COMMISSAIRE 15\n",
"MARIANE 26\n",
"ELISE 50\n",
"FROSINE 59\n",
"LA FLECHE 64\n",
"VALERE 99\n",
"CLEANTE 156\n",
"HARPAGON 334"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['nombre_de_repliques','personnage']].groupby('personnage').sum().sort_values(by=['nombre_de_repliques'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These analyses show that Harpagon appears in the largest number of scenes (22 out of 31). He also speaks the most words (5092) and has the most lines (334). At the other end, Dame Claude, Maitre Jacques, Maitre Simon and the clerk (SON CLERC) have zero words and zero lines in these tables, so by this count they do not speak at all."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Produce a chart showing the number of words each character speaks in each scene. You can take inspiration from the OBVIL study of Molière's L'Avare (left-hand graph). In that chart, the rows are of equal length and the height of a row represents the total number of words spoken in the scene. The width of each rectangle indicates the percentage of the scene that a character occupies."
]
}
],
"metadata": {
......
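The three aggregations in this commit, and the per-scene shares that the chart question asks for, could be computed along the following lines. This is only a sketch on a made-up miniature of `df`: the column names `personnage`, `nombre_de_mots` and `nombre_de_repliques` come from the notebook, while the `scene` column, the one-row-per-(scene, character) layout and all the numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature of the notebook's `df`: one row per (scene, character).
# The `scene` column name and every value below are invented for illustration.
df = pd.DataFrame({
    "scene": ["I.1", "I.1", "I.2", "I.2", "I.2"],
    "personnage": ["VALERE", "ELISE", "CLEANTE", "ELISE", "DAME CLAUDE"],
    "nombre_de_mots": [120, 80, 300, 100, 0],
    "nombre_de_repliques": [4, 3, 6, 5, 0],
})

# Scenes attended, words and lines per character, gathered in one table.
summary = df.groupby("personnage").agg(
    scenes=("scene", "nunique"),
    mots=("nombre_de_mots", "sum"),
    repliques=("nombre_de_repliques", "sum"),
).sort_values("mots", ascending=False)

# Characters who are present in at least one scene but have no words at all.
silent = summary.index[summary["mots"] == 0].tolist()

# Words per scene and per character, then each character's share of the scene:
# the row total would drive the height, the share the rectangle width.
words = df.pivot_table(index="scene", columns="personnage",
                       values="nombre_de_mots", aggfunc="sum", fill_value=0)
share = words.div(words.sum(axis=1), axis=0)
```

Keeping the three measures in one `summary` table makes it easier to spot rows such as a character present in several scenes with zero words, which may be worth checking against the parsed text before drawing conclusions.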