{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Dialogue analysis in Molière's L'Avare" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import os\n", "import urllib.request\n", "import linecache\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We retrieve the text of Molière's *L'Avare* in Markdown format. To avoid potential problems, we keep a local copy of the file. Downloading the data on every execution is risky: if the server has an outage, the local copy could be overwritten with a defective file. For this reason, we download the file only if the local copy does not exist." ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [], "source": [ "texte_url = \"http://dramacode.github.io/markdown/moliere_avare.txt\"\n", "texte_file = \"moliere_avare.txt\"\n", "# Download only when no local copy exists yet.\n", "if not os.path.exists(texte_file):\n", "    urllib.request.urlretrieve(texte_url, texte_file)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to rank the characters by how much they speak. To do so, we parse the text, which is laid out as follows:\n", "\n", "* The first 40 lines give information about the play, along with the name and role of each character.\n", "* Acts are marked as level-2 headings (`##`).\n", "* Scenes are marked as level-3 headings (`###`).\n", "* For each speech, the character's name is written in capitals and the speech itself is written on the line below.\n", "\n", "To begin, we measure the amount of speech by scenes, then by speeches, and finally by words." 
] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Valère, Élise\n", "\n", "Cléante, Élise\n", "\n", "Harpagon, La Flèche\n", "\n", "Élise, Cléante, Harpagon\n", "\n", "Valère, Harpagon, Élise\n", "\n", "Cléante, La Flèche\n", "\n", "Maître Simon, Harpagon, Cléante, La Flèche\n", "\n", "Frosine, Harpagon\n", "\n", "La Flèche, Frosine\n", "\n", "Harpagon, Frosine\n", "\n", "Harpagon, Cléante, Élise, Valère, Dame Claude, Maître Jacques, Brindavoine, La Merluche\n", "\n", "Maître Jacques, Valère\n", "\n", "Frosine, Mariane, Maître Jacques\n", "\n", "Mariane, Frosine\n", "\n", "Harpagon, Frosine, Mariane\n", "\n", "Élise, Harpagon, Mariane, Frosine\n", "\n", "Cléante, Harpagon, Élise, Mariane, Frosine\n", "\n", "Harpagon, Mariane, Frosine, Cléante, Brindavoine, Élise\n", "\n", "Harpagon, Mariane, Cléante, Élise, Frosine, La Merluche\n", "\n", "Cléante, Mariane, Élise, Frosine\n", "\n", "Harpagon, Cléante, Mariane, Élise, Frosine\n", "\n", "Harpagon, Cléante\n", "\n", "Maître Jacques, Harpagon, Cléante\n", "\n", "Cléante, Harpagon\n", "\n", "La Flèche, Cléante\n", "\n", "\n", "\n", "Harpagon, Le Commissaire, son Clerc\n", "\n", "Maître Jacques, Harpagon, Le Commissaire, son Clerc\n", "\n", "Valère, Harpagon, le Commissaire, son Clerc, Maître Jacques\n", "\n", "Élise, Mariane, Frosine, Harpagon, Valère, Maître Jacques, le Commissaire, son Clerc\n", "\n", "Anselme, Harpagon, Élise, Mariane, Frosine, Valère, Maître Jacques, le Commissaire, son Clerc\n", "\n", "Cléante, Valère, Mariane, Élise, Frosine, Harpagon, Anselme, Maître Jacques, La Flèche, le Commissaire, son Clerc\n", "\n", "8\n", "14\n", "15\n", "23\n", "6\n", "1\n", "15\n", "9\n", "2\n", "2\n", "12\n", "2\n", "2\n", "1\n", "6\n" ] } ], "source": [ "# Marker for scene headings (level-3 Markdown headings); later cells reuse it.\n", "scene = \"###\"\n", "\n", "# Names as spelled in the cast line that follows each scene heading.\n", "# The match is case-sensitive, so \"Le Commissaire\" misses the scenes\n", "# where the cast line reads \"le Commissaire\".\n", "personnages = [\"Valère\", \"Élise\", \"Cléante\", \"Harpagon\", \"La Flèche\",\n", "               \"Maître Simon\", \"Frosine\", \"Maître Jacques\", \"La Merluche\",\n", "               \"Brindavoine\", \"Mariane\", \"Le Commissaire\", \"Anselme\",\n", "               \"Dame Claude\", \"son Clerc\"]\n", "nbscenes = {nom: 0 for nom in personnages}\n", "\n", "with open(texte_file, 'r') as file:\n", "    for nbligne, ligne in enumerate(file, start=1):\n", "        if scene in ligne:\n", "            # The cast of the scene is listed on the line after the heading.\n", "            nompersonnages = linecache.getline(texte_file, nbligne + 1)\n", "            print(nompersonnages)\n", "            for nom in personnages:\n", "                if nom in nompersonnages:\n", "                    nbscenes[nom] += 1\n", "\n", "# Harpagon's total is corrected by hand (+1), presumably for the scene whose\n", "# heading is followed by a blank line instead of a cast line (the empty\n", "# entry in the printed list).\n", "nbscenes[\"Harpagon\"] += 1\n", "\n", "# Scene counts, in the order of the personnages list.\n", "for nom in personnages:\n", "    print(nbscenes[nom])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": 
[ { "name": "stdout", "output_type": "stream", "text": [ "101\n", "51\n", "161\n", "354\n", "66\n", "5\n", "60\n", "85\n", "5\n", "3\n", "31\n", "17\n", "20\n", "0\n", "0\n" ] } ], "source": [ "# Character names as they appear, in capitals, at the head of each speech.\n", "noms = [\"VALÈRE\", \"ÉLISE\", \"CLÉANTE\", \"HARPAGON\", \"LA FLÈCHE\",\n", "        \"MAÎTRE SIMON\", \"FROSINE\", \"MAÎTRE JACQUES\", \"LA MERLUCHE\",\n", "        \"BRINDAVOINE\", \"MARIANE\", \"LE COMMISSAIRE\", \"ANSELME\",\n", "        \"DAME CLAUDE\", \"SON CLERC\"]\n", "nbrepliques = {nom: 0 for nom in noms}\n", "\n", "with open(texte_file, 'r') as file:\n", "    for ligne in file:\n", "        # Substring match: every line that contains a name in capitals\n", "        # counts as one speech for that character.\n", "        for nom in noms:\n", "            if nom in ligne:\n", "                nbrepliques[nom] += 1\n", "\n", "# Speech counts, in the order of the noms list.\n", "for nom in noms:\n", "    print(nbrepliques[nom])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2559\n", "1023\n", "3177\n", "5218\n", "1436\n", "186\n", "2036\n", "1414\n", "49\n", "38\n", "878\n", "281\n", "488\n", "0\n", "0\n" ] } ], "source": [ "# Uses the noms list defined in the speech-counting cell.\n", "nbmots = {nom: 0 for nom in noms}\n", "\n", "with open(texte_file, 'r') as file:\n", "    for nbligne, ligne in enumerate(file, start=1):\n", "        for nom in noms:\n", "            if nom in ligne:\n", "                # The speech is the line right after the character's name.\n", "                replique = linecache.getline(texte_file, nbligne + 1)\n", "                nbmots[nom] += len(replique.split())\n", "\n", "# Word counts, in the order of the noms list.\n", "for nom in noms:\n", "    print(nbmots[nom])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 147, "metadata": {}, "outputs": [], "source": [ "nbacte = 0\n", "nbscene = 0\n", "# listeacte[act][scene] maps each character to the words spoken in that scene.\n", "listeacte = []\n", "\n", "with open(texte_file, 'r') as file:\n", "    for nbligne, ligne in enumerate(file, start=1):\n", "        if ligne.startswith(\"##\") and not ligne.startswith(\"###\"):\n", "            # New act: start an empty list of scenes.\n", "            nbacte += 1\n", "            nbscene = 0\n", "            listeacte.append([])\n", "        elif scene in ligne and listeacte:\n", "            # New scene in the current act: one counter per character.\n", "            nbscene += 1\n", "            listeacte[-1].append({nom: 0 for nom in noms})\n", "        elif listeacte and listeacte[-1]:\n", "            for nom in noms:\n", "                if nom in ligne:\n", "                    # The speech is the line right after the character's name.\n", "                    replique = linecache.getline(texte_file, nbligne + 1)\n", "                    listeacte[-1][-1][nom] += len(replique.split())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 145, "metadata": {}, "outputs": [], "source": [ "# Sample data for a two-series histogram (not the counts from the play).\n", "x1 = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5, 5]\n", "x2 = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5]\n", "bins = [x + 0.5 for x in range(0, 6)]\n", "plt.hist([x1, x2], bins=bins, color=['yellow', 'green'],\n", "         edgecolor='red', hatch='/', label=['x1', 'x2'],\n", "         histtype='bar')  # 'bar' is the default\n", "plt.ylabel('values')\n", "plt.xlabel('numbers')\n", "plt.title('2 series')\n", "plt.legend()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }