{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "FrTmjbt6J-Mb" }, "source": [ "# Analysis of the dialogs in \"L'Avare\" by Molière\n", "Student: Hugo Alexander Gonzalez Reyes" ] }, { "cell_type": "markdown", "metadata": { "id": "Hstvx_-lLKzB" }, "source": [ "## 1. \n", "Classify the characters according to their amount of speech using a syntactic analysis of the text (scenes / lines / words). In particular, which character speaks the most? Which one does not speak at all? Note that the names of the characters are not necessarily uniform: case and the presence of accents may vary, for example." ] }, { "cell_type": "markdown", "metadata": { "id": "SUXKCVGJM31_" }, "source": [ "Before doing any work, let's install and import all the libraries that we will use throughout the document." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3YBzlxmQNGC_", "outputId": "e1c962a4-ba6f-4467-eea8-cb03877160ec" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting numpy\n", " Downloading numpy-1.19.4-cp36-cp36m-manylinux2010_x86_64.whl (14.5 MB)\n", "\u001b[K |████████████████████████████████| 14.5 MB 11.8 MB/s eta 0:00:01 |███ | 1.3 MB 11.8 MB/s eta 0:00:02\n", "\u001b[?25hInstalling collected packages: numpy\n", " Attempting uninstall: numpy\n", " Found existing installation: numpy 1.15.2\n", " Uninstalling numpy-1.15.2:\n", " Successfully uninstalled numpy-1.15.2\n", "Successfully installed numpy-1.19.4\n", "Requirement already satisfied: urllib3 in /opt/conda/lib/python3.6/site-packages (1.25.7)\n", "Collecting pandas\n", " Downloading pandas-1.1.5-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)\n", "\u001b[K |████████████████████████████████| 9.5 MB 9.1 MB/s eta 0:00:01 |▏ | 40 kB 9.3 MB/s eta 0:00:02 |██████▎ | 1.9 MB 9.1 MB/s eta 0:00:01\n", "\u001b[?25hRequirement already satisfied, skipping upgrade: pytz>=2017.2 in /opt/conda/lib/python3.6/site-packages (from pandas) 
(2019.3)\n", "Requirement already satisfied, skipping upgrade: python-dateutil>=2.7.3 in /opt/conda/lib/python3.6/site-packages (from pandas) (2.8.1)\n", "Requirement already satisfied, skipping upgrade: numpy>=1.15.4 in /opt/conda/lib/python3.6/site-packages (from pandas) (1.19.4)\n", "Requirement already satisfied, skipping upgrade: six>=1.5 in /opt/conda/lib/python3.6/site-packages (from python-dateutil>=2.7.3->pandas) (1.14.0)\n", "Installing collected packages: pandas\n", " Attempting uninstall: pandas\n", " Found existing installation: pandas 0.22.0\n", " Uninstalling pandas-0.22.0:\n", " Successfully uninstalled pandas-0.22.0\n", "Successfully installed pandas-1.1.5\n", "Collecting nltk\n", " Downloading nltk-3.5.zip (1.4 MB)\n", "\u001b[K |████████████████████████████████| 1.4 MB 8.9 MB/s eta 0:00:01\n", "\u001b[?25hCollecting click\n", " Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)\n", "\u001b[K |████████████████████████████████| 82 kB 1.5 MB/s eta 0:00:01\n", "\u001b[?25hCollecting joblib\n", " Downloading joblib-1.0.0-py3-none-any.whl (302 kB)\n", "\u001b[K |████████████████████████████████| 302 kB 38.6 MB/s eta 0:00:01\n", "\u001b[?25hCollecting regex\n", " Downloading regex-2020.11.13-cp36-cp36m-manylinux2014_x86_64.whl (723 kB)\n", "\u001b[K |████████████████████████████████| 723 kB 52.7 MB/s eta 0:00:01\n", "\u001b[?25hRequirement already satisfied: tqdm in /opt/conda/lib/python3.6/site-packages (from nltk) (4.42.0)\n", "Building wheels for collected packages: nltk\n", " Building wheel for nltk (setup.py) ... 
\u001b[?25ldone\n", "\u001b[?25h Created wheel for nltk: filename=nltk-3.5-py3-none-any.whl size=1434677 sha256=4f3ac817fb65d53c5d646f3941665e07f8a172791b9fd2b682ea46a2469e2d53\n", " Stored in directory: /home/jovyan/.cache/pip/wheels/de/5e/42/64abaeca668161c3e2cecc24f864a8fc421e3d07a104fc8a51\n", "Successfully built nltk\n", "Installing collected packages: click, joblib, regex, nltk\n", "Successfully installed click-7.1.2 joblib-1.0.0 nltk-3.5 regex-2020.11.13\n", "Requirement already satisfied: pandas in /opt/conda/lib/python3.6/site-packages (1.1.5)\n", "Collecting plotnine\n", " Downloading plotnine-0.7.1-py3-none-any.whl (4.4 MB)\n", "\u001b[K |████████████████████████████████| 4.4 MB 12.7 MB/s eta 0:00:01\n", "\u001b[?25hRequirement already satisfied: numpy>=1.15.4 in /opt/conda/lib/python3.6/site-packages (from pandas) (1.19.4)\n", "Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.6/site-packages (from pandas) (2.8.1)\n", "Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.6/site-packages (from pandas) (2019.3)\n", "Collecting matplotlib>=3.1.1\n", " Downloading matplotlib-3.3.3-cp36-cp36m-manylinux1_x86_64.whl (11.6 MB)\n", "\u001b[K |████████████████████████████████| 11.6 MB 23.6 MB/s eta 0:00:01\n", "\u001b[?25hRequirement already satisfied: patsy>=0.5.1 in /opt/conda/lib/python3.6/site-packages (from plotnine) (0.5.1)\n", "Collecting mizani>=0.7.1\n", " Downloading mizani-0.7.2-py3-none-any.whl (62 kB)\n", "\u001b[K |████████████████████████████████| 62 kB 1.6 MB/s eta 0:00:01\n", "\u001b[?25hCollecting descartes>=1.1.0\n", " Downloading descartes-1.1.0-py3-none-any.whl (5.8 kB)\n", "Collecting statsmodels>=0.11.1\n", " Downloading statsmodels-0.12.1-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)\n", "\u001b[K |████████████████████████████████| 9.5 MB 17.7 MB/s eta 0:00:01\n", "\u001b[?25hCollecting scipy>=1.2.0\n", " Downloading scipy-1.5.4-cp36-cp36m-manylinux1_x86_64.whl (25.9 MB)\n", "\u001b[K 
|████████████████████████████████| 25.9 MB 92 kB/s s eta 0:00:01\n", "\u001b[?25hRequirement already satisfied: six>=1.5 in /opt/conda/lib/python3.6/site-packages (from python-dateutil>=2.7.3->pandas) (1.14.0)\n", "Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.6/site-packages (from matplotlib>=3.1.1->plotnine) (0.10.0)\n", "Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.6/site-packages (from matplotlib>=3.1.1->plotnine) (7.0.0)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /opt/conda/lib/python3.6/site-packages (from matplotlib>=3.1.1->plotnine) (2.4.6)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib>=3.1.1->plotnine) (1.1.0)\n", "Collecting palettable\n", " Downloading palettable-3.3.0-py2.py3-none-any.whl (111 kB)\n", "\u001b[K |████████████████████████████████| 111 kB 44.7 MB/s eta 0:00:01\n", "\u001b[?25hRequirement already satisfied: setuptools in /opt/conda/lib/python3.6/site-packages (from kiwisolver>=1.0.1->matplotlib>=3.1.1->plotnine) (45.2.0.post20200209)\n", "Installing collected packages: matplotlib, palettable, mizani, descartes, scipy, statsmodels, plotnine\n", " Attempting uninstall: matplotlib\n", " Found existing installation: matplotlib 2.2.3\n", " Uninstalling matplotlib-2.2.3:\n", " Successfully uninstalled matplotlib-2.2.3\n", " Attempting uninstall: scipy\n", " Found existing installation: scipy 1.1.0\n", " Uninstalling scipy-1.1.0:\n", " Successfully uninstalled scipy-1.1.0\n", " Attempting uninstall: statsmodels\n", " Found existing installation: statsmodels 0.9.0\n", " Uninstalling statsmodels-0.9.0:\n", " Successfully uninstalled statsmodels-0.9.0\n", "Successfully installed descartes-1.1.0 matplotlib-3.3.3 mizani-0.7.2 palettable-3.3.0 plotnine-0.7.1 scipy-1.5.4 statsmodels-0.12.1\n", "Requirement already satisfied: networkx in /opt/conda/lib/python3.6/site-packages (2.4)\n", "Requirement 
already satisfied: decorator>=4.3.0 in /opt/conda/lib/python3.6/site-packages (from networkx) (4.4.1)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...\n", "[nltk_data] Unzipping tokenizers/punkt.zip.\n" ] } ], "source": [ "# Make sure that all the required packages are installed\n", "!pip install numpy --upgrade\n", "!pip install urllib3\n", "!pip install pandas --upgrade\n", "!pip install nltk\n", "!pip install pandas plotnine\n", "!pip install networkx\n", "\n", "import urllib.request  # make web requests and retrieve files\n", "import unicodedata  # built-in module for Unicode normalization (e.g. NFKD)\n", "import pandas as pd  # DataFrames\n", "import numpy as np  # numerical arrays and matrices\n", "import string  # built-in string constants and helpers\n", "import nltk  # natural language toolkit, used for tokenization\n", "nltk.download('punkt')  # download the Punkt tokenizer models\n", "from plotnine import *  # plotting library for the text analysis\n", "import networkx as nx  # graph library\n", "import matplotlib.pyplot as plt  # plots the graphs built with networkx" ] }, { "cell_type": "markdown", "metadata": { "id": "ntDYq3TKS5oH" }, "source": [ "With the environment set up, we download the play. Each line is decoded from UTF-8 and normalized to the NFKD Unicode form, empty lines are dropped, and we print the first 40 remaining lines." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "vTesI8XBDMY-", "outputId": "d3ead83a-1f3f-4622-ab9f-05f2bf7825ba" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "---\n", "identifier: moliere_avare \n", "creator: Molière. \n", "date: 1668 \n", "title: L'Avare. Comédie \n", "---\n", "L'AVARE,\n", "COMÉDIE.\n", "Par J.B.P. MOLIÈRE.\n", "À PARIS, Chez JEAN RIBOU, au Palais, vis à vis la Porte de l'Église de la Sainte Chapelle, à l'Image Saint-Louis. M. DC. LXIX. 
*AVEC PRIVILÈGE DU ROI*\n", "# ACTEURS.\n", " – Harpagon, Père de Cléante et d'Élise, et Amoureux de Mariane.\n", " – Cléante, Fils d'Harpagon, Amant de Mariane.\n", " – Élise, Fille d'Harpagon, Amante de Valère.\n", " – Valère, Fils d'Anselme, et Amant d'Élise.\n", " – Mariane, Amante de Cléante, et aimée d'Harpagon.\n", " – Anselme, Père de Valère et de Mariane.\n", " – Frosine, Femme d'Intrigue.\n", " – Maitre Simon, Courtier.\n", " – Maitre Jacques, Cuisinier et Cocher d'Harpagon.\n", " – La Flèche, Valet de Cléante.\n", " – Dame Claude, Servante d'Harpagon.\n", " – Brindavoine, laquais d'Harpagon.\n", " – La Merluche, laquais d'Harpagon.\n", " – Le commissaire, et son clerc.\n", "La Scène est à Paris.\n", "# L'Avare, *Comédie.*.\n", "## Acte Premier.\n", "### Scène Première.\n", "Valère, Élise\n", " VALÈRE.\n", "Hé quoi, charmante Élise, vous devenez mélancolique, après les obligeantes assurances que vous avez eu la bonté de me donner de votre foi ?Je vous vois soupirer, hélas, au milieu de ma joie !Est-ce du regret, dites-moi, de m'avoir fait heureux ? et vous repentez-vous de cet engagement où mes feux ont pu vous contraindre ?\n", " ÉLISE.\n", "Non, Valère, je ne puis pas me repentir de tout ce que je fais pour vous. Je m'y sens entraîner par une trop douce puissance, et je n'ai pas même la force de souhaiter que les choses ne fussent pas. Mais, à vous dire vrai, le succès me donne de l'inquiétude ; et je crains fort de vous aimer un peu plus que je ne devrais.\n", " VALÈRE.\n", "Hé que pouvez-vous craindre, Élise, dans les bontés que vous avez pour moi ?\n", " ÉLISE.\n", "Hélas ! cent choses à la fois : L'emportement d'un Père ; les reproches d'une Famille ; les censures du monde ; mais plus que tout, Valère, le changement de votre cœur ; et cette froideur criminelle dont ceux de votre Sexe payent le plus souvent les témoignages trop ardents d'une innocente amour.\n", " VALÈRE.\n", "Ah ! 
ne me faites pas ce tort, de juger de moi par les autres.Soupçonnez-moi de tout, Élise, plutôt que de manquer à ce que je vous dois.Je vous aime trop pour cela ; et mon amour pour vous, durera autant que ma vie.\n", "\n" ] } ], "source": [ "url = \"http://dramacode.github.io/markdown/moliere_avare.txt\"\n", "\n", "file = urllib.request.urlopen(url)\n", "readable_doc = []\n", "for line in file:\n", "    decoded_line = line.decode(\"utf-8\")  # decode the raw bytes as UTF-8\n", "    decoded_line = unicodedata.normalize(\"NFKD\", decoded_line)  # normalize the Unicode form of the line\n", "    if decoded_line != '\\n':  # skip empty lines\n", "        readable_doc.append(decoded_line)\n", "print(''.join(readable_doc[0:40]))" ] }, { "cell_type": "markdown", "metadata": { "id": "c0KxOFhwWEDb" }, "source": [ "We observe that the text follows a regular structure:\n", "* Each act is introduced by a \"##\" heading.\n", "* Each scene is introduced by a \"###\" heading.\n", "* The line after a scene heading lists all the characters in that scene.\n", "* Each speech starts with the speaker's name, indented by four spaces, followed by the lines they speak.\n", "\n", "Following these rules, we can build a first list-based structure of the whole text, with one entry per line of dialogue holding its act, its scene, the characters on stage, the speaker and the text." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_BX4QcBxTGhn", "outputId": "fbb1a7eb-19b6-4043-9366-846bd4b9ab9c" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Premier', 'Première', 'VALÈRE, ÉLISE', 'VALÈRE', \"Hé quoi, charmante Élise, vous devenez mélancolique, après les obligeantes assurances que vous avez eu la bonté de me donner de votre foi ?Je vous vois soupirer, hélas, au milieu de ma joie !Est-ce du regret, dites-moi, de m'avoir fait heureux ? 
et vous repentez-vous de cet engagement où mes feux ont pu vous contraindre ?\"]\n", "['Premier', 'Première', 'VALÈRE, ÉLISE', 'ÉLISE', \"Non, Valère, je ne puis pas me repentir de tout ce que je fais pour vous. Je m'y sens entraîner par une trop douce puissance, et je n'ai pas même la force de souhaiter que les choses ne fussent pas. Mais, à vous dire vrai, le succès me donne de l'inquiétude ; et je crains fort de vous aimer un peu plus que je ne devrais.\"]\n" ] } ], "source": [ "acte = \"\"\n", "scene = \"\"\n", "acteur = \"\"\n", "set_acteurs = \"\"\n", "dialogues = []\n", "\n", "start = False  # True once a speaker tag has been seen in the current scene\n", "save_aut_set = False  # True when the next line is the list of characters on stage\n", "\n", "for decoded_line in readable_doc:\n", "    if save_aut_set:\n", "        set_acteurs = decoded_line.replace(\"\\n\",\"\").replace(\".\",\"\").strip().upper()\n", "        save_aut_set = False\n", "    if decoded_line[:3] == \"###\":  # scene heading\n", "        start = False\n", "        scene = decoded_line[4:-2].replace(\"Scène \",\"\").strip()\n", "        save_aut_set = True\n", "    elif decoded_line[:2] == \"##\":  # act heading\n", "        acte = decoded_line[3:-2].replace(\"Acte \",\"\").strip()\n", "    elif decoded_line[:4] == \"    \":  # speaker tag, indented by four spaces\n", "        start = True\n", "        acteur = decoded_line.strip()[:-1]  # drop the trailing period\n", "    elif start:  # any other line is spoken text\n", "        dialogues.append([acte, scene, set_acteurs, acteur, decoded_line.replace(\"\\n\",\"\")])\n", "print(dialogues[0])\n", "print(dialogues[1])" ] }, { "cell_type": "markdown", "metadata": { "id": "73p48k8-YMpB" }, "source": [ "We can now load this structure into a pandas DataFrame." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "Jtafl5qBEHhJ" }, "outputs": [], "source": [ "df = pd.DataFrame(dialogues, columns=[\"acte\", \"scene\", \"set_acteurs\", \"acteur\", \"text\"])" ] }, { "cell_type": "markdown", "metadata": { "id": "F26xcuE0bsUx" }, "source": [ "### Cleaning data" ] }, { "cell_type": "markdown", "metadata": { "id": "G5riMa_uYUch" }, "source": [ "The first step is to normalize the act and scene labels: the first act and the first scene are spelled out (\"Premier\", \"Première\") while the others use Roman numerals, so we convert them to Roman numerals as well."
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "yqrMLDSMFOPQ", "outputId": "042f0653-cda7-4cca-acfc-a3120de4cbac" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Premier', 'II', 'III', 'IV', 'V']\n", "['I', 'II', 'III', 'IV', 'V']\n" ] } ], "source": [ "print(df[\"acte\"].unique().tolist())\n", "df.loc[df.acte == \"Premier\", \"acte\"] = \"I\"\n", "print(df[\"acte\"].unique().tolist())" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "xiNJ5UfXF5V3", "outputId": "9ddfaf55-216c-4606-afc9-61f4891415c7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Première', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX']\n", "['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX']\n" ] } ], "source": [ "print(df[\"scene\"].unique().tolist())\n", "df.loc[df.scene == \"Première\", \"scene\"] = \"I\"\n", "print(df[\"scene\"].unique().tolist())" ] }, { "cell_type": "markdown", "metadata": { "id": "aBaQWxtvYfCh" }, "source": [ "Roman numerals are convenient for display, but to make processing easier we also add integer columns for the act and scene numbers (`acte_n` and `scene_n`)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 423 }, "id": "xh-tL3p2CIVh", "outputId": "5c2073a8-aec3-48ec-baae-54e85c0a3baa" }, "outputs": [ { "data": { "text/html": [ "

| | acte | scene | set_acteurs | acteur | text | acte_n | scene_n |
|---|---|---|---|---|---|---|---|
| 0 | I | I | VALÈRE, ÉLISE | VALÈRE | Hé quoi, charmante Élise, vous devenez méla... | 1 | 1 |
| 1 | I | I | VALÈRE, ÉLISE | ÉLISE | Non, Valère, je ne puis pas me repentir de to... | 1 | 1 |
| 2 | I | I | VALÈRE, ÉLISE | VALÈRE | Hé que pouvez-vous craindre, Élise, dans les... | 1 | 1 |
| 3 | I | I | VALÈRE, ÉLISE | ÉLISE | Hélas ! cent choses à la fois : L'emportemen... | 1 | 1 |
| 4 | I | I | VALÈRE, ÉLISE | VALÈRE | Ah ! ne me faites pas ce tort, de juger de moi... | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1002 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | MAÎTRE JACQUES | Hélas ! comment faut-il donc faire ? On me do... | 5 | 6 |
| 1003 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | ANSELME | Seigneur Harpagon, il faut lui pardonner cette... | 5 | 6 |
| 1004 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | HARPAGON | Vous payerez donc le Commissaire ? | 5 | 6 |
| 1005 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | ANSELME | Soit. Allons vite faire part de notre joie à ... | 5 | 6 |
| 1006 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | HARPAGON | Et moi, voir ma chère Cassette.< Fin > | 5 | 6 |

1007 rows × 7 columns
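
The integer columns `acte_n` and `scene_n` in the table above mirror the Roman-numeral labels. The cell that builds them is not part of this excerpt. The sketch below shows one way to do the conversion with the standard subtractive rule (IV = 4, IX = 9). The helper name `roman_to_int` is hypothetical.

```python
# Values of the Roman numeral symbols (the play itself only needs I, V and X).
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'IX' to an integer (hypothetical helper)."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Subtractive notation: a smaller symbol written before a larger one
        # is subtracted instead of added (IV = 4, IX = 9).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total
```

On the DataFrame, this could be applied as `df["acte_n"] = df["acte"].map(roman_to_int)`, and in the same way for `scene`.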

| | acte | scene | set_acteurs | acteur | text | acte_n | scene_n | free_text |
|---|---|---|---|---|---|---|---|---|
| 0 | I | I | VALÈRE, ÉLISE | VALÈRE | Hé quoi, charmante Élise, vous devenez méla... | 1 | 1 | hé quoi charmante élise vous devenez méla... |
| 1 | I | I | VALÈRE, ÉLISE | ÉLISE | Non, Valère, je ne puis pas me repentir de to... | 1 | 1 | non valère je ne puis pas me repentir de to... |
| 2 | I | I | VALÈRE, ÉLISE | VALÈRE | Hé que pouvez-vous craindre, Élise, dans les... | 1 | 1 | hé que pouvez vous craindre élise dans les... |
| 3 | I | I | VALÈRE, ÉLISE | ÉLISE | Hélas ! cent choses à la fois : L'emportemen... | 1 | 1 | hélas cent choses à la fois l emportemen... |
| 4 | I | I | VALÈRE, ÉLISE | VALÈRE | Ah ! ne me faites pas ce tort, de juger de moi... | 1 | 1 | ah ne me faites pas ce tort de juger de moi... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1002 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | MAÎTRE JACQUES | Hélas ! comment faut-il donc faire ? On me do... | 5 | 6 | hélas comment faut il donc faire on me do... |
| 1003 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | ANSELME | Seigneur Harpagon, il faut lui pardonner cette... | 5 | 6 | seigneur harpagon il faut lui pardonner cette... |
| 1004 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | HARPAGON | Vous payerez donc le Commissaire ? | 5 | 6 | vous payerez donc le commissaire |
| 1005 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | ANSELME | Soit. Allons vite faire part de notre joie à ... | 5 | 6 | soit allons vite faire part de notre joie à ... |
| 1006 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | HARPAGON | Et moi, voir ma chère Cassette. | 5 | 6 | et moi voir ma chère cassette |

1004 rows × 8 columns
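
Comparing `text` with `free_text` in the rows above shows three changes. The cleaned version is lowercased, punctuation (including apostrophes and hyphens) is replaced by spaces, and the `< Fin >` end marker is gone. The notebook's own cleaning cell is not shown in this excerpt. The sketch below reproduces the rows displayed, assuming that only ASCII punctuation (Python's `string.punctuation`) needs removing. `clean_text` is a hypothetical name.

```python
import string

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Lowercase a line, drop the end marker and turn punctuation into spaces."""
    text = text.replace("< Fin >", "")  # end-of-play marker on the last line
    text = text.lower().translate(PUNCT_TO_SPACE)
    return " ".join(text.split())  # collapse the runs of spaces left behind
```

Mapping punctuation to a space rather than deleting it keeps elided words apart: "L'emportement" becomes "l emportement" instead of "lemportement".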

| | acte | scene | set_acteurs | acteur | text | acte_n | scene_n | free_text | tokenized_text |
|---|---|---|---|---|---|---|---|---|---|
| 0 | I | I | VALÈRE, ÉLISE | VALÈRE | Hé quoi, charmante Élise, vous devenez méla... | 1 | 1 | hé quoi charmante élise vous devenez méla... | [hé, quoi, charmante, élise, vous, devenez, ... |
| 1 | I | I | VALÈRE, ÉLISE | ÉLISE | Non, Valère, je ne puis pas me repentir de to... | 1 | 1 | non valère je ne puis pas me repentir de to... | [non, valère, je, ne, puis, pas, me, repentir... |
| 2 | I | I | VALÈRE, ÉLISE | VALÈRE | Hé que pouvez-vous craindre, Élise, dans les... | 1 | 1 | hé que pouvez vous craindre élise dans les... | [hé, que, pouvez, vous, craindre, élise, dan... |
| 3 | I | I | VALÈRE, ÉLISE | ÉLISE | Hélas ! cent choses à la fois : L'emportemen... | 1 | 1 | hélas cent choses à la fois l emportemen... | [hélas, cent, choses, à, la, fois, l, emport... |
| 4 | I | I | VALÈRE, ÉLISE | VALÈRE | Ah ! ne me faites pas ce tort, de juger de moi... | 1 | 1 | ah ne me faites pas ce tort de juger de moi... | [ah, ne, me, faites, pas, ce, tort, de, juger,... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1002 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | MAÎTRE JACQUES | Hélas ! comment faut-il donc faire ? On me do... | 5 | 6 | hélas comment faut il donc faire on me do... | [hélas, comment, faut, il, donc, faire, on, m... |
| 1003 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | ANSELME | Seigneur Harpagon, il faut lui pardonner cette... | 5 | 6 | seigneur harpagon il faut lui pardonner cette... | [seigneur, harpagon, il, faut, lui, pardonner,... |
| 1004 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | HARPAGON | Vous payerez donc le Commissaire ? | 5 | 6 | vous payerez donc le commissaire | [vous, payerez, donc, le, commissaire] |
| 1005 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | ANSELME | Soit. Allons vite faire part de notre joie à ... | 5 | 6 | soit allons vite faire part de notre joie à ... | [soit, allons, vite, faire, part, de, notre, j... |
| 1006 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | HARPAGON | Et moi, voir ma chère Cassette. | 5 | 6 | et moi voir ma chère cassette | [et, moi, voir, ma, chère, cassette] |

1004 rows × 9 columns
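
The `tokenized_text` column splits each line into words. The notebook downloads NLTK's `punkt` models for this step, but the tokenizing cell itself is not in this excerpt. As a dependency-free stand-in (not the NLTK tokenizer), a regular expression over word characters produces token lists consistent with the rows shown. One detail matters: the download step normalized the text to NFKD, which stores "é" as "e" followed by a combining accent. Python's `\w` does not match combining marks, so the sketch recomposes the text with NFC first. `tokenize` is a hypothetical name.

```python
import re
import unicodedata

WORD = re.compile(r"\w+")  # a run of Unicode letters, digits or underscores


def tokenize(text):
    """Split a line into lowercase word tokens (regex stand-in, not NLTK)."""
    # Recompose accents: after NFKD, "é" is "e" + U+0301, and \w does not match
    # the combining mark, so "élise" would otherwise split into "e" and "lise".
    text = unicodedata.normalize("NFC", text).lower()
    return WORD.findall(text)
```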

| | acte | scene | set_acteurs | acteur | text | acte_n | scene_n | free_text | tokenized_text | number_of_words |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | I | I | VALÈRE, ÉLISE | VALÈRE | Hé quoi, charmante Élise, vous devenez méla... | 1 | 1 | hé quoi charmante élise vous devenez méla... | [hé, quoi, charmante, élise, vous, devenez, ... | 58 |
| 1 | I | I | VALÈRE, ÉLISE | ÉLISE | Non, Valère, je ne puis pas me repentir de to... | 1 | 1 | non valère je ne puis pas me repentir de to... | [non, valère, je, ne, puis, pas, me, repentir... | 68 |
| 2 | I | I | VALÈRE, ÉLISE | VALÈRE | Hé que pouvez-vous craindre, Élise, dans les... | 1 | 1 | hé que pouvez vous craindre élise dans les... | [hé, que, pouvez, vous, craindre, élise, dan... | 14 |
| 3 | I | I | VALÈRE, ÉLISE | ÉLISE | Hélas ! cent choses à la fois : L'emportemen... | 1 | 1 | hélas cent choses à la fois l emportemen... | [hélas, cent, choses, à, la, fois, l, emport... | 51 |
| 4 | I | I | VALÈRE, ÉLISE | VALÈRE | Ah ! ne me faites pas ce tort, de juger de moi... | 1 | 1 | ah ne me faites pas ce tort de juger de moi... | [ah, ne, me, faites, pas, ce, tort, de, juger,... | 45 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1002 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | MAÎTRE JACQUES | Hélas ! comment faut-il donc faire ? On me do... | 5 | 6 | hélas comment faut il donc faire on me do... | [hélas, comment, faut, il, donc, faire, on, m... | 23 |
| 1003 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | ANSELME | Seigneur Harpagon, il faut lui pardonner cette... | 5 | 6 | seigneur harpagon il faut lui pardonner cette... | [seigneur, harpagon, il, faut, lui, pardonner,... | 8 |
| 1004 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | HARPAGON | Vous payerez donc le Commissaire ? | 5 | 6 | vous payerez donc le commissaire | [vous, payerez, donc, le, commissaire] | 5 |
| 1005 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | ANSELME | Soit. Allons vite faire part de notre joie à ... | 5 | 6 | soit allons vite faire part de notre joie à ... | [soit, allons, vite, faire, part, de, notre, j... | 11 |
| 1006 | V | VI | CLÉANTE, VALÈRE, MARIANE, ÉLISE, FROSINE, H... | HARPAGON | Et moi, voir ma chère Cassette. | 5 | 6 | et moi voir ma chère cassette | [et, moi, voir, ma, chère, cassette] | 6 |

1004 rows × 10 columns
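
Before the word counts are grouped by `acteur`, it helps to compare names on a common key. As the question warns, case and accents may vary. For example, the cast list printed earlier spells "Maitre Jacques" without the circumflex, while the speaker tags read "MAÎTRE JACQUES". The following is a minimal sketch of such a key under a hypothetical name, `normalize_name`. It drops accents through NFKD decomposition and ignores case, surrounding spaces and a trailing period.

```python
import unicodedata


def normalize_name(name):
    """Comparison key for a character name: no accents, upper case, tidy spacing."""
    # NFKD splits each accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping the combining marks removes the accents ("Î" becomes "I").
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore surrounding spaces, a trailing period (as in speaker tags) and case.
    key = without_accents.strip().rstrip(".").upper()
    return " ".join(key.split())  # collapse internal runs of whitespace
```

For example, `df["acteur_key"] = df["acteur"].map(normalize_name)` would give a column to group on, while one original spelling is kept for display.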

| | acteur | number_of_words |
|---|---|---|
| 0 | HARPAGON | 6256 |
| 1 | CLÉANTE | 3364 |
| 2 | VALÈRE | 2772 |
| 3 | FROSINE | 2363 |
| 4 | MAÎTRE JACQUES | 1717 |
| 5 | LA FLÈCHE | 1522 |
| 6 | ÉLISE | 1071 |
| 7 | MARIANE | 919 |
| 8 | ANSELME | 517 |
| 9 | LE COMMISSAIRE | 294 |
| 10 | MAÎTRE SIMON | 197 |
| 11 | LA MERLUCHE | 55 |
| 12 | BRINDAVOINE | 43 |

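
The table above answers the word-based part of the question. HARPAGON has by far the most words (6256), and BRINDAVOINE has the fewest among the characters who speak. The question also asks about scenes and lines, and about who never speaks. The stdlib-only sketch below computes all three measures per character and lists cast members who have no lines. It assumes input rows of (act, scene, speaker, word count), one row per line as in the DataFrame. All function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Case- and accent-insensitive key for a character name."""
    decomposed = unicodedata.normalize("NFKD", name)  # "É" -> "E" + combining accent
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.upper().split())


def speech_stats(rows):
    """Count scenes, lines and words per character.

    rows: iterable of (acte, scene, acteur, n_words), one row per line of dialogue.
    Returns {name_key: {"scenes": int, "lines": int, "words": int}}.
    """
    scenes = defaultdict(set)
    lines = defaultdict(int)
    words = defaultdict(int)
    for acte, scene, acteur, n_words in rows:
        key = name_key(acteur)
        scenes[key].add((acte, scene))  # a scene is identified by its (act, scene) pair
        lines[key] += 1
        words[key] += n_words
    return {
        key: {"scenes": len(scenes[key]), "lines": lines[key], "words": words[key]}
        for key in lines
    }


def ranking(stats, by="words"):
    """Character keys from most to least talkative on one measure."""
    return sorted(stats, key=lambda key: stats[key][by], reverse=True)


def silent_characters(cast, stats):
    """Names from the cast list that never speak."""
    return [name for name in cast if name_key(name) not in stats]
```

With the DataFrame above, the rows could come from `df[["acte_n", "scene_n", "acteur", "number_of_words"]].itertuples(index=False)`. If the names from the printed cast list are passed to `silent_characters`, Dame Claude should be reported, because she appears in the cast list but not in the word-count table above.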

| | acte_n | scene_n | acte | scene | acteur | number_of_words | perc |
|---|---|---|---|---|---|---|---|
| 9 | 1 | 5 | I | V | ÉLISE | 36 | 3.52 |
| 10 | 1 | 5 | I | V | HARPAGON | 278 | 27.15 |
| 11 | 1 | 5 | I | V | VALÈRE | 710 | 69.34 |
| 6 | 1 | 4 | I | IV | CLÉANTE | 216 | 14.01 |
| 7 | 1 | 4 | I | IV | ÉLISE | 166 | 10.77 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 73 | 5 | 2 | V | II | HARPAGON | 182 | 26.42 |
| 74 | 5 | 2 | V | II | LE COMMISSAIRE | 159 | 23.08 |
| 75 | 5 | 2 | V | II | MAÎTRE JACQUES | 348 | 50.51 |
| 71 | 5 | 1 | V | I | HARPAGON | 89 | 44.95 |
| 72 | 5 | 1 | V | I | LE COMMISSAIRE | 109 | 55.05 |

95 rows × 7 columns
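
The `perc` column is each character's share of the words spoken in that scene, given in percent and rounded to two decimals. For example, in Act I, Scene V the three speakers total 36 + 278 + 710 = 1024 words, and 710 / 1024 ≈ 69.34 %. The following is a stdlib sketch of this computation; the name `scene_shares` is hypothetical.

```python
from collections import defaultdict


def scene_shares(rows):
    """rows: (acte_n, scene_n, acteur, number_of_words), one per character and scene.

    Returns the same rows with a fifth field: the character's share of the
    scene's words, in percent, rounded to two decimals.
    """
    rows = list(rows)  # iterate twice: once for totals, once for shares
    totals = defaultdict(int)
    for acte_n, scene_n, _acteur, n_words in rows:
        totals[(acte_n, scene_n)] += n_words
    return [
        (acte_n, scene_n, acteur, n_words, round(100 * n_words / totals[(acte_n, scene_n)], 2))
        for acte_n, scene_n, acteur, n_words in rows
    ]
```

In pandas, the same column could be written as `(100 * df["number_of_words"] / df.groupby(["acte_n", "scene_n"])["number_of_words"].transform("sum")).round(2)`, where `df` is assumed to hold one row per character and scene.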

| | acte_n | scene_n | acte | scene | acteur | number_of_words | perc | w |
|---|---|---|---|---|---|---|---|---|
| 9 | 1 | 5 | I | V | ÉLISE | 36 | 3.52 | 0.496846 |
| 10 | 1 | 5 | I | V | HARPAGON | 278 | 27.15 | 0.496846 |
| 11 | 1 | 5 | I | V | VALÈRE | 710 | 69.34 | 0.496846 |
| 6 | 1 | 4 | I | IV | CLÉANTE | 216 | 14.01 | 0.748180 |
| 7 | 1 | 4 | I | IV | ÉLISE | 166 | 10.77 | 0.748180 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73 | 5 | 2 | V | II | HARPAGON | 182 | 26.42 | 0.334304 |
| 74 | 5 | 2 | V | II | LE COMMISSAIRE | 159 | 23.08 | 0.334304 |
| 75 | 5 | 2 | V | II | MAÎTRE JACQUES | 348 | 50.51 | 0.334304 |
| 71 | 5 | 1 | V | I | HARPAGON | 89 | 44.95 | 0.096070 |
| 72 | 5 | 1 | V | I | LE COMMISSAIRE | 109 | 55.05 | 0.096070 |

95 rows × 8 columns
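
This excerpt does not show how `w` is computed. The rows displayed are consistent with w being the scene's total word count divided by a constant of about 2061. Act I, Scene V totals 1024 words (1024 / 2061 ≈ 0.4968). Act V, Scene II totals 182 + 159 + 348 = 689 words (689 / 2061 ≈ 0.3343). Act V, Scene I totals 89 + 109 = 198 words (198 / 2061 ≈ 0.0961). One plausible reading is that 2061 is the word count of the longest scene, so each scene gets a weight between 0 and 1. This is an assumption rather than something stated in the notebook. Under that assumption, a sketch with the hypothetical name `scene_weights`:

```python
from collections import defaultdict


def scene_weights(rows):
    """Weight each scene by its size relative to the longest scene.

    rows: iterable of (acte_n, scene_n, number_of_words), one per character and scene.
    Returns {(acte_n, scene_n): weight}; with positive counts, weights lie in (0, 1].
    ASSUMPTION: this is one reading of the notebook's `w` column, inferred from
    the rows shown, not a confirmed definition.
    """
    totals = defaultdict(int)
    for acte_n, scene_n, n_words in rows:
        totals[(acte_n, scene_n)] += n_words
    longest = max(totals.values())  # word count of the longest scene
    return {scene: total / longest for scene, total in totals.items()}
```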