{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"# Analysis of the dialogue in Molière's L'Avare"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## Choosing the source file\n",
"\n",
"Molière's *L'Avare* is available in several different formats. \n",
"Not all of them, however, lend themselves to a semantic analysis of a play. \n",
"Plain-text formats (such as Markdown or iramuteq and, by extension, files that contain only the speeches) make it hard to separate out [stage directions](https://fr.wikipedia.org/wiki/Didascalie_(théâtre)) and other blocks of text inserted into the scenes that are not dialogue proper. \n",
"We therefore need a more structured format.\n",
"We could try to analyse the epub or kindle files, but those formats are designed for presentation, which would make the analysis needlessly complex and costly when better formats are available.\n",
"\n",
"The better candidates for a semantic analysis are markup formats that structure the content: TEI (designed by the *Text Encoding Initiative Consortium*), TXM (co-developed by the École normale supérieure de Lyon and the université de Franche-Comté), and HTML (the markup language of the web), used here in its XML serialization (XHTML). \n",
"\n",
"All three can be parsed as XML, so in principle they can be queried through the same API ([XPath](https://fr.wikipedia.org/wiki/XPath)) without a third-party library: Python's standard `xml.etree.ElementTree` module supports a subset of XPath. \n",
"\n",
"HTML nevertheless has considerable advantages: it is used by hundreds of millions of websites around the world, and its specification is maintained by browser vendors such as Apple, Google and Mozilla, within standardization work in which institutions such as Inria in France have taken part. \n",
"It is the format on which the Web was built, and it is reused in very different contexts.\n",
"In addition, having been a web developer for 30 years, the author of this analysis makes no secret of a particular interest in this format, with which he is far more familiar than with the others.\n",
"\n",
"We will therefore continue this study with the file `moliere_avare.html` [made available](http://dramacode.github.io/html/moliere_avare.html) by [dramacode](https://dramacode.github.io)."
]
},
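{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the XPath point above, here is a minimal sketch of a namespaced query with `xml.etree.ElementTree`. The XHTML fragment and the names `sample`, `sample_ns` and `sample_root` are assumptions made for this example only; they are not taken from `moliere_avare.html`.\n",
"\n",
"```python\n",
"import xml.etree.ElementTree as ET\n",
"\n",
"# Hypothetical XHTML fragment, written for this example only\n",
"sample = (\n",
"    \"<html xmlns='http://www.w3.org/1999/xhtml'><body>\"\n",
"    \"<div class='sp'><p class='speaker'>HARPAGON</p><p>Texte de la réplique.</p></div>\"\n",
"    \"</body></html>\"\n",
")\n",
"sample_ns = {\"x\": \"http://www.w3.org/1999/xhtml\"}\n",
"sample_root = ET.fromstring(sample)\n",
"\n",
"# The prefix bound to the XHTML namespace is required in the query\n",
"speakers = [p.text for p in sample_root.findall(\".//x:p[@class='speaker']\", sample_ns)]\n",
"\n",
"# Without the prefix, the same query matches nothing\n",
"unprefixed = sample_root.findall(\".//p[@class='speaker']\")\n",
"```\n",
"\n",
"In this sketch `speakers` contains the single speaker name and `unprefixed` is empty, which is why every query in this notebook has to carry a namespace prefix."
]
},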
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Opening the file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us start by grouping the imports together, to get an overview of them."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"# XML parser, XPath queries\n",
"import xml.etree.ElementTree as ET\n",
"\n",
"# Analysis\n",
"import pandas as pd\n",
"\n",
"# Useful for handling accented characters\n",
"import locale\n",
"\n",
"# Lets us combine two lists of unknown length\n",
"from itertools import zip_longest\n",
"\n",
"# Regular expressions (regex)\n",
"import re\n",
"\n",
"# Plotting\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib.colors as mcolors\n",
"\n",
"# Used to define and display the colormap of the characters\n",
"from itertools import cycle\n",
"import matplotlib.patches as mpatches\n",
"\n",
"# Used for the computations behind the plots\n",
"import math\n",
"import numpy as np\n",
"\n",
"# Network graph\n",
"import networkx as nx\n",
"import bokeh.plotting as bkp\n",
"from bokeh.io import output_notebook, show\n",
"from bokeh.resources import INLINE\n",
"from bokeh.models import (\n",
"    GraphRenderer,\n",
"    StaticLayoutProvider,\n",
"    Circle,\n",
"    MultiLine,\n",
"    HoverTool,\n",
"    Arrow,\n",
"    NormalHead,\n",
"    ColumnDataSource,\n",
"    LabelSet,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
],
"source": [
"# Tell bokeh to render the final network graph inside the notebook\n",
"output_notebook()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Setting the locale would let us handle accented capitals correctly when sorting.\n",
"# The call is left disabled here; the locale name depends on the system (\"fr_FR.UTF-8\", \"fr_FR\", ...).\n",
"# locale.setlocale(locale.LC_COLLATE, \"fr_FR.UTF-8\")\n",
"\n",
"ns = {\"x\": \"http://www.w3.org/1999/xhtml\"}\n",
"root = ET.parse(\"moliere_avare.html\").getroot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preliminary tools\n",
"\n",
"We are working on a play, which is by definition divided into acts and scenes.\n",
"We will therefore build a few tools to reach these structural elements easily, and complete them with various functions that are used repeatedly throughout this study."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Clean a title: concatenate its text, drop the \"§\" mark and trim it\n",
"def clean_title(el, tag):\n",
"    return \"\".join(el.find(tag, ns).itertext()).replace(\"§\", \"\").strip()\n",
"\n",
"# Acts, with an explicit order\n",
"def list_acts():\n",
"    acts = []\n",
"\n",
"    for idx, act in enumerate(root.findall(\".//x:section[@class='div1 act level2']\", ns)):\n",
"        acts.append({\n",
"            \"id\": act.get(\"id\"),\n",
"            \"title\": clean_title(act, \"x:h2\"),\n",
"            \"node\": act,\n",
"            \"order\": idx,\n",
"        })\n",
"\n",
"    return acts\n",
"\n",
"# Scenes of a given act (passed as a node or as an identifier), with an explicit order\n",
"def list_scenes(act=None, act_id=None):\n",
"    if act is None:\n",
"        if act_id is None:\n",
"            raise ValueError(\"Either an `act` element or an identifier must be given\")\n",
"\n",
"        act = root.find(f\".//x:section[@class='div1 act level2'][@id='{act_id}']\", ns)\n",
"\n",
"        if act is None:\n",
"            raise ValueError(f\"Act not found: {act_id}\")\n",
"\n",
"    scenes = []\n",
"\n",
"    for idx, scene in enumerate(act.findall(\"x:section[@class='div2 scene level3']\", ns)):\n",
"        scenes.append({\n",
"            \"id\": scene.get(\"id\"),\n",
"            \"title\": clean_title(scene, \"x:h3\"),\n",
"            \"node\": scene,\n",
"            \"order\": idx,\n",
"        })\n",
"\n",
"    return scenes\n",
"\n",
"# Collect the text fragments of an HTML element while skipping every <i> element.\n",
"# This keeps the character names and the speech text found in the dialogue blocks,\n",
"# without the stage directions.\n",
"def text_without_i(el):\n",
"    parts = []\n",
"\n",
"    if el.tag != f\"{{{ns['x']}}}i\" and el.text and el.text.strip():\n",
"        parts.append(el.text.strip())\n",
"\n",
"    for child in el:\n",
"        if child.tag != f\"{{{ns['x']}}}i\":\n",
"            parts.extend(text_without_i(child))\n",
"\n",
"        # the text that follows a child is always kept, even when the child is an <i> element\n",
"        if child.tail and child.tail.strip():\n",
"            parts.append(child.tail.strip())\n",
"\n",
"    return parts\n",
"\n",
"# Format the name of a character extracted from a dialogue block\n",
"def speaker_name(sp):\n",
"    name = \" \".join(text_without_i(sp)).strip()\n",
"\n",
"    # simple clean-up of trailing punctuation\n",
"    name = name.rstrip(\",;:\").strip()\n",
"\n",
"    return name\n",
"\n",
"# Resolve a character name through our alias table (unknown names are returned unchanged)\n",
"alias_index = {}\n",
"def resolve_name(name):\n",
"    return alias_index.get(name, name)\n",
"\n",
"# Extract the plain text of a speech given as an HTML element.\n",
"# Whitespace is compacted so that the counts mirror those of the OBVIL.\n",
"def speech_text(sp):\n",
"    parts = []\n",
"\n",
"    for p in sp.findall(\".//x:p[@class='p autofirst']\", ns):\n",
"        parts.extend(text_without_i(p))\n",
"\n",
"    raw = \" \".join(parts)\n",
"    return \" \".join(raw.split()).strip()\n",
"\n",
"# Count the words of a plain text with a simple dedicated regex.\n",
"# An apostrophe splits a token, so an elided form counts as two words.\n",
"def word_count(txt):\n",
"    return len(re.findall(r\"\\b\\w+\\b\", txt, flags=re.UNICODE))\n",
"\n",
"# Convert a text into a (fractional) number of lines of 60 characters each\n",
"def line_count(txt, line_length=60):\n",
"    return len(txt) / line_length if txt else 0\n",
"\n",
"# Return the character associated with a speech (<div class='sp'>)\n",
"def speech_actor(sp):\n",
"    speaker_el = sp.find(\"x:p[@class='speaker']\", ns)\n",
"\n",
"    if speaker_el is None:\n",
"        return \"\"\n",
"\n",
"    return resolve_name(speaker_name(speaker_el))\n",
"\n",
"# List the speeches of a scene (given as a node or an identifier) with their text and word count\n",
"def scene_speeches(scene=None, scene_id=None):\n",
"    if scene is None:\n",
"        if scene_id is None:\n",
"            raise ValueError(\"Either a `scene` element or an identifier must be given\")\n",
"\n",
"        scene = root.find(f\".//x:section[@class='div2 scene level3'][@id='{scene_id}']\", ns)\n",
"\n",
"        if scene is None:\n",
"            raise ValueError(f\"Scene not found: {scene_id}\")\n",
"\n",
"    speeches = []\n",
"\n",
"    for sp_div in scene.findall(\".//x:div[@class='sp']\", ns):\n",
"        speaker = speech_actor(sp_div)\n",
"\n",
"        if not speaker:\n",
"            continue\n",
"\n",
"        txt = speech_text(sp_div)\n",
"\n",
"        speeches.append({\n",
"            \"speaker\": speaker,\n",
"            \"text\": txt,\n",
"            \"word_count\": word_count(txt),\n",
"            \"node\": sp_div,\n",
"        })\n",
"\n",
"    return speeches\n",
"\n",
"# Build a colormap that assigns a color to each character\n",
"def create_actors_colormap(personnages):\n",
"    # Cycle through a fixed palette, so colors repeat beyond 20 characters\n",
"    palette = cycle(plt.cm.tab20.colors)\n",
"    color_map = {}\n",
"\n",
"    for p in personnages:\n",
"        color_map[p] = next(palette)\n",
"\n",
"    return color_map"
]
},
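{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `text_without_i` helper above relies on the way `xml.etree.ElementTree` splits mixed content between `.text` and `.tail`. The following minimal sketch shows that split on a made-up fragment; the sample string and the names `p`, `aside` and `pieces` are assumptions for this example, not content from the source file.\n",
"\n",
"```python\n",
"import xml.etree.ElementTree as ET\n",
"\n",
"# Hypothetical speech fragment: the <i> element holds a stage direction\n",
"p = ET.fromstring(\"<p>Allons, <i>à part</i> venez ici.</p>\")\n",
"aside = p.find(\"i\")\n",
"\n",
"# Text before the first child is stored in the parent's .text;\n",
"# text after a child is stored in that child's .tail\n",
"pieces = [p.text.strip(), aside.tail.strip()]\n",
"```\n",
"\n",
"Here `pieces` holds the two fragments of speech around the stage direction, while the stage direction itself stays in `aside.text`. This is why the helper skips the `.text` of an `<i>` element but always keeps its `.tail`."
]
},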
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us check that we do get the list of acts and scenes:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Acte I - Scène I01\n",
"Acte I - Scène I02\n",
"Acte I - Scène I03\n",
"Acte I - Scène I04\n",
"Acte I - Scène I05\n",
"Acte II - Scène II01\n",
"Acte II - Scène II02\n",
"Acte II - Scène II03\n",
"Acte II - Scène II04\n",
"Acte II - Scène II05\n",
"Acte III - Scène III01\n",
"Acte III - Scène III02\n",
"Acte III - Scène III03\n",
"Acte III - Scène III04\n",
"Acte III - Scène III05\n",
"Acte III - Scène III06\n",
"Acte III - Scène III07\n",
"Acte III - Scène III08\n",
"Acte III - Scène III09\n",
"Acte IV - Scène IV01\n",
"Acte IV - Scène IV02\n",
"Acte IV - Scène IV03\n",
"Acte IV - Scène IV04\n",
"Acte IV - Scène IV05\n",
"Acte IV - Scène IV06\n",
"Acte IV - Scène IV07\n",
"Acte V - Scène V01\n",
"Acte V - Scène V02\n",
"Acte V - Scène V03\n",
"Acte V - Scène V04\n",
"Acte V - Scène V05\n",
"Acte V - Scène V06\n"
]
}
],
"source": [
"for act in list_acts():\n",
"    for scene in list_scenes(act=act[\"node\"]):\n",
"        print(\"Acte \" + act[\"id\"] + \" - Scène \" + scene[\"id\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the list of characters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following XPath query extracts the list of characters given at the beginning of the file, also known as the [_dramatis personae_](https://fr.wikipedia.org/wiki/Dramatis_personæ_(théâtre))."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Description Personnage\n",
"0 Père de Cléante et d'Élise, et Amoureux de Mar... Harpagon\n",
"1 Fils d'Harpagon, Amant de Mariane. Cléante\n",
"2 Fille d'Harpagon, Amante de Valère. Élise\n",
"3 Fils d'Anselme, et Amant d'Élise. Valère\n",
"4 Amante de Cléante, et aimée d'Harpagon. Mariane\n",
"5 Père de Valère et de Mariane. Anselme\n",
"6 Femme d'Intrigue. Frosine\n",
"7 Courtier. Maitre Simon\n",
"8 Cuisinier et Cocher d'Harpagon. Maitre Jacques\n",
"9 Valet de Cléante. La Flèche\n",
"10 Servante d'Harpagon. Dame Claude\n",
"11 laquais d'Harpagon. Brindavoine\n",
"12 laquais d'Harpagon. La Merluche\n",
"13 et son clerc. Le commissaire"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Return the characters defined in the dramatis personae as a DataFrame,\n",
"# with one row per character (name and description).\n",
"def build_dramatis_personae():\n",
"    rows = []\n",
"\n",
"    # XPath query selecting the <li> elements that list the characters\n",
"    for li in root.findall(\".//x:div[@id='castList']//x:li\", ns):\n",
"        # The character's name is held in a <span> element\n",
"        span = li.find(\"x:span\", ns)\n",
"        name = span.text.strip()\n",
"\n",
"        # description = the text that follows the <span> inside the same <li>\n",
"        desc = (span.tail or \"\").strip()\n",
"\n",
"        if desc.startswith(\",\"):\n",
"            desc = desc[1:].strip()\n",
"\n",
"        rows.append({\"Personnage\": name, \"Description\": desc})\n",
"\n",
"    return pd.DataFrame(rows)\n",
"\n",
"dramatis_personae = build_dramatis_personae()\n",
"\n",
"# Display the list\n",
"dramatis_personae"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can already see that the commissaire and his clerk are treated as a single character.\n",
"We will see later whether this matters (for example, if the clerk speaks in his own name)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will compare this list with the characters named at the head of each scene, then with those who \"actually\" take part, that is, those who have at least one line of dialogue.\n",
"This step should let us spot subtle spelling differences that will need to be handled."
]
},
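{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to spot such spelling differences is to compare names through a key that ignores accents and case. This is only a sketch: the helper `name_key` and the variables `cast_names`, `scene_names` and `variants` are hypothetical and are not used elsewhere in this notebook. The sample names are spellings that appear in the outputs of this notebook.\n",
"\n",
"```python\n",
"import unicodedata\n",
"\n",
"def name_key(name):\n",
"    # Decompose accented characters, drop the combining marks, then ignore case\n",
"    decomposed = unicodedata.normalize(\"NFD\", name)\n",
"    stripped = \"\".join(c for c in decomposed if not unicodedata.combining(c))\n",
"    return \" \".join(stripped.casefold().split())\n",
"\n",
"cast_names = [\"Maitre Simon\", \"Le commissaire\"]\n",
"scene_names = [\"Maître Simon\", \"Le Commissaire\", \"le Commissaire\"]\n",
"\n",
"# Map each comparison key to every spelling observed for it\n",
"variants = {}\n",
"for n in cast_names + scene_names:\n",
"    variants.setdefault(name_key(n), set()).add(n)\n",
"```\n",
"\n",
"Any key of `variants` that maps to more than one spelling points to a name that may need an entry in an alias table such as `alias_index`."
]
},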
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Character names per scene"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will iterate over each act, then each scene, to look at its list of characters. \n",
"Keep in mind that these lists are optional, and that they do not necessarily identify the characters who actually have a line.\n",
"Even so, they may reveal points of interest, such as differing spellings or some other anomaly."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
" Acte Protagonistes \\\n",
"0 Acte Premier [Valère, Élise] \n",
"1 Acte Premier [Cléante, Élise] \n",
"2 Acte Premier [Harpagon, La Flèche] \n",
"3 Acte Premier [Élise, Cléante, Harpagon] \n",
"4 Acte Premier [Valère, Harpagon, Élise] \n",
"5 Acte II [Cléante, La Flèche] \n",
"6 Acte II [Maître Simon, Harpagon, Cléante, La Flèche] \n",
"7 Acte II [Frosine, Harpagon] \n",
"8 Acte II [La Flèche, Frosine] \n",
"9 Acte II [Harpagon, Frosine] \n",
"10 Acte III [Harpagon, Cléante, Élise, Valère, Dame Claude... \n",
"11 Acte III [Maître Jacques, Valère] \n",
"12 Acte III [Frosine, Mariane, Maître Jacques] \n",
"13 Acte III [Mariane, Frosine] \n",
"14 Acte III [Harpagon, Frosine, Mariane] \n",
"15 Acte III [Élise, Harpagon, Mariane, Frosine] \n",
"16 Acte III [Cléante, Harpagon, Élise, Mariane, Frosine] \n",
"17 Acte III [Harpagon, Mariane, Frosine, Cléante, Brindavo... \n",
"18 Acte III [Harpagon, Mariane, Cléante, Élise, Frosine, L... \n",
"19 Acte IV [Cléante, Mariane, Élise, Frosine] \n",
"20 Acte IV [Harpagon, Cléante, Mariane, Élise, Frosine] \n",
"21 Acte IV [Harpagon, Cléante] \n",
"22 Acte IV [Maître Jacques, Harpagon, Cléante] \n",
"23 Acte IV [Cléante, Harpagon] \n",
"24 Acte IV [La Flèche, Cléante] \n",
"25 Acte IV [] \n",
"26 Acte V [Harpagon, Le Commissaire, son Clerc] \n",
"27 Acte V [Maître Jacques, Harpagon, Le Commissaire, son... \n",
"28 Acte V [Valère, Harpagon, le Commissaire, son Clerc, ... \n",
"29 Acte V [Élise, Mariane, Frosine, Harpagon, Valère, Ma... \n",
"30 Acte V [Anselme, Harpagon, Élise, Mariane, Frosine, V... \n",
"31 Acte V [Cléante, Valère, Mariane, Élise, Frosine, Har... \n",
"\n",
" Scène \n",
"0 Scène Première \n",
"1 Scène II \n",
"2 Scène III \n",
"3 Scène IV \n",
"4 Scène V \n",
"5 Scène Première \n",
"6 Scène II \n",
"7 Scène III \n",
"8 Scène IV \n",
"9 Scène V \n",
"10 Scène Première \n",
"11 Scène II \n",
"12 Scène III \n",
"13 Scène IV \n",
"14 Scène V \n",
"15 Scène VI \n",
"16 Scène VII \n",
"17 Scène VIII \n",
"18 Scène IX \n",
"19 Scène Première \n",
"20 Scène II \n",
"21 Scène III \n",
"22 Scène IV \n",
"23 Scène V \n",
"24 Scène VI \n",
"25 Scène VII \n",
"26 Scène Première \n",
"27 Scène II \n",
"28 Scène III \n",
"29 Scène IV \n",
"30 Scène V \n",
"31 Scène VI "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def list_scene_protagonists():\n",
"    rows = []\n",
"\n",
"    for act in list_acts():\n",
"        for scene in list_scenes(act=act[\"node\"]):\n",
"            stage = scene[\"node\"].find(\"x:div[@class='stage stage']\", ns)\n",
"\n",
"            # If the scene has a stage-direction node, it holds the\n",
"            # comma-separated list of characters present in the scene\n",
"            if stage is not None:\n",
"                raw = \"\".join(stage.itertext()).strip()\n",
"                people = [p.strip() for p in raw.split(\",\") if p.strip()]\n",
"            else:\n",
"                people = []\n",
"\n",
"            rows.append({\n",
"                \"Acte\": act[\"title\"],\n",
"                \"Scène\": scene[\"title\"],\n",
"                \"Protagonistes\": people,\n",
"            })\n",
"\n",
"    return pd.DataFrame(rows)\n",
"\n",
"df_scenes = list_scene_protagonists()\n",
"df_scenes"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"Here we see that scene VII of act IV declares no protagonist in its attached list[^1]; in any case, we will complete this information by extracting the actual speeches of each character individually.\n",
"\n",
"[^1]: Such a list is not mandatory in a play. Here, we can assume that it is missing because the scene is, for instance, a monologue."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
" Acte Intervenants \\\n",
"0 Acte Premier [Valère, Élise] \n",
"1 Acte Premier [Cléante, Élise] \n",
"2 Acte Premier [Harpagon, La Flèche] \n",
"3 Acte Premier [Harpagon, Cléante, Élise] \n",
"4 Acte Premier [Harpagon, Valère, Élise] \n",
"5 Acte II [Cléante, La Flèche] \n",
"6 Acte II [Maître simon, Harpagon, La Flèche, Cléante] \n",
"7 Acte II [Frosine, Harpagon] \n",
"8 Acte II [La Flèche, Frosine] \n",
"9 Acte II [Harpagon, Frosine] \n",
"10 Acte III [Harpagon, Maître Jacques, La Merluche, Brinda... \n",
"11 Acte III [Valère, Maître Jacques] \n",
"12 Acte III [Frosine, Maître Jacques] \n",
"13 Acte III [Mariane, Frosine] \n",
"14 Acte III [Harpagon, Frosine] \n",
"15 Acte III [Mariane, Élise, Harpagon, Frosine] \n",
"16 Acte III [Cléante, Mariane, Harpagon, Frosine, Valère] \n",
"17 Acte III [Brindavoine, Harpagon] \n",
"18 Acte III [La Merluche, Harpagon, Cléante, Valère] \n",
"19 Acte IV [Cléante, Élise, Mariane, Frosine] \n",
"20 Acte IV [Harpagon, Élise, Cléante] \n",
"21 Acte IV [Harpagon, Cléante] \n",
"22 Acte IV [Maître Jacques, Cléante, Harpagon] \n",
"23 Acte IV [Cléante, Harpagon] \n",
"24 Acte IV [La Flèche, Cléante] \n",
"25 Acte IV [Harpagon] \n",
"26 Acte V [Le Commissaire, Harpagon] \n",
"27 Acte V [Maître Jacques, Harpagon, Le Commissaire] \n",
"28 Acte V [Harpagon, Valère, Maître Jacques] \n",
"29 Acte V [Harpagon, Valère, Élise, Maître Jacques, Fros... \n",
"30 Acte V [Anselme, Harpagon, Valère, Mariane, Maître Ja... \n",
"31 Acte V [Cléante, Harpagon, Mariane, Anselme, Le Commi... \n",
"\n",
" Scène \n",
"0 Scène Première \n",
"1 Scène II \n",
"2 Scène III \n",
"3 Scène IV \n",
"4 Scène V \n",
"5 Scène Première \n",
"6 Scène II \n",
"7 Scène III \n",
"8 Scène IV \n",
"9 Scène V \n",
"10 Scène Première \n",
"11 Scène II \n",
"12 Scène III \n",
"13 Scène IV \n",
"14 Scène V \n",
"15 Scène VI \n",
"16 Scène VII \n",
"17 Scène VIII \n",
"18 Scène IX \n",
"19 Scène Première \n",
"20 Scène II \n",
"21 Scène III \n",
"22 Scène IV \n",
"23 Scène V \n",
"24 Scène VI \n",
"25 Scène VII \n",
"26 Scène Première \n",
"27 Scène II \n",
"28 Scène III \n",
"29 Scène IV \n",
"30 Scène V \n",
"31 Scène VI "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def list_scene_speakers():\n",
"    rows = []\n",
"\n",
"    for act in list_acts():\n",
"        for scene in list_scenes(act=act[\"node\"]):\n",
"            speakers, seen = [], set()\n",
"\n",
"            for sp in scene[\"node\"].findall(\".//x:p[@class='speaker']\", ns):\n",
"                name = speaker_name(sp)\n",
"\n",
"                # Skip speakers already recorded for this scene (keeps first-appearance order)\n",
"                if name and name not in seen:\n",
"                    seen.add(name)\n",
"                    speakers.append(name)\n",
"\n",
"            rows.append({\n",
"                \"Acte\": act[\"title\"],\n",
"                \"Scène\": scene[\"title\"],\n",
"                \"Intervenants\": speakers,\n",
"            })\n",
"\n",
"    return pd.DataFrame(rows)\n",
"\n",
"df_speakers = list_scene_speakers()\n",
"df_speakers"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Intervenant\n",
"0 Anselme\n",
"1 Brindavoine\n",
"2 Cléante\n",
"3 Frosine\n",
"4 Harpagon\n",
"5 La Flèche\n",
"6 La Merluche\n",
"7 Le Commissaire\n",
"8 Mariane\n",
"9 Maître Jacques\n",
"10 Maître simon\n",
"11 Valère\n",
"12 Élise"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Deduplicate the speakers and sort them with locale-aware collation\n",
"intervenants_uniques = sorted(\n",
"    {name for names in df_speakers[\"Intervenants\"] for name in names},\n",
"    key=locale.strxfrm\n",
")\n",
"\n",
"intervenants_df = pd.DataFrame({\"Intervenant\": intervenants_uniques})\n",
"intervenants_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now identify the differences with the _dramatis personae_, in order to check that names are spelled consistently.\n",
"As a by-product, this also reveals the characters who have no lines."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Personnage (en-tête) Intervenant (répliques)\n",
"0 Dame Claude Le Commissaire\n",
"1 Le commissaire Maître Jacques\n",
"2 Maitre Jacques Maître simon\n",
"3 Maitre Simon None"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"acteurs_set = set(dramatis_personae[\"Personnage\"])\n",
"intervenants_set = set(intervenants_uniques)\n",
"\n",
"# Set aside the names that are spelled identically in both lists\n",
"communs = acteurs_set & intervenants_set\n",
"acteurs_only = sorted(acteurs_set - communs, key=locale.strxfrm)\n",
"intervenants_only = sorted(intervenants_set - communs, key=locale.strxfrm)\n",
"\n",
"df_diff = pd.DataFrame(\n",
"    list(zip_longest(acteurs_only, intervenants_only)),\n",
"    columns=[\"Personnage (en-tête)\", \"Intervenant (répliques)\"]\n",
")\n",
"\n",
"df_diff"
]
},
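{
"cell_type": "markdown",
"metadata": {},
"source": [
"`None` is simply the padding value inserted by `zip_longest` so that both lists have the same length. The two columns are sorted independently, so names on the same row do not necessarily refer to the same character."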
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do find two different spellings for each of three characters. The initial cast list omits the circumflex accent in \"Maître\" and writes \"commissaire\" in lowercase, while the speaker labels drop the capital letter of \"Simon\"."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, Dame Claude clearly has no lines, since she does not appear in the list of speakers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can therefore build a lookup table that maps each correctly spelled name to the variants found in the original text. As reference forms, we use the correct French spelling \"Maître\" (with its circumflex accent) and \"Commissaire\" with a capital letter."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"alias_map = {\n",
"    \"Maître Jacques\": {\"Maître Jacques\", \"Maitre Jacques\"},\n",
"    \"Maître Simon\": {\"Maitre Simon\", \"Maître simon\"},\n",
"    \"Le Commissaire\": {\"Le Commissaire\", \"Le commissaire\"},\n",
"}\n",
"\n",
"alias_index = {alias: canon for canon, aliases in alias_map.items() for alias in aliases}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Amount of speech per character\n",
"\n",
"Now that we have a normalized list of character names, we can go through the whole play and quantify the text spoken by each character."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Acte Scène Personnage Mots\n",
"0 Acte II Scène II Cléante 127\n",
"1 Acte II Scène II Harpagon 171\n",
"2 Acte II Scène II La Flèche 12\n",
"3 Acte II Scène II Maître Simon 197\n",
"4 Acte II Scène III Frosine 1\n",
"5 Acte II Scène III Harpagon 21\n",
"6 Acte II Scène IV Frosine 130\n",
"7 Acte II Scène IV La Flèche 292\n",
"8 Acte II Scène Première Cléante 379\n",
"9 Acte II Scène Première La Flèche 903\n",
"10 Acte II Scène V Frosine 1482\n",
"11 Acte II Scène V Harpagon 555\n",
"12 Acte III Scène II Maître Jacques 186\n",
"13 Acte III Scène II Valère 92\n",
"14 Acte III Scène III Frosine 19\n",
"15 Acte III Scène III Maître Jacques 11\n",
"16 Acte III Scène IV Frosine 191\n",
"17 Acte III Scène IV Mariane 185\n",
"18 Acte III Scène IX Cléante 40\n",
"19 Acte III Scène IX Harpagon 73\n",
"20 Acte III Scène IX La Merluche 21\n",
"21 Acte III Scène IX Valère 7\n",
"22 Acte III Scène Première Brindavoine 23\n",
"23 Acte III Scène Première Cléante 76\n",
"24 Acte III Scène Première Harpagon 747\n",
"25 Acte III Scène Première La Merluche 26\n",
"26 Acte III Scène Première Maître Jacques 779\n",
"27 Acte III Scène Première Valère 249\n",
"28 Acte III Scène Première Élise 3\n",
"29 Acte III Scène V Frosine 26\n",
".. ... ... ... ...\n",
"65 Acte Premier Scène IV Élise 162\n",
"66 Acte Premier Scène Première Valère 630\n",
"67 Acte Premier Scène Première Élise 491\n",
"68 Acte Premier Scène V Harpagon 271\n",
"69 Acte Premier Scène V Valère 701\n",
"70 Acte Premier Scène V Élise 36\n",
"71 Acte V Scène II Harpagon 182\n",
"72 Acte V Scène II Le Commissaire 159\n",
"73 Acte V Scène II Maître Jacques 348\n",
"74 Acte V Scène III Harpagon 441\n",
"75 Acte V Scène III Maître Jacques 11\n",
"76 Acte V Scène III Valère 641\n",
"77 Acte V Scène IV Frosine 4\n",
"78 Acte V Scène IV Harpagon 124\n",
"79 Acte V Scène IV Maître Jacques 7\n",
"80 Acte V Scène IV Valère 22\n",
"81 Acte V Scène IV Élise 143\n",
"82 Acte V Scène Première Harpagon 89\n",
"83 Acte V Scène Première Le Commissaire 109\n",
"84 Acte V Scène V Anselme 403\n",
"85 Acte V Scène V Harpagon 258\n",
"86 Acte V Scène V Mariane 192\n",
"87 Acte V Scène V Maître Jacques 7\n",
"88 Acte V Scène V Valère 376\n",
"89 Acte V Scène VI Anselme 114\n",
"90 Acte V Scène VI Cléante 130\n",
"91 Acte V Scène VI Harpagon 89\n",
"92 Acte V Scène VI Le Commissaire 26\n",
"93 Acte V Scène VI Mariane 36\n",
"94 Acte V Scène VI Maître Jacques 23\n",
"\n",
"[95 rows x 4 columns]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TODO: check whether this function duplicates\n",
"# the utility functions defined earlier\n",
"def count_words_by_actor():\n",
"    rows = []\n",
"\n",
"    for act in list_acts():\n",
"        for scene in list_scenes(act=act[\"node\"]):\n",
"            for order, speech in enumerate(scene_speeches(scene=scene[\"node\"])):\n",
"                txt = speech[\"text\"]\n",
"                rows.append({\n",
"                    \"Acte\": act[\"title\"],\n",
"                    \"Scène\": scene[\"title\"],\n",
"                    \"Ordre\": order,\n",
"                    \"Personnage\": speech[\"speaker\"],\n",
"                    \"Texte\": txt,\n",
"                    \"Mots\": speech[\"word_count\"],\n",
"                    \"Lignes\": line_count(txt),\n",
"                })\n",
"\n",
"    return pd.DataFrame(rows)\n",
"\n",
"df_speeches = count_words_by_actor()\n",
"df_counts = df_speeches.groupby([\"Acte\", \"Scène\", \"Personnage\"], as_index=False)[\"Mots\"].sum()\n",
"df_counts"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"The counts appear to be computed correctly, but such a table is hard to read. \n",
"Note, for example, that \"Acte Premier\" ends up in the middle of the table: `groupby` sorts its group keys by default, which discards our original ordering (passing `sort=False` would preserve the order of first appearance).\n",
"We can ignore this detail here and group by character instead, which gives an overall view of how much each one speaks across the whole work."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Personnage le plus taciturne : Brindavoine (43 mots)\n",
"Personnage le plus loquace : Harpagon (6132 mots)\n"
]
}
],
"source": [
"global_df = df_speeches.groupby([\"Personnage\"])[\"Mots\"].sum()\n",
"\n",
"moins_bavard_nom = global_df.idxmin()\n",
"moins_bavard_mots = global_df.min()\n",
"\n",
"plus_bavard_nom = global_df.idxmax()\n",
"plus_bavard_mots = global_df.max()\n",
"\n",
"print(f\"Personnage le plus taciturne : {moins_bavard_nom} ({moins_bavard_mots} mots)\")\n",
"print(f\"Personnage le plus loquace : {plus_bavard_nom} ({plus_bavard_mots} mots)\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recall that we have already established that Dame Claude has no lines; likewise, although the Commissaire is accompanied by a clerk, the latter never speaks either."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us show each character's share of the dialogue with a pie chart:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: pie chart titled 'Répartition des mots prononcés par personnage']"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"n_small = 5\n",
"y0, dy = 1.3, 0.05  # Position of the label block for the least talkative characters\n",
"x_col = -0.5\n",
"\n",
"totaux = df_speeches.groupby(\"Personnage\")[\"Mots\"].sum().sort_values(ascending=False)\n",
"labels = totaux.index\n",
"values = totaux.values\n",
"color_map = create_actors_colormap(labels)\n",
"colors = [color_map[p] for p in labels]\n",
"\n",
"fig, ax = plt.subplots(figsize=(7, 7))\n",
"wedges, _ = ax.pie(values, colors=colors, startangle=90, counterclock=False)\n",
"\n",
"small_roles = global_df.sort_values().head(n_small)  # least talkative characters, ascending order\n",
"\n",
"# Place the labels for the small roles\n",
"for i, (name, val) in enumerate(small_roles.items()):\n",
"    idx = labels.get_loc(name)\n",
"    w = wedges[idx]\n",
"\n",
"    theta = np.deg2rad((w.theta1 + w.theta2) / 2)\n",
"    xw, yw = np.cos(theta), np.sin(theta)\n",
"    xpos, ypos = x_col, y0 - i * dy\n",
"\n",
"    ax.annotate(\n",
"        f\"{name} ({val})\",\n",
"        xy=(xw, yw), xytext=(xpos, ypos),\n",
"        ha=\"right\", va=\"center\", fontsize=9,\n",
"        arrowprops=dict(\n",
"            arrowstyle=\"-\",\n",
"            color=colors[idx],\n",
"            lw=1,\n",
"            connectionstyle=\"angle,angleA=0,angleB=90\",\n",
"            shrinkA=0, shrinkB=0,\n",
"        ),\n",
"    )\n",
"\n",
"# Other roles: name around the pie, value inside the wedge\n",
"for idx, name in enumerate(labels):\n",
"    if name in small_roles.index:\n",
"        continue\n",
"\n",
"    w = wedges[idx]\n",
"    theta = np.deg2rad((w.theta1 + w.theta2) / 2)\n",
"    r_label, r_value = 1.1, 0.7\n",
"\n",
"    ax.text(r_label * np.cos(theta), r_label * np.sin(theta), name, ha=\"center\", va=\"center\", fontsize=9)\n",
"    ax.text(r_value * np.cos(theta), r_value * np.sin(theta), str(totaux[name]), ha=\"center\", va=\"center\", fontsize=9, color=\"black\")\n",
"\n",
"ax.set_title(\"Répartition des mots prononcés par personnage\", pad=50)\n",
"ax.axis(\"equal\")\n",
"plt.tight_layout()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional statistics\n",
"\n",
"Drawing on OBVIL's tables, we examine each character's place in the play and the direct relationships between interlocutors (one line = 60 characters).\n",
"Let us start with the \"Table des rôles\" (table of roles):"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
" Rôle Scènes Répl. Répl. moy. Présence \\\n",
"0 [TOUS] 32 sc. 959 répl. 1,8 l. 1 769 l. (100 %) \n",
"1 Harpagon 23 sc. 354 répl. 1,5 l. 1 296 l. (73 %) \n",
"2 Cléante 14 sc. 161 répl. 1,8 l. 900 l. (51 %) \n",
"3 Élise 9 sc. 51 répl. 1,8 l. 681 l. (39 %) \n",
"4 Valère 9 sc. 101 répl. 2,3 l. 695 l. (39 %) \n",
"5 Mariane 6 sc. 31 répl. 2,5 l. 359 l. (20 %) \n",
"6 Anselme 2 sc. 20 répl. 2,3 l. 143 l. (8 %) \n",
"7 Frosine 10 sc. 60 répl. 3,3 l. 466 l. (26 %) \n",
"8 Maître Simon 1 sc. 5 répl. 3,2 l. 44 l. (2 %) \n",
"9 Maître Jacques 9 sc. 85 répl. 1,6 l. 557 l. (32 %) \n",
"10 La Flèche 5 sc. 66 répl. 2,0 l. 255 l. (14 %) \n",
"11 Dame Claude 0 sc. 0 répl. 0,0 l. 0 l. \n",
"12 Brindavoine 2 sc. 3 répl. 1,1 l. 166 l. (9 %) \n",
"13 La Merluche 2 sc. 5 répl. 0,9 l. 175 l. (10 %) \n",
"14 Le Commissaire 3 sc. 17 répl. 1,5 l. 110 l. (6 %) \n",
"\n",
" Texte Texte % prés. Texte × pers. Interlocution \n",
"0 1 769 l. (100 %) 100 % 5 847 l. (100 %) 3,3 pers. \n",
"1 514 l. (29 %) 40 % 4 729 l. (81 %) 3,7 pers. \n",
"2 285 l. (16 %) 32 % 3 486 l. (60 %) 3,9 pers. \n",
"3 92 l. (5 %) 13 % 2 667 l. (46 %) 3,9 pers. \n",
"4 232 l. (13 %) 33 % 3 067 l. (52 %) 4,4 pers. \n",
"5 79 l. (4 %) 22 % 1 638 l. (28 %) 4,6 pers. \n",
"6 45 l. (3 %) 32 % 749 l. (13 %) 5,3 pers. \n",
"7 201 l. (11 %) 43 % 1 465 l. (25 %) 3,1 pers. \n",
"8 16 l. (1 %) 37 % 175 l. (3 %) 4,0 pers. \n",
"9 140 l. (8 %) 25 % 2 670 l. (46 %) 4,8 pers. \n",
"10 132 l. (7 %) 52 % 598 l. (10 %) 2,3 pers. \n",
"11 0 l. 0 % 0 l. (0 %) 0,0 pers. \n",
"12 3 l. (0 %) 2 % 1 146 l. (20 %) 6,9 pers. \n",
"13 5 l. (0 %) 3 % 1 189 l. (20 %) 6,8 pers. \n",
"14 26 l. (1 %) 24 % 418 l. (7 %) 3,8 pers. "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Agrégats au niveau des scènes\n",
"scene_totals = (\n",
" df_speeches\n",
" .groupby([\"Acte\", \"Scène\"], as_index=False)\n",
" .agg({\n",
" \"Lignes\": \"sum\",\n",
" \"Personnage\": \"nunique\",\n",
" })\n",
")\n",
"\n",
"scene_totals = scene_totals.rename(columns={\n",
" \"Lignes\": \"scene_lines\",\n",
" \"Personnage\": \"participants\",\n",
"})\n",
"\n",
"scene_totals[\"textexpers\"] = scene_totals[\"scene_lines\"] * scene_totals[\"participants\"]\n",
"scene_totals[\"SceneKey\"] = scene_totals[\"Acte\"] + \" | \" + scene_totals[\"Scène\"]\n",
"\n",
"total_lines_play = scene_totals[\"scene_lines\"].sum()\n",
"textexpers_total = scene_totals[\"textexpers\"].sum()\n",
"\n",
"# Statistiques par personnage\n",
"speech_stats = (\n",
" df_speeches\n",
" .groupby(\"Personnage\")\n",
" .agg({\n",
" \"Texte\": \"count\",\n",
" \"Lignes\": \"sum\",\n",
" })\n",
" .rename(columns={\n",
" \"Texte\": \"repl\",\n",
" \"Lignes\": \"text_lines\",\n",
" })\n",
")\n",
"\n",
"presence = (\n",
" df_speeches[[\"Acte\", \"Scène\", \"Personnage\"]]\n",
" .drop_duplicates()\n",
" .merge(\n",
" scene_totals[[\"Acte\", \"Scène\", \"scene_lines\", \"textexpers\"]],\n",
" on=[\"Acte\", \"Scène\"],\n",
" how=\"left\",\n",
" )\n",
")\n",
"\n",
"presence_stats = (\n",
" presence\n",
" .groupby(\"Personnage\")\n",
" .agg({\n",
" \"Scène\": \"count\",\n",
" \"scene_lines\": \"sum\",\n",
" \"textexpers\": \"sum\",\n",
" })\n",
" .rename(columns={\n",
" \"Scène\": \"scenes\",\n",
" \"scene_lines\": \"presence_lines\",\n",
" })\n",
")\n",
"\n",
"roles = speech_stats.join(presence_stats, how=\"outer\").fillna(0)\n",
"\n",
"# Ajout des rôles muets (ex : Dame Claude) pour qu'ils apparaissent dans le tableau\n",
"for name in (resolve_name(p) for p in dramatis_personae[\"Personnage\"]):\n",
" if name not in roles.index:\n",
" roles.loc[name] = 0\n",
"\n",
"roles[\"repl_moy\"] = roles[\"text_lines\"] / roles[\"repl\"]\n",
"roles[\"presence_pct\"] = roles[\"presence_lines\"] / total_lines_play\n",
"roles[\"text_pct\"] = roles[\"text_lines\"] / total_lines_play\n",
"roles[\"text_presence_pct\"] = roles[\"text_lines\"] / roles[\"presence_lines\"]\n",
"roles[\"textexpers_pct\"] = roles[\"textexpers\"] / textexpers_total\n",
"roles[\"interlocution\"] = roles[\"textexpers\"] / roles[\"presence_lines\"]\n",
"\n",
"roles = roles.replace([np.inf, -np.inf], 0).fillna(0)\n",
"\n",
"# Ordre basé sur la distribution initiale\n",
"role_order = [\n",
" name for name in (resolve_name(p) for p in dramatis_personae[\"Personnage\"])\n",
" if name in roles.index\n",
"]\n",
"role_order += [r for r in roles.index if r not in role_order]\n",
"\n",
"roles = roles.loc[role_order]\n",
"\n",
"# Ligne globale\n",
"all_row = pd.Series({\n",
" \"scenes\": scene_totals.shape[0],\n",
" \"repl\": len(df_speeches),\n",
" \"repl_moy\": df_speeches[\"Lignes\"].sum() / len(df_speeches),\n",
" \"presence_lines\": total_lines_play,\n",
" \"presence_pct\": 1.0,\n",
" \"text_lines\": df_speeches[\"Lignes\"].sum(),\n",
" \"text_pct\": 1.0,\n",
" \"text_presence_pct\": (\n",
" df_speeches[\"Lignes\"].sum() / total_lines_play if total_lines_play else 0\n",
" ),\n",
" \"textexpers\": textexpers_total,\n",
" \"textexpers_pct\": 1.0,\n",
" \"interlocution\": textexpers_total / total_lines_play if total_lines_play else 0,\n",
"})\n",
"\n",
"roles = pd.concat([\n",
" pd.DataFrame({\"Personnage\": [\"[TOUS]\"]})\n",
" .set_index(\"Personnage\")\n",
" .assign(**all_row),\n",
" roles,\n",
"])\n",
"\n",
"roles.index.name = \"Rôle\"\n",
"\n",
"# Quelques utilitaires spécifiques\n",
"\n",
"def format_number(value, decimals=0):\n",
" fmt = f\"{value:,.{decimals}f}\"\n",
" return fmt.replace(\",\", \" \").replace(\".\", \",\")\n",
"\n",
"def format_lines(value, decimals=0):\n",
" return f\"{format_number(round(value, decimals), decimals)} l.\"\n",
"\n",
"def format_percent(value, decimals=0):\n",
" return f\"{format_number(value * 100, decimals)} %\"\n",
"\n",
"def format_people(value):\n",
" return f\"{format_number(value, 1)} pers.\"\n",
"\n",
"roles_display = roles.reset_index()\n",
"roles_display[\"Scènes\"] = roles_display[\"scenes\"].fillna(0).astype(int).astype(str) + \" sc.\"\n",
"roles_display[\"Répl.\"] = roles_display[\"repl\"].fillna(0).astype(int).astype(str) + \" répl.\"\n",
"roles_display[\"Répl. moy.\"] = roles_display[\"repl_moy\"].apply(lambda v: format_lines(v, 1))\n",
"roles_display[\"Présence\"] = roles_display.apply(\n",
" lambda r: f\"{format_lines(r['presence_lines'])}\"\n",
" + (f\" ({format_percent(r['presence_pct'])})\" if r[\"presence_lines\"] else \"\"),\n",
" axis=1,\n",
")\n",
"roles_display[\"Texte\"] = roles_display.apply(\n",
" lambda r: f\"{format_lines(r['text_lines'])}\"\n",
" + (f\" ({format_percent(r['text_pct'])})\" if r[\"text_lines\"] else \"\"),\n",
" axis=1,\n",
")\n",
"roles_display[\"Texte % prés.\"] = roles_display[\"text_presence_pct\"].apply(lambda v: format_percent(v, 0))\n",
"roles_display[\"Texte × pers.\"] = roles_display.apply(\n",
" lambda r: f\"{format_lines(r['textexpers'])} ({format_percent(r['textexpers_pct'])})\",\n",
" axis=1,\n",
")\n",
"roles_display[\"Interlocution\"] = roles_display[\"interlocution\"].apply(format_people)\n",
"\n",
"roles_table = roles_display[\n",
" [\"Rôle\", \"Scènes\", \"Répl.\", \"Répl. moy.\", \"Présence\",\n",
" \"Texte\", \"Texte % prés.\", \"Texte × pers.\", \"Interlocution\"]\n",
"]\n",
"\n",
"roles_table\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Il existe des différences entre le tableau que nous avons généré et celui de l'OBVIL.\n",
"Ces différences peuvent être attribuées à :\n",
"\n",
"- une méthode de nettoyage des lignes différente (nous avons opté pour un nettoyage agressif des espaces surnuméraires)\n",
"- une gestion des décimales différentes (est-ce que l'OBVIL arrondi à l'entier supérieur ou inférieur, ou tronque les décimales ?)\n",
"\n",
"Ces différences affectent mathématiquement les statistiques qui découlent de ce comptage, notamment l'interlocution."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Statistiques par relation\n",
"\n",
"Chaque relation s'appuie sur l'enchaînement de répliques adjacentes entre deux personnages (monologues inclus), ce qui reflète les échanges directs plutôt que la simple coprésence sur scène.\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Détail
\n",
"
Interlocution
\n",
"
Relation
\n",
"
Scènes
\n",
"
Texte
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
33 l. (100 %) 1 répl. 32,7 l.
\n",
"
1,0 pers.
\n",
"
Harpagon
\n",
"
1 sc.
\n",
"
33 l. (2 %)
\n",
"
\n",
"
\n",
"
1
\n",
"
135 l. (49 %) 97 répl. 1,4 l. - 140 l. (51 %) ...
\n",
"
4,6 pers.
\n",
"
Cléante / Harpagon
\n",
"
9 sc.
\n",
"
275 l. (16 %)
\n",
"
\n",
"
\n",
"
2
\n",
"
41 l. (60 %) 28 répl. 1,5 l. - 28 l. (40 %) 28...
\n",
"
4,7 pers.
\n",
"
Harpagon / Élise
\n",
"
6 sc.
\n",
"
69 l. (4 %)
\n",
"
\n",
"
\n",
"
3
\n",
"
96 l. (44 %) 65 répl. 1,5 l. - 121 l. (56 %) 5...
\n",
"
4,9 pers.
\n",
"
Harpagon / Valère
\n",
"
7 sc.
\n",
"
217 l. (12 %)
\n",
"
\n",
"
\n",
"
4
\n",
"
9 l. (39 %) 7 répl. 1,2 l. - 13 l. (61 %) 5 ré...
\n",
"
4,9 pers.
\n",
"
Harpagon / Mariane
\n",
"
2 sc.
\n",
"
22 l. (1 %)
\n",
"
\n",
"
\n",
"
5
\n",
"
25 l. (71 %) 11 répl. 2,2 l. - 10 l. (29 %) 8 ...
\n",
"
5,3 pers.
\n",
"
Anselme / Harpagon
\n",
"
2 sc.
\n",
"
35 l. (2 %)
\n",
"
\n",
"
\n",
"
6
\n",
"
128 l. (70 %) 39 répl. 3,3 l. - 56 l. (30 %) 3...
\n",
"
3,0 pers.
\n",
"
Frosine / Harpagon
\n",
"
5 sc.
\n",
"
184 l. (10 %)
\n",
"
\n",
"
\n",
"
7
\n",
"
5 l. (22 %) 3 répl. 1,5 l. - 16 l. (78 %) 4 ré...
\n",
"
4,0 pers.
\n",
"
Harpagon / Maître Simon
\n",
"
1 sc.
\n",
"
20 l. (1 %)
\n",
"
\n",
"
\n",
"
8
\n",
"
58 l. (38 %) 49 répl. 1,2 l. - 94 l. (62 %) 51...
\n",
"
4,9 pers.
\n",
"
Harpagon / Maître Jacques
\n",
"
7 sc.
\n",
"
153 l. (9 %)
\n",
"
\n",
"
\n",
"
9
\n",
"
35 l. (62 %) 33 répl. 1,1 l. - 22 l. (38 %) 33...
\n",
"
2,0 pers.
\n",
"
Harpagon / La Flèche
\n",
"
1 sc.
\n",
"
57 l. (3 %)
\n",
"
\n",
"
\n",
"
10
\n",
"
1 l. (38 %) 2 répl. 0,7 l. - 2 l. (62 %) 2 rép...
\n",
"
6,9 pers.
\n",
"
Brindavoine / Harpagon
\n",
"
2 sc.
\n",
"
4 l. (0 %)
\n",
"
\n",
"
\n",
"
11
\n",
"
1 l. (11 %) 1 répl. 0,6 l. - 5 l. (89 %) 5 rép...
\n",
"
6,8 pers.
\n",
"
Harpagon / La Merluche
\n",
"
2 sc.
\n",
"
5 l. (0 %)
\n",
"
\n",
"
\n",
"
12
\n",
"
11 l. (53 %) 10 répl. 1,1 l. - 10 l. (47 %) 9 ...
\n",
"
3,8 pers.
\n",
"
Harpagon / Le Commissaire
\n",
"
3 sc.
\n",
"
21 l. (1 %)
\n",
"
\n",
"
\n",
"
13
\n",
"
66 l. (84 %) 10 répl. 6,6 l. - 13 l. (16 %) 9 ...
\n",
"
3,0 pers.
\n",
"
Cléante / Élise
\n",
"
2 sc.
\n",
"
78 l. (4 %)
\n",
"
\n",
"
\n",
"
14
\n",
"
31 l. (60 %) 12 répl. 2,6 l. - 21 l. (40 %) 10...
\n",
"
4,8 pers.
\n",
"
Cléante / Mariane
\n",
"
3 sc.
\n",
"
52 l. (3 %)
\n",
"
\n",
"
\n",
"
15
\n",
"
3 l. (8 %) 5 répl. 0,6 l. - 37 l. (92 %) 6 rép...
\n",
"
4,5 pers.
\n",
"
Cléante / Frosine
\n",
"
2 sc.
\n",
"
40 l. (2 %)
\n",
"
\n",
"
\n",
"
16
\n",
"
14 l. (51 %) 8 répl. 1,8 l. - 14 l. (49 %) 8 r...
\n",
"
3,0 pers.
\n",
"
Cléante / Maître Jacques
\n",
"
1 sc.
\n",
"
28 l. (2 %)
\n",
"
\n",
"
\n",
"
17
\n",
"
31 l. (27 %) 25 répl. 1,2 l. - 85 l. (73 %) 26...
\n",
"
2,5 pers.
\n",
"
Cléante / La Flèche
\n",
"
3 sc.
\n",
"
116 l. (7 %)
\n",
"
\n",
"
\n",
"
18
\n",
"
65 l. (59 %) 11 répl. 5,9 l. - 45 l. (41 %) 11...
\n",
"
2,5 pers.
\n",
"
Valère / Élise
\n",
"
2 sc.
\n",
"
110 l. (6 %)
\n",
"
\n",
"
\n",
"
19
\n",
"
1 l. (23 %) 2 répl. 0,6 l. - 4 l. (77 %) 1 rép...
\n",
"
4,0 pers.
\n",
"
Mariane / Élise
\n",
"
2 sc.
\n",
"
6 l. (0 %)
\n",
"
\n",
"
\n",
"
20
\n",
"
3 l. (41 %) 1 répl. 2,6 l. - 4 l. (59 %) 3 rép...
\n",
"
5,0 pers.
\n",
"
Mariane / Valère
\n",
"
1 sc.
\n",
"
6 l. (0 %)
\n",
"
\n",
"
\n",
"
21
\n",
"
19 l. (47 %) 8 répl. 2,4 l. - 22 l. (53 %) 7 r...
\n",
"
5,0 pers.
\n",
"
Anselme / Valère
\n",
"
1 sc.
\n",
"
41 l. (2 %)
\n",
"
\n",
"
\n",
"
22
\n",
"
18 l. (49 %) 14 répl. 1,3 l. - 19 l. (51 %) 18...
\n",
"
5,2 pers.
\n",
"
Maître Jacques / Valère
\n",
"
4 sc.
\n",
"
37 l. (2 %)
\n",
"
\n",
"
\n",
"
23
\n",
"
18 l. (48 %) 6 répl. 3,0 l. - 20 l. (52 %) 7 r...
\n",
"
4,1 pers.
\n",
"
Frosine / Mariane
\n",
"
4 sc.
\n",
"
38 l. (2 %)
\n",
"
\n",
"
\n",
"
24
\n",
"
1 l. (42 %) 1 répl. 1,0 l. - 1 l. (58 %) 2 rép...
\n",
"
4,7 pers.
\n",
"
Frosine / Maître Jacques
\n",
"
2 sc.
\n",
"
2 l. (0 %)
\n",
"
\n",
"
\n",
"
25
\n",
"
11 l. (42 %) 5 répl. 2,3 l. - 16 l. (58 %) 5 r...
\n",
"
2,0 pers.
\n",
"
Frosine / La Flèche
\n",
"
1 sc.
\n",
"
28 l. (2 %)
\n",
"
\n",
"
\n",
"
26
\n",
"
13 l. (76 %) 7 répl. 1,8 l. - 4 l. (24 %) 5 ré...
\n",
"
3,0 pers.
\n",
"
Le Commissaire / Maître Jacques
\n",
"
1 sc.
\n",
"
17 l. (1 %)
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Détail Interlocution \\\n",
"0 33 l. (100 %) 1 répl. 32,7 l. 1,0 pers. \n",
"1 135 l. (49 %) 97 répl. 1,4 l. - 140 l. (51 %) ... 4,6 pers. \n",
"2 41 l. (60 %) 28 répl. 1,5 l. - 28 l. (40 %) 28... 4,7 pers. \n",
"3 96 l. (44 %) 65 répl. 1,5 l. - 121 l. (56 %) 5... 4,9 pers. \n",
"4 9 l. (39 %) 7 répl. 1,2 l. - 13 l. (61 %) 5 ré... 4,9 pers. \n",
"5 25 l. (71 %) 11 répl. 2,2 l. - 10 l. (29 %) 8 ... 5,3 pers. \n",
"6 128 l. (70 %) 39 répl. 3,3 l. - 56 l. (30 %) 3... 3,0 pers. \n",
"7 5 l. (22 %) 3 répl. 1,5 l. - 16 l. (78 %) 4 ré... 4,0 pers. \n",
"8 58 l. (38 %) 49 répl. 1,2 l. - 94 l. (62 %) 51... 4,9 pers. \n",
"9 35 l. (62 %) 33 répl. 1,1 l. - 22 l. (38 %) 33... 2,0 pers. \n",
"10 1 l. (38 %) 2 répl. 0,7 l. - 2 l. (62 %) 2 rép... 6,9 pers. \n",
"11 1 l. (11 %) 1 répl. 0,6 l. - 5 l. (89 %) 5 rép... 6,8 pers. \n",
"12 11 l. (53 %) 10 répl. 1,1 l. - 10 l. (47 %) 9 ... 3,8 pers. \n",
"13 66 l. (84 %) 10 répl. 6,6 l. - 13 l. (16 %) 9 ... 3,0 pers. \n",
"14 31 l. (60 %) 12 répl. 2,6 l. - 21 l. (40 %) 10... 4,8 pers. \n",
"15 3 l. (8 %) 5 répl. 0,6 l. - 37 l. (92 %) 6 rép... 4,5 pers. \n",
"16 14 l. (51 %) 8 répl. 1,8 l. - 14 l. (49 %) 8 r... 3,0 pers. \n",
"17 31 l. (27 %) 25 répl. 1,2 l. - 85 l. (73 %) 26... 2,5 pers. \n",
"18 65 l. (59 %) 11 répl. 5,9 l. - 45 l. (41 %) 11... 2,5 pers. \n",
"19 1 l. (23 %) 2 répl. 0,6 l. - 4 l. (77 %) 1 rép... 4,0 pers. \n",
"20 3 l. (41 %) 1 répl. 2,6 l. - 4 l. (59 %) 3 rép... 5,0 pers. \n",
"21 19 l. (47 %) 8 répl. 2,4 l. - 22 l. (53 %) 7 r... 5,0 pers. \n",
"22 18 l. (49 %) 14 répl. 1,3 l. - 19 l. (51 %) 18... 5,2 pers. \n",
"23 18 l. (48 %) 6 répl. 3,0 l. - 20 l. (52 %) 7 r... 4,1 pers. \n",
"24 1 l. (42 %) 1 répl. 1,0 l. - 1 l. (58 %) 2 rép... 4,7 pers. \n",
"25 11 l. (42 %) 5 répl. 2,3 l. - 16 l. (58 %) 5 r... 2,0 pers. \n",
"26 13 l. (76 %) 7 répl. 1,8 l. - 4 l. (24 %) 5 ré... 3,0 pers. \n",
"\n",
" Relation Scènes Texte \n",
"0 Harpagon 1 sc. 33 l. (2 %) \n",
"1 Cléante / Harpagon 9 sc. 275 l. (16 %) \n",
"2 Harpagon / Élise 6 sc. 69 l. (4 %) \n",
"3 Harpagon / Valère 7 sc. 217 l. (12 %) \n",
"4 Harpagon / Mariane 2 sc. 22 l. (1 %) \n",
"5 Anselme / Harpagon 2 sc. 35 l. (2 %) \n",
"6 Frosine / Harpagon 5 sc. 184 l. (10 %) \n",
"7 Harpagon / Maître Simon 1 sc. 20 l. (1 %) \n",
"8 Harpagon / Maître Jacques 7 sc. 153 l. (9 %) \n",
"9 Harpagon / La Flèche 1 sc. 57 l. (3 %) \n",
"10 Brindavoine / Harpagon 2 sc. 4 l. (0 %) \n",
"11 Harpagon / La Merluche 2 sc. 5 l. (0 %) \n",
"12 Harpagon / Le Commissaire 3 sc. 21 l. (1 %) \n",
"13 Cléante / Élise 2 sc. 78 l. (4 %) \n",
"14 Cléante / Mariane 3 sc. 52 l. (3 %) \n",
"15 Cléante / Frosine 2 sc. 40 l. (2 %) \n",
"16 Cléante / Maître Jacques 1 sc. 28 l. (2 %) \n",
"17 Cléante / La Flèche 3 sc. 116 l. (7 %) \n",
"18 Valère / Élise 2 sc. 110 l. (6 %) \n",
"19 Mariane / Élise 2 sc. 6 l. (0 %) \n",
"20 Mariane / Valère 1 sc. 6 l. (0 %) \n",
"21 Anselme / Valère 1 sc. 41 l. (2 %) \n",
"22 Maître Jacques / Valère 4 sc. 37 l. (2 %) \n",
"23 Frosine / Mariane 4 sc. 38 l. (2 %) \n",
"24 Frosine / Maître Jacques 2 sc. 2 l. (0 %) \n",
"25 Frosine / La Flèche 1 sc. 28 l. (2 %) \n",
"26 Le Commissaire / Maître Jacques 1 sc. 17 l. (1 %) "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import defaultdict\n",
"\n",
"# Accumulateur : pour chaque relation (acteur seul - monologues) ou (acteurA, acteurB),\n",
"# on stocke les lignes, le nombre de répliques, les scènes concernées,\n",
"# les lignes de présence cumulées et un poids d’interlocution (lignes × nb de participants).\n",
"relation_stats = defaultdict(lambda: {\n",
" \"lines\": defaultdict(float),\n",
" \"counts\": defaultdict(int),\n",
" \"scenes\": set(),\n",
" \"presence_lines\": 0.0,\n",
" \"interlocution_weight\": 0.0,\n",
"})\n",
"\n",
"scene_lookup = scene_totals.set_index([\"Acte\", \"Scène\"])[[\"scene_lines\", \"participants\"]]\n",
"\n",
"for (act, scene), scene_df in df_speeches.groupby([\"Acte\", \"Scène\"]):\n",
" scene_df = scene_df.sort_values(\"Ordre\")\n",
" speakers = scene_df[\"Personnage\"].tolist()\n",
" lines = scene_df[\"Lignes\"].tolist()\n",
" scene_lines = scene_lookup.loc[(act, scene), \"scene_lines\"]\n",
" participants = scene_lookup.loc[(act, scene), \"participants\"]\n",
" scene_key = f\"{act} | {scene}\"\n",
"\n",
" # Scène à un seul intervenant : on enregistre un monologue\n",
" if len(set(speakers)) == 1:\n",
" actor = speakers[0]\n",
" stats = relation_stats[(actor,)]\n",
" stats[\"lines\"][actor] += scene_lines\n",
" stats[\"counts\"][actor] += len(speakers)\n",
" stats[\"scenes\"].add(scene_key)\n",
" stats[\"presence_lines\"] += scene_lines\n",
" stats[\"interlocution_weight\"] += scene_lines * participants\n",
" continue\n",
"\n",
" relations_here = set()\n",
" \n",
" # Pour chaque changement d’intervenant, on attribue les lignes du locuteur au duo (ordre ignoré)\n",
" for speaker, next_speaker, speaker_lines in zip(speakers, speakers[1:], lines):\n",
" if speaker == next_speaker:\n",
" continue\n",
" \n",
" key = tuple(sorted((speaker, next_speaker)))\n",
" stats = relation_stats[key]\n",
" stats[\"lines\"][speaker] += speaker_lines\n",
" stats[\"counts\"][speaker] += 1\n",
" stats[\"scenes\"].add(scene_key)\n",
" relations_here.add(key)\n",
"\n",
" # On ajoute la présence et l’interlocution une seule fois par scène et par relation\n",
" for key in relations_here:\n",
" stats = relation_stats[key]\n",
" stats[\"presence_lines\"] += scene_lines\n",
" stats[\"interlocution_weight\"] += scene_lines * participants\n",
"\n",
"role_order_index = {name: idx for idx, name in enumerate(role_order)}\n",
"\n",
"def relation_sort_key(rel):\n",
" if len(rel) == 1:\n",
" return (role_order_index.get(rel[0], len(role_order_index)), -1)\n",
" \n",
" a, b = rel\n",
" \n",
" return (\n",
" min(role_order_index.get(a, len(role_order_index)), role_order_index.get(b, len(role_order_index))),\n",
" max(role_order_index.get(a, len(role_order_index)), role_order_index.get(b, len(role_order_index))),\n",
" )\n",
"\n",
"relation_rows = []\n",
"\n",
"for rel in sorted(relation_stats, key=relation_sort_key):\n",
" data = relation_stats[rel]\n",
" total_lines = sum(data[\"lines\"].values())\n",
"\n",
" # On ignore les relations sans matière (moins de 2 lignes au total)\n",
" if total_lines < 2:\n",
" continue\n",
" \n",
" # On ignore les relations où au moins un protagoniste n’a jamais prononcé de réplique dans ce duo\n",
" if len(rel) > 1 and any(data[\"counts\"].get(actor, 0) == 0 for actor in rel):\n",
" continue\n",
"\n",
" scenes_count = len(data[\"scenes\"])\n",
" interlocution = data[\"interlocution_weight\"] / data[\"presence_lines\"] if data[\"presence_lines\"] else 0\n",
"\n",
" parts = []\n",
" \n",
" for actor in rel:\n",
" actor_lines = data[\"lines\"].get(actor, 0)\n",
" actor_repl = data[\"counts\"].get(actor, 0)\n",
" avg_lines = actor_lines / actor_repl if actor_repl else 0\n",
" share = actor_lines / total_lines if total_lines else 0\n",
" \n",
" parts.append(\n",
" f\"{format_lines(actor_lines)} ({format_percent(share)}) {actor_repl} répl. {format_lines(avg_lines, 1)}\"\n",
" )\n",
"\n",
" relation_rows.append({\n",
" \"Relation\": \" / \".join(rel),\n",
" \"Détail\": \" - \".join(parts),\n",
" \"Scènes\": f\"{scenes_count} sc.\",\n",
" \"Texte\": f\"{format_lines(total_lines)} ({format_percent(total_lines / total_lines_play)})\",\n",
" \"Interlocution\": format_people(interlocution),\n",
" })\n",
"\n",
"relations_table = pd.DataFrame(relation_rows)\n",
"relations_table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bien que le décompte des lignes diffère toujours de celui de l'OBVIL (comme attendu et pour les mêmes raisons que précédemment), l'interlocution est identique.\n",
"En effet, les écarts de comptage de lignes n’affectent pas l’interlocution ; seule une différence de liste d’intervenants par scène la ferait varier."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Voyons maintenant si nous parvenons à reproduire le graphique proposé par l'OBVIL :"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Ordonnancement des actes et des scènes\n",
"# On affichera les labels bruts (\"Acte Premier\") mais on utilisera un ordre \"naturel\"\n",
"scene_order = []\n",
"\n",
"for act in list_acts():\n",
" for sc in list_scenes(act=act[\"node\"]):\n",
" scene_order.append(f\"{act['title']} | {sc['title']}\")\n",
"\n",
"# Préparation des données\n",
"df = df_counts.copy()\n",
"\n",
"# Création d'une clé unique pour identifier un couple acte/scène\n",
"# Cela permet de forcer l'ordre d'affichage, au lieu de suivre un ordre alphabétique\n",
"# qui noierait \"Acte Premier\" au milieu de la liste, par exemple\n",
"df[\"SceneKey\"] = pd.Categorical(df[\"Acte\"] + \" | \" + df[\"Scène\"], categories=scene_order, ordered=True)\n",
"\n",
"# Obtention du pourcentage que représente un dialogue particulier au sein d'une scène\n",
"df[\"share\"] = df[\"Mots\"] / df.groupby(\"SceneKey\")[\"Mots\"].transform(\"sum\")\n",
"\n",
"# Calcul du total de mots pour une scène donnée\n",
"totals = df.groupby(\"SceneKey\")[\"Mots\"].sum()\n",
"\n",
"# Paramétrage du graphique\n",
"gap = 0 # Espace vertical ajouté entre deux scènes\n",
"label_fs = 8 # Taille de la police des étiquettes\n",
"min_target = 10 # Hauteur minimale souhaitée\n",
"\n",
"# Définition de la hauteur d'une scène.\n",
"# Nous utilisons ici une fonction racine carré.\n",
"# Le but de ce calcul est d'éviter que les scènes contenant le moins de mots\n",
"# se trouvent compressées en une ligne si fine qu'il ne serait pas possible\n",
"# de distinguer les différents protagonistes.\n",
"# On sacrifie donc le rapport proportionnel strict au profit d'une meilleure\n",
"# lisibilité.\n",
"def scene_height(total):\n",
" return max(min_target, math.sqrt(total) * factor)\n",
"\n",
"create_actors_colormap(df[\"Personnage\"].unique())\n",
"\n",
"# Calcul de l'échelle des scènes : évite qu'une scène courte soit\n",
"# représentée par une ligne trop fine pour être distinguée\n",
"min_total = totals.min()\n",
"factor = min_target / math.log1p(min_total)\n",
"\n",
"figure, axis = plt.subplots(figsize=(12, len(scene_order) * 0.5))\n",
"\n",
"# Affichage de la colormap des personnages\n",
"handles = [mpatches.Patch(color=col, label=name) for name, col in color_map.items()]\n",
"axis.legend(handles=handles, title=\"Personnage\", bbox_to_anchor=(1.25, 1), loc=\"upper left\")\n",
"\n",
"y = 0 # \"Curseur\" vertical permettant de positionner les scènes\n",
"\n",
"# Traçage des scènes\n",
"for scene in scene_order:\n",
" scene_rows = df[df[\"SceneKey\"] == scene]\n",
" h = scene_height(totals.loc[scene])\n",
"\n",
" left = 0\n",
"\n",
" for _, row in scene_rows.iterrows():\n",
" # broken_barth est la méthode nous permettant de tracer des barres\n",
" # horizontales juxtaposés\n",
" axis.broken_barh([(left, row[\"share\"])], (y, h),\n",
" facecolors=color_map[row[\"Personnage\"]],\n",
" edgecolors=\"white\", linewidth=0.5)\n",
"\n",
" # Décalage horizontal de la prochaine barre\n",
" left += row[\"share\"]\n",
"\n",
" # Étiquette correspondant à la scène\n",
" axis.text(1.01, y + h/2, scene, va=\"center\", fontsize=label_fs)\n",
"\n",
" # Décalage vertical de la prochaine scène\n",
" y += h + gap\n",
"\n",
"axis.set_xlim(0, 1)\n",
"axis.set_ylim(0, y)\n",
"axis.invert_yaxis() # Acte I en haut\n",
"axis.set_xlabel(\"% de la scène (largeur) - une ligne par scène\")\n",
"axis.set_yticks([]) # On masque l'ordonnée à gauche\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"L'ordre des dialogues est respecté sur ce graphique, contrairement au graphique de l'OBVIL.\n",
"Par exemple, dans la deuxième scène du premier acte, Cléante est bien la première à prendre la parole, et non Élise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Graphe réseau des dialogues"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"Ici, nous allons représenter les dialogues par un graphe réseau.\n",
"Nous allons essayer de reproduire le [graphe](https://obtic.huma-num.fr/obvil-web/corpus/moliere/moliere_avare) proposé par l'OBVIL."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
"
"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/javascript": [
"(function(root) {\n",
" function embed_document(root) {\n",
" \n",
" var docs_json = {\"282226c6-fa94-4f45-bdc1-88b4dd40356b\":{\"roots\":{\"references\":[{\"attributes\":{\"graph_layout\":{\"0\":[0.09039769270365772,-0.4448908825326538],\"1\":[0.6127343525141392,-0.9127382605708516],\"10\":[-0.8154418483675555,0.03697230877760726],\"11\":[0.3698357659938343,-0.5146341944910402],\"12\":[0.5191326363353952,0.360042675508183],\"2\":[1.2078775491197695,0.3368250388089235],\"3\":[-0.4964368784029247,1.1125062544389128],\"4\":[-0.9014407977516858,-0.12737241902039884],\"5\":[0.04536051281049857,0.8505719904955572],\"6\":[0.0013736459302272884,-1.0],\"7\":[-0.6997414132054509,0.30096931672117927],\"8\":[0.3130005505544711,0.15392904397858956],\"9\":[-0.24665176823437693,-0.15218087211400824]}},\"id\":\"bad9993c-18e2-4665-9505-80c84db337b5\",\"type\":\"StaticLayoutProvider\"},{\"attributes\":{},\"id\":\"581b8c06-9728-4f00-b005-29f01518242d\",\"type\":\"LinearScale\"},{\"attributes\":{\"source\":{\"id\":\"ca8cc36d-9ace-4463-8f9a-533e4a60771e\",\"type\":\"ColumnDataSource\"}},\"id\":\"803d4f50-03f2-4372-9ad2-d4b41e472731\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"466582bb-0a46-4e8d-a274-669dc81ca6cf\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"edge_renderer\":{\"id\":\"ad6fb0fa-131e-4cda-9684-e999d27a8aad\",\"type\":\"GlyphRenderer\"},\"inspection_policy\":{\"id\":\"a9610f86-1073-46e6-8872-7d62424d3054\",\"type\":\"NodesOnly\"},\"layout_provider\":{\"id\":\"bad9993c-18e2-4665-9505-80c84db337b5\",\"type\":\"StaticLayoutProvider\"},\"node_renderer\":{\"id\":\"0146009c-04a0-4191-bf53-0b9fcb2d725d\",\"type\":\"GlyphRenderer\"},\"selection_policy\":{\"id\":\"75b6155f-befd-4a80-8218-167edc5b6df4\",\"type\":\"NodesOnly\"}},\"id\":\"c13af152-ed92-4db6-a8a0-558ee1e44671\",\"type\":\"GraphRenderer\"},{\"attributes\":{},\"id\":\"a7d3e6f7-ad6e-48d7-9a3c-c28dd4d56dad\",\"type\":\"LinearScale\"},{\"attributes\":{\"callback\":null,\"data\":{\"edge_color\":[\"#9467bd\",\"#9467bd\",\"#9467bd\",\"#e377c2\",\"#e377c2\",\"#aec7e8\",\"#aec7e8\",\"
#aec7e8\",\"#aec7e8\",\"#aec7e8\",\"#aec7e8\",\"#aec7e8\",\"#ffbb78\",\"#ffbb78\",\"#ffbb78\",\"#ffbb78\",\"#ffbb78\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#98df8a\",\"#98df8a\",\"#98df8a\",\"#c49c94\",\"#c5b0d5\",\"#c5b0d5\",\"#ff9896\",\"#ff9896\",\"#ff9896\",\"#ff9896\",\"#ff9896\",\"#ff9896\",\"#2ca02c\",\"#2ca02c\",\"#2ca02c\",\"#2ca02c\",\"#2ca02c\",\"#2ca02c\",\"#2ca02c\",\"#8c564b\",\"#8c564b\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#d62728\",\"#d62728\",\"#d62728\",\"#d62728\",\"#d62728\"],\"end\":[4,7,11,4,6,3,4,5,8,9,10,12,2,4,5,8,9,0,1,2,3,5,6,7,8,9,10,11,12,2,3,4,4,4,9,0,2,3,4,11,12,0,2,3,4,6,7,11,4,5,0,2,4,6,8,9,12,2,3,4,8,11],\"label\":[\"Anselme \\u2192 Harpagon : 293 mots\",\"Anselme \\u2192 Le Commissaire : 13 mots\",\"Anselme \\u2192 Val\\u00e8re : 211 mots\",\"Brindavoine \\u2192 Harpagon : 20 mots\",\"Brindavoine \\u2192 La Merluche : 23 mots\",\"Cl\\u00e9ante \\u2192 Frosine : 32 mots\",\"Cl\\u00e9ante \\u2192 Harpagon : 1587 mots\",\"Cl\\u00e9ante \\u2192 La Fl\\u00e8che : 385 mots\",\"Cl\\u00e9ante \\u2192 Mariane : 350 mots\",\"Cl\\u00e9ante \\u2192 Ma\\u00eetre Jacques : 170 mots\",\"Cl\\u00e9ante \\u2192 Ma\\u00eetre Simon : 13 mots\",\"Cl\\u00e9ante \\u2192 \\u00c9lise : 758 mots\",\"Frosine \\u2192 Cl\\u00e9ante : 436 mots\",\"Frosine \\u2192 Harpagon : 1498 mots\",\"Frosine \\u2192 La Fl\\u00e8che : 130 mots\",\"Frosine \\u2192 Mariane : 203 mots\",\"Frosine \\u2192 Ma\\u00eetre Jacques : 10 mots\",\"Harpagon \\u2192 Anselme : 120 mots\",\"Harpagon \\u2192 Brindavoine : 27 mots\",\"Harpagon \\u2192 Cl\\u00e9ante : 1661 mots\",\"Harpagon \\u2192 Frosine : 678 mots\",\"Harpagon \\u2192 La Fl\\u00e8che : 434 mots\",\"Harpagon \\u2192 La Merluche : 6 mots\",\"Harpagon \\u2192 Le Commissaire : 135 mots\",\"Harpagon \\u2192 Mariane : 94 mots\",\"Harpagon \\u2192 Ma\\u00eetre Jacques : 677 
mots\",\"Harpagon \\u2192 Ma\\u00eetre Simon : 51 mots\",\"Harpagon \\u2192 Val\\u00e8re : 1129 mots\",\"Harpagon \\u2192 \\u00c9lise : 506 mots\",\"La Fl\\u00e8che \\u2192 Cl\\u00e9ante : 953 mots\",\"La Fl\\u00e8che \\u2192 Frosine : 196 mots\",\"La Fl\\u00e8che \\u2192 Harpagon : 258 mots\",\"La Merluche \\u2192 Harpagon : 47 mots\",\"Le Commissaire \\u2192 Harpagon : 113 mots\",\"Le Commissaire \\u2192 Ma\\u00eetre Jacques : 148 mots\",\"Mariane \\u2192 Anselme : 199 mots\",\"Mariane \\u2192 Cl\\u00e9ante : 245 mots\",\"Mariane \\u2192 Frosine : 231 mots\",\"Mariane \\u2192 Harpagon : 163 mots\",\"Mariane \\u2192 Val\\u00e8re : 29 mots\",\"Mariane \\u2192 \\u00c9lise : 14 mots\",\"Ma\\u00eetre Jacques \\u2192 Anselme : 23 mots\",\"Ma\\u00eetre Jacques \\u2192 Cl\\u00e9ante : 154 mots\",\"Ma\\u00eetre Jacques \\u2192 Frosine : 18 mots\",\"Ma\\u00eetre Jacques \\u2192 Harpagon : 1132 mots\",\"Ma\\u00eetre Jacques \\u2192 La Merluche : 8 mots\",\"Ma\\u00eetre Jacques \\u2192 Le Commissaire : 48 mots\",\"Ma\\u00eetre Jacques \\u2192 Val\\u00e8re : 218 mots\",\"Ma\\u00eetre Simon \\u2192 Harpagon : 194 mots\",\"Ma\\u00eetre Simon \\u2192 La Fl\\u00e8che : 3 mots\",\"Val\\u00e8re \\u2192 Anselme : 262 mots\",\"Val\\u00e8re \\u2192 Cl\\u00e9ante : 5 mots\",\"Val\\u00e8re \\u2192 Harpagon : 1432 mots\",\"Val\\u00e8re \\u2192 La Merluche : 4 mots\",\"Val\\u00e8re \\u2192 Mariane : 43 mots\",\"Val\\u00e8re \\u2192 Ma\\u00eetre Jacques : 218 mots\",\"Val\\u00e8re \\u2192 \\u00c9lise : 742 mots\",\"\\u00c9lise \\u2192 Cl\\u00e9ante : 154 mots\",\"\\u00c9lise \\u2192 Frosine : 10 mots\",\"\\u00c9lise \\u2192 Harpagon : 328 mots\",\"\\u00c9lise \\u2192 Mariane : 48 mots\",\"\\u00c9lise \\u2192 Val\\u00e8re : 514 
mots\"],\"line_width\":[8.839833447124025,3.2855016814836904,8.243276719220827,4.025219576782893,4.268830084156646,4.849807137477693,11.916906689721008,9.336531723736865,9.16312337796775,7.851178416414938,3.2855016814836904,10.570104977535497,9.56292827701689,11.811682307906707,7.365045732431889,8.173100036381978,2.845533147749768,7.220178484356978,4.5500577759124115,12.0,10.366905594341086,9.554559598093043,2.0209455870549684,7.433382141082776,6.778838760174172,10.364216767315883,5.679413597641419,11.296151100803042,9.83398899615332,10.987268604247955,8.109399891487643,8.608591589203272,5.533386178585367,7.111460521115734,7.599931360400959,8.13697274085889,8.514642933957813,8.407745568436198,7.774925038658612,4.675926218514289,3.411370124085567,4.268830084156646,7.67195527165365,3.8426304369590873,11.300988139800165,2.4794357905984072,5.571003362967381,8.302542170246781,8.090783721726984,1.0,8.636551811670916,1.7397178952992034,11.729534501087539,1.4070961343576416,5.374645336607211,8.302542170246781,10.531235463441382,7.67195527165365,2.845533147749768,9.045034771489759,5.571003362967381,9.86255112640804],\"log_weight\":[5.683579767338681,2.639057329615259,5.356586274672012,3.044522437723423,3.1780538303479458,3.4965075614664802,7.370230641807081,5.955837369464831,5.860786223465865,5.14166355650266,2.639057329615259,6.6320017773956295,6.07993319509559,7.312553498102598,4.875197323201151,5.318119993844216,2.3978952727983707,4.795790545596741,3.332204510175204,7.415776975415394,6.520621127558696,6.075346031088684,1.9459101490553132,4.912654885736052,4.553876891600541,6.519147287940395,3.9512437185814275,7.029972911706386,6.2285110035911835,6.860663671448287,5.2832037287379885,5.556828061699537,3.8712010109078907,4.736198448394496,5.003946305945459,5.298317366548036,5.5053315359323625,5.44673737166631,5.099866427824199,3.4011973816621555,2.70805020110221,3.1780538303479458,5.043425116919247,2.9444389791664403,7.0326242610280065,2.1972245773362196,3.8918202981106265,5
.389071729816501,5.272999558563747,1.3862943611198906,5.572154032177765,1.791759469228055,7.267525427828172,1.6094379124341003,3.784189633918261,5.389071729816501,6.610696044717759,5.043425116919247,2.3978952727983707,5.796057750765372,3.8918202981106265,6.244166900663736],\"start\":[0,0,0,1,1,2,2,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,6,7,7,8,8,8,8,8,8,9,9,9,9,9,9,9,10,10,11,11,11,11,11,11,11,12,12,12,12,12],\"weight\":[293,13,211,20,23,32,1587,385,350,170,13,758,436,1498,130,203,10,120,27,1661,678,434,6,135,94,677,51,1129,506,953,196,258,47,113,148,199,245,231,163,29,14,23,154,18,1132,8,48,218,194,3,262,5,1432,4,43,218,742,154,10,328,48,514]},\"selected\":{\"id\":\"98e15459-3fa3-4654-91d7-ce085cacf891\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"95abafef-3d55-49a0-bbb2-f456331c7107\",\"type\":\"UnionRenderers\"}},\"id\":\"1ba0c624-24f2-4219-987c-14d36a92ae33\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"98e15459-3fa3-4654-91d7-ce085cacf891\",\"type\":\"Selection\"},{\"attributes\":{\"formatter\":{\"id\":\"596840ad-b8b2-4fb6-8b66-455557a9e79b\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"7e8e7531-a329-4a09-9774-cf0d045a1cbc\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"aa2593b4-c50c-4fc5-a63d-48af3f67fd16\",\"type\":\"BasicTicker\"}},\"id\":\"cc184c2b-f4fd-4577-9f3d-c58c2cbd76a3\",\"type\":\"LinearAxis\"},{\"attributes\":{\"callback\":null,\"data\":{\"color\":[\"#9467bd\",\"#9467bd\",\"#9467bd\",\"#e377c2\",\"#e377c2\",\"#aec7e8\",\"#aec7e8\",\"#aec7e8\",\"#aec7e8\",\"#aec7e8\",\"#aec7e8\",\"#aec7e8\",\"#ffbb78\",\"#ffbb78\",\"#ffbb78\",\"#ffbb78\",\"#ffbb78\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#1f77b4\",\"#98df8a\",\"#98df8a\",\"#98df8a\",\"#c49c94\",\"#c5b0d5\",\"#c5b0d5\",\"#ff9896\",\"#ff9896\",\"#ff9896\",\"#ff9896\",\"#ff9896\",\"#ff9896\",\"#2ca02c\",\"#2ca02c\",\"#2ca02c\",\"#2ca02c\",\"#2ca02c\"
,\"#2ca02c\",\"#2ca02c\",\"#8c564b\",\"#8c564b\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#ff7f0e\",\"#d62728\",\"#d62728\",\"#d62728\",\"#d62728\",\"#d62728\"],\"line_width\":[8.839833447124025,3.2855016814836904,8.243276719220827,4.025219576782893,4.268830084156646,4.849807137477693,11.916906689721008,9.336531723736865,9.16312337796775,7.851178416414938,3.2855016814836904,10.570104977535497,9.56292827701689,11.811682307906707,7.365045732431889,8.173100036381978,2.845533147749768,7.220178484356978,4.5500577759124115,12.0,10.366905594341086,9.554559598093043,2.0209455870549684,7.433382141082776,6.778838760174172,10.364216767315883,5.679413597641419,11.296151100803042,9.83398899615332,10.987268604247955,8.109399891487643,8.608591589203272,5.533386178585367,7.111460521115734,7.599931360400959,8.13697274085889,8.514642933957813,8.407745568436198,7.774925038658612,4.675926218514289,3.411370124085567,4.268830084156646,7.67195527165365,3.8426304369590873,11.300988139800165,2.4794357905984072,5.571003362967381,8.302542170246781,8.090783721726984,1.0,8.636551811670916,1.7397178952992034,11.729534501087539,1.4070961343576416,5.374645336607211,8.302542170246781,10.531235463441382,7.67195527165365,2.845533147749768,9.045034771489759,5.571003362967381,9.86255112640804],\"x_end\":[-0.9014407977516858,-0.6997414132054509,0.3698357659938343,-0.9014407977516858,0.0013736459302272884,-0.4964368784029247,-0.9014407977516858,0.04536051281049857,0.3130005505544711,-0.24665176823437693,-0.8154418483675555,0.5191326363353952,1.2078775491197695,-0.9014407977516858,0.04536051281049857,0.3130005505544711,-0.24665176823437693,0.09039769270365772,0.6127343525141392,1.2078775491197695,-0.4964368784029247,0.04536051281049857,0.0013736459302272884,-0.6997414132054509,0.3130005505544711,-0.24665176823437693,-0.8154418483675555,0.3698357659938343,0.5191326363353952,1.2078775491197695,-0.4964368784029247,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0
.24665176823437693,0.09039769270365772,1.2078775491197695,-0.4964368784029247,-0.9014407977516858,0.3698357659938343,0.5191326363353952,0.09039769270365772,1.2078775491197695,-0.4964368784029247,-0.9014407977516858,0.0013736459302272884,-0.6997414132054509,0.3698357659938343,-0.9014407977516858,0.04536051281049857,0.09039769270365772,1.2078775491197695,-0.9014407977516858,0.0013736459302272884,0.3130005505544711,-0.24665176823437693,0.5191326363353952,1.2078775491197695,-0.4964368784029247,-0.9014407977516858,0.3130005505544711,0.3698357659938343],\"x_start\":[0.09039769270365772,0.09039769270365772,0.09039769270365772,0.6127343525141392,0.6127343525141392,1.2078775491197695,1.2078775491197695,1.2078775491197695,1.2078775491197695,1.2078775491197695,1.2078775491197695,1.2078775491197695,-0.4964368784029247,-0.4964368784029247,-0.4964368784029247,-0.4964368784029247,-0.4964368784029247,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,-0.9014407977516858,0.04536051281049857,0.04536051281049857,0.04536051281049857,0.0013736459302272884,-0.6997414132054509,-0.6997414132054509,0.3130005505544711,0.3130005505544711,0.3130005505544711,0.3130005505544711,0.3130005505544711,0.3130005505544711,-0.24665176823437693,-0.24665176823437693,-0.24665176823437693,-0.24665176823437693,-0.24665176823437693,-0.24665176823437693,-0.24665176823437693,-0.8154418483675555,-0.8154418483675555,0.3698357659938343,0.3698357659938343,0.3698357659938343,0.3698357659938343,0.3698357659938343,0.3698357659938343,0.3698357659938343,0.5191326363353952,0.5191326363353952,0.5191326363353952,0.5191326363353952,0.5191326363353952],\"y_end\":[-0.12737241902039884,0.30096931672117927,-0.5146341944910402,-0.12737241902039884,-1.0,1.1125062544389128,-0.12737241902039884,0.8505719904955572,0.15392904397858956,-0.15218087211400824,0.0369723087
7760726,0.360042675508183,0.3368250388089235,-0.12737241902039884,0.8505719904955572,0.15392904397858956,-0.15218087211400824,-0.4448908825326538,-0.9127382605708516,0.3368250388089235,1.1125062544389128,0.8505719904955572,-1.0,0.30096931672117927,0.15392904397858956,-0.15218087211400824,0.03697230877760726,-0.5146341944910402,0.360042675508183,0.3368250388089235,1.1125062544389128,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.15218087211400824,-0.4448908825326538,0.3368250388089235,1.1125062544389128,-0.12737241902039884,-0.5146341944910402,0.360042675508183,-0.4448908825326538,0.3368250388089235,1.1125062544389128,-0.12737241902039884,-1.0,0.30096931672117927,-0.5146341944910402,-0.12737241902039884,0.8505719904955572,-0.4448908825326538,0.3368250388089235,-0.12737241902039884,-1.0,0.15392904397858956,-0.15218087211400824,0.360042675508183,0.3368250388089235,1.1125062544389128,-0.12737241902039884,0.15392904397858956,-0.5146341944910402],\"y_start\":[-0.4448908825326538,-0.4448908825326538,-0.4448908825326538,-0.9127382605708516,-0.9127382605708516,0.3368250388089235,0.3368250388089235,0.3368250388089235,0.3368250388089235,0.3368250388089235,0.3368250388089235,0.3368250388089235,1.1125062544389128,1.1125062544389128,1.1125062544389128,1.1125062544389128,1.1125062544389128,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,-0.12737241902039884,0.8505719904955572,0.8505719904955572,0.8505719904955572,-1.0,0.30096931672117927,0.30096931672117927,0.15392904397858956,0.15392904397858956,0.15392904397858956,0.15392904397858956,0.15392904397858956,0.15392904397858956,-0.15218087211400824,-0.15218087211400824,-0.15218087211400824,-0.15218087211400824,-0.15218087211400824,-0.15218087211400824,-0.15218087211400824,0.03697230877760726,0.03697230877760726,-0.5146341944910402,-0.5
146341944910402,-0.5146341944910402,-0.5146341944910402,-0.5146341944910402,-0.5146341944910402,-0.5146341944910402,0.360042675508183,0.360042675508183,0.360042675508183,0.360042675508183,0.360042675508183]},\"selected\":{\"id\":\"2976d6d3-4cca-4781-98fc-40e4c2884ac6\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"6d44f179-4dfa-4b23-8e8d-4989ac87fc9b\",\"type\":\"UnionRenderers\"}},\"id\":\"0332bc33-ff32-412c-a9c6-78573ae0b8d1\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"aa2593b4-c50c-4fc5-a63d-48af3f67fd16\",\"type\":\"BasicTicker\"},{\"attributes\":{\"plot\":{\"id\":\"7e8e7531-a329-4a09-9774-cf0d045a1cbc\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"aa2593b4-c50c-4fc5-a63d-48af3f67fd16\",\"type\":\"BasicTicker\"}},\"id\":\"23e9f818-270b-4a15-afdc-57236750a4b6\",\"type\":\"Grid\"},{\"attributes\":{\"formatter\":{\"id\":\"7d55de9c-30ac-439d-a8cc-0da20b284d9d\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"7e8e7531-a329-4a09-9774-cf0d045a1cbc\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"db15801e-b828-4b37-ad24-f3ac76b6acfd\",\"type\":\"BasicTicker\"}},\"id\":\"84766bdd-b282-4acc-8b13-3140106460e7\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"db15801e-b828-4b37-ad24-f3ac76b6acfd\",\"type\":\"BasicTicker\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"7e8e7531-a329-4a09-9774-cf0d045a1cbc\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"db15801e-b828-4b37-ad24-f3ac76b6acfd\",\"type\":\"BasicTicker\"}},\"id\":\"41228c1d-e034-402f-9563-0d06b07b2e8c\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"1ac11a4c-3593-4302-84d5-beb6ce43d7fe\",\"type\":\"PanTool\"},{\"attributes\":{\"callback\":null,\"data\":{\"label\":[\"Anselme\",\"Brindavoine\",\"Cl\\u00e9ante\",\"Frosine\",\"Harpagon\",\"La Fl\\u00e8che\",\"La Merluche\",\"Le Commissaire\",\"Mariane\",\"Ma\\u00eetre Jacques\",\"Ma\\u00eetre 
Simon\",\"Val\\u00e8re\",\"\\u00c9lise\"],\"x\":[0.09039769270365772,0.6127343525141392,1.2078775491197695,-0.4964368784029247,-0.9014407977516858,0.04536051281049857,0.0013736459302272884,-0.6997414132054509,0.3130005505544711,-0.24665176823437693,-0.8154418483675555,0.3698357659938343,0.5191326363353952],\"y\":[-0.4448908825326538,-0.9127382605708516,0.3368250388089235,1.1125062544389128,-0.12737241902039884,0.8505719904955572,-1.0,0.30096931672117927,0.15392904397858956,-0.15218087211400824,0.03697230877760726,-0.5146341944910402,0.360042675508183]},\"selected\":{\"id\":\"3374a395-d5db-48fd-aa5e-e4f8bdef6c96\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"5f485a28-f830-45f4-87c6-635763853a6f\",\"type\":\"UnionRenderers\"}},\"id\":\"58f4d6ca-8561-4b95-818c-a0eb593c6de8\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"fill_color\":{\"field\":\"color\"},\"line_color\":{\"field\":\"color\"},\"plot\":null,\"size\":7},\"id\":\"d7a8adfd-b87e-4e6a-baa7-58761288a6e6\",\"type\":\"NormalHead\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"96f72c3b-ceda-4733-a840-80c86868969a\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"plot\":{\"id\":\"7e8e7531-a329-4a09-9774-cf0d045a1cbc\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"source\":{\"id\":\"58f4d6ca-8561-4b95-818c-a0eb593c6de8\",\"type\":\"ColumnDataSource\"},\"text\":{\"field\":\"label\"},\"text_align\":\"center\",\"text_baseline\":\"middle\",\"text_color\":{\"value\":\"#111111\"},\"text_font_size\":{\"value\":\"9pt\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"7eeef452-cb0b-4215-a3f4-23bec1d88d58\",\"type\":\"LabelSet\"},{\"attributes\":{},\"id\":\"22b19a18-c9c2-477
0-b5f3-c9481b22978a\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"overlay\":{\"id\":\"96f72c3b-ceda-4733-a840-80c86868969a\",\"type\":\"BoxAnnotation\"}},\"id\":\"218d05ab-f2a8-4f38-abf6-d26c4ba62a97\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"2c0dd2d5-63e3-4d8a-a925-9e71d212f780\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"6f903ba1-42e7-454d-82bc-98a69f4fcd29\",\"type\":\"SaveTool\"},{\"attributes\":{\"callback\":null,\"renderers\":[{\"id\":\"ad6fb0fa-131e-4cda-9684-e999d27a8aad\",\"type\":\"GlyphRenderer\"}],\"tooltips\":[[\"Interaction\",\"@label\"],[\"Mots\",\"@weight\"]]},\"id\":\"b9aefbb0-556c-4091-ad9d-78e12a62a68b\",\"type\":\"HoverTool\"},{\"attributes\":{\"fill_color\":{\"field\":\"color\"},\"line_width\":{\"value\":1.2},\"radius\":{\"field\":\"radius\",\"units\":\"data\"}},\"id\":\"7abbacdd-83bf-44ff-8fb6-5d921b65753b\",\"type\":\"Circle\"},{\"attributes\":{\"end\":{\"id\":\"d7a8adfd-b87e-4e6a-baa7-58761288a6e6\",\"type\":\"NormalHead\"},\"line_alpha\":{\"value\":0.5},\"line_color\":{\"field\":\"color\"},\"line_width\":{\"field\":\"line_width\"},\"plot\":{\"id\":\"7e8e7531-a329-4a09-9774-cf0d045a1cbc\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"source\":{\"id\":\"0332bc33-ff32-412c-a9c6-78573ae0b8d1\",\"type\":\"ColumnDataSource\"},\"start\":null,\"x_end\":{\"field\":\"x_end\"},\"x_start\":{\"field\":\"x_start\"},\"y_end\":{\"field\":\"y_end\"},\"y_start\":{\"field\":\"y_start\"}},\"id\":\"7b07818d-7e7d-4c3b-adf0-43b9f59f185a\",\"type\":\"Arrow\"},{\"attributes\":{\"data_source\":{\"id\":\"ca8cc36d-9ace-4463-8f9a-533e4a60771e\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"7abbacdd-83bf-44ff-8fb6-5d921b65753b\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"view\":{\"id\":\"803d4f50-03f2-4372-9ad2-d4b41e472731\",\"type\":\"CDSView\"}},\"id\":\"0146009c-04a0-4191-bf53-0b9fcb2d725d\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"data_source\":{\"id\":\"1ba0c624-24f2-4219-987c-14d36a92ae33\",\"type\":\"ColumnD
ataSource\"},\"glyph\":{\"id\":\"9951bbb8-f137-4086-a0fe-5bda4ed87533\",\"type\":\"MultiLine\"},\"hover_glyph\":null,\"muted_glyph\":null,\"view\":{\"id\":\"a278fae1-704c-44a5-b6bc-e7f2a250359e\",\"type\":\"CDSView\"}},\"id\":\"ad6fb0fa-131e-4cda-9684-e999d27a8aad\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"a9610f86-1073-46e6-8872-7d62424d3054\",\"type\":\"NodesOnly\"},{\"attributes\":{},\"id\":\"7d55de9c-30ac-439d-a8cc-0da20b284d9d\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"plot\":null,\"text\":\"Graphe des interlocutions\"},\"id\":\"74abe625-8f39-4c51-97e9-aacaf97984fe\",\"type\":\"Title\"},{\"attributes\":{\"callback\":null,\"renderers\":[{\"id\":\"0146009c-04a0-4191-bf53-0b9fcb2d725d\",\"type\":\"GlyphRenderer\"}],\"tooltips\":[[\"Personnage\",\"@label\"],[\"Mots totaux\",\"@mots\"]]},\"id\":\"7058bbe4-fff0-4bed-a29f-a34e3c1747be\",\"type\":\"HoverTool\"},{\"attributes\":{},\"id\":\"5f485a28-f830-45f4-87c6-635763853a6f\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"callback\":null,\"end\":1.8416294254514434,\"start\":-1.5351926740833597},\"id\":\"638d24e8-5a5f-4048-bc8d-30be885097fc\",\"type\":\"Range1d\"},{\"attributes\":{},\"id\":\"596840ad-b8b2-4fb6-8b66-455557a9e79b\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"6d44f179-4dfa-4b23-8e8d-4989ac87fc9b\",\"type\":\"UnionRenderers\"},{\"attributes\":{},\"id\":\"3374a395-d5db-48fd-aa5e-e4f8bdef6c96\",\"type\":\"Selection\"},{\"attributes\":{\"source\":{\"id\":\"1ba0c624-24f2-4219-987c-14d36a92ae33\",\"type\":\"ColumnDataSource\"}},\"id\":\"a278fae1-704c-44a5-b6bc-e7f2a250359e\",\"type\":\"CDSView\"},{\"attributes\":{\"callback\":null,\"data\":{\"color\":[\"#9467bd\",\"#e377c2\",\"#aec7e8\",\"#ffbb78\",\"#1f77b4\",\"#98df8a\",\"#c49c94\",\"#c5b0d5\",\"#ff9896\",\"#2ca02c\",\"#8c564b\",\"#ff7f0e\",\"#d62728\"],\"index\":[0,1,2,3,4,5,6,7,8,9,10,11,12],\"label\":[\"Anselme\",\"Brindavoine\",\"Cl\\u00e9ante\",\"Frosine\",\"Harpagon\",\"La Fl\\u00e8che\",\"La Merluche\",\"Le 
Commissaire\",\"Mariane\",\"Ma\\u00eetre Jacques\",\"Ma\\u00eetre Simon\",\"Val\\u00e8re\",\"\\u00c9lise\"],\"mots\":[517,43,3341,2339,6132,1512,47,294,919,1672,197,2723,1067],\"name\":[\"Anselme\",\"Brindavoine\",\"Cl\\u00e9ante\",\"Frosine\",\"Harpagon\",\"La Fl\\u00e8che\",\"La Merluche\",\"Le Commissaire\",\"Mariane\",\"Ma\\u00eetre Jacques\",\"Ma\\u00eetre Simon\",\"Val\\u00e8re\",\"\\u00c9lise\"],\"radius\":[0.07723369913188519,0.05604628605314156,0.12314848166606167,0.11078951413773001,0.15,0.098377395315688,0.05643678981434877,0.06991220619276833,0.08715601771591241,0.10100372861083977,0.0658387731203341,0.11579057803235171,0.09023320107744473]},\"selected\":{\"id\":\"4c03e756-6fde-4460-8c78-06bbb7173103\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"466582bb-0a46-4e8d-a274-669dc81ca6cf\",\"type\":\"UnionRenderers\"}},\"id\":\"ca8cc36d-9ace-4463-8f9a-533e4a60771e\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"75b6155f-befd-4a80-8218-167edc5b6df4\",\"type\":\"NodesOnly\"},{\"attributes\":{\"line_alpha\":{\"value\":0.25},\"line_color\":{\"field\":\"edge_color\"},\"line_width\":{\"field\":\"line_width\"}},\"id\":\"9951bbb8-f137-4086-a0fe-5bda4ed87533\",\"type\":\"MultiLine\"},{\"attributes\":{},\"id\":\"95abafef-3d55-49a0-bbb2-f456331c7107\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"callback\":null,\"end\":1.7462581307705867,\"start\":-1.6337518763316738},\"id\":\"b971fd6e-237d-4e2d-a6b5-766befe74a40\",\"type\":\"Range1d\"},{\"attributes\":{\"background_fill_color\":{\"value\":\"#f8f8f8\"},\"below\":[{\"id\":\"cc184c2b-f4fd-4577-9f3d-c58c2cbd76a3\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"84766bdd-b282-4acc-8b13-3140106460e7\",\"type\":\"LinearAxis\"}],\"plot_height\":700,\"plot_width\":900,\"renderers\":[{\"id\":\"cc184c2b-f4fd-4577-9f3d-c58c2cbd76a3\",\"type\":\"LinearAxis\"},{\"id\":\"23e9f818-270b-4a15-afdc-57236750a4b6\",\"type\":\"Grid\"},{\"id\":\"84766bdd-b282-4acc-8b13-3140106460e7\",\"type\":\"LinearAxis\"},{\"id\":\
"41228c1d-e034-402f-9563-0d06b07b2e8c\",\"type\":\"Grid\"},{\"id\":\"96f72c3b-ceda-4733-a840-80c86868969a\",\"type\":\"BoxAnnotation\"},{\"id\":\"c13af152-ed92-4db6-a8a0-558ee1e44671\",\"type\":\"GraphRenderer\"},{\"id\":\"7b07818d-7e7d-4c3b-adf0-43b9f59f185a\",\"type\":\"Arrow\"},{\"id\":\"7eeef452-cb0b-4215-a3f4-23bec1d88d58\",\"type\":\"LabelSet\"}],\"title\":{\"id\":\"74abe625-8f39-4c51-97e9-aacaf97984fe\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"9689ed3f-d51b-4537-93e8-669d0217b656\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"638d24e8-5a5f-4048-bc8d-30be885097fc\",\"type\":\"Range1d\"},\"x_scale\":{\"id\":\"581b8c06-9728-4f00-b005-29f01518242d\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"b971fd6e-237d-4e2d-a6b5-766befe74a40\",\"type\":\"Range1d\"},\"y_scale\":{\"id\":\"a7d3e6f7-ad6e-48d7-9a3c-c28dd4d56dad\",\"type\":\"LinearScale\"}},\"id\":\"7e8e7531-a329-4a09-9774-cf0d045a1cbc\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"2976d6d3-4cca-4781-98fc-40e4c2884ac6\",\"type\":\"Selection\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_scroll\":{\"id\":\"22b19a18-c9c2-4770-b5f3-c9481b22978a\",\"type\":\"WheelZoomTool\"},\"active_tap\":\"auto\",\"tools\":[{\"id\":\"1ac11a4c-3593-4302-84d5-beb6ce43d7fe\",\"type\":\"PanTool\"},{\"id\":\"22b19a18-c9c2-4770-b5f3-c9481b22978a\",\"type\":\"WheelZoomTool\"},{\"id\":\"218d05ab-f2a8-4f38-abf6-d26c4ba62a97\",\"type\":\"BoxZoomTool\"},{\"id\":\"2c0dd2d5-63e3-4d8a-a925-9e71d212f780\",\"type\":\"ResetTool\"},{\"id\":\"6f903ba1-42e7-454d-82bc-98a69f4fcd29\",\"type\":\"SaveTool\"},{\"id\":\"7058bbe4-fff0-4bed-a29f-a34e3c1747be\",\"type\":\"HoverTool\"},{\"id\":\"b9aefbb0-556c-4091-ad9d-78e12a62a68b\",\"type\":\"HoverTool\"}]},\"id\":\"9689ed3f-d51b-4537-93e8-669d0217b656\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"4c03e756-6fde-4460-8c78-06bbb7173103\",\"type\":\"Selection\"}],\"root_ids\":[\"7e8e7531-a329-4a09-9774-cf0d045a1cbc\"]},\"title\":\"Bokeh 
Application\",\"version\":\"0.12.16\"}};\n",
" var render_items = [{\"docid\":\"282226c6-fa94-4f45-bdc1-88b4dd40356b\",\"elementid\":\"1a4a8363-4dba-4425-aae5-ec0edc3b7c6e\",\"modelid\":\"7e8e7531-a329-4a09-9774-cf0d045a1cbc\"}];\n",
" root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n",
"\n",
" }\n",
" if (root.Bokeh !== undefined) {\n",
" embed_document(root);\n",
" } else {\n",
" var attempts = 0;\n",
" var timer = setInterval(function(root) {\n",
" if (root.Bokeh !== undefined) {\n",
" embed_document(root);\n",
" clearInterval(timer);\n",
" }\n",
" attempts++;\n",
" if (attempts > 100) {\n",
" console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n",
" clearInterval(timer);\n",
" }\n",
" }, 10, root)\n",
" }\n",
"})(window);"
],
"application/vnd.bokehjs_exec.v0+json": ""
},
"metadata": {
"application/vnd.bokehjs_exec.v0+json": {
"id": "7e8e7531-a329-4a09-9774-cf0d045a1cbc"
}
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# 0. Fonction pour construire les arêtes d'interlocution\n",
"def interlocution_edges(df_speeches):\n",
" \"\"\"Arêtes directionnelles source -> cible : succession des locuteurs par scène.\"\"\"\n",
" df = df_speeches.copy()\n",
" df[\"SceneKey\"] = df[\"Acte\"] + \" | \" + df[\"Scène\"]\n",
" df[\"order\"] = range(len(df))\n",
" df = df.sort_values([\"SceneKey\", \"order\"])\n",
" df[\"next\"] = df.groupby(\"SceneKey\")[\"Personnage\"].shift(-1)\n",
"\n",
" return (df.dropna(subset=[\"next\"])\n",
" .groupby([\"Personnage\", \"next\"], as_index=False)[\"Mots\"]\n",
" .sum()\n",
" .rename(columns={\"Personnage\": \"source\", \"next\": \"target\", \"Mots\": \"weight\"}))\n",
"\n",
"# 1. Données de base\n",
"totaux = df_speeches.groupby(\"Personnage\")[\"Mots\"].sum()\n",
"edges_dir = interlocution_edges(df_speeches)\n",
"\n",
"# 2. Construction du graphe NetworkX\n",
"G = nx.DiGraph()\n",
"\n",
"# Nœuds : taille ~ mots, couleur issue de color_map\n",
"for personnage, mots in totaux.items():\n",
" taille = 8 + 0.6 * np.sqrt(mots) # on ne calcule plus cette formule qu'ici\n",
" G.add_node(\n",
" personnage,\n",
" mots=int(mots),\n",
" label=personnage,\n",
" color=mcolors.to_hex(color_map.get(personnage)) if color_map.get(personnage) else \"#cccccc\",\n",
" size=taille,\n",
" )\n",
"\n",
"# Arêtes dirigées : poids log1p, couleur héritée de la source\n",
"for _, r in edges_dir.iterrows():\n",
" if r[\"source\"] == r[\"target\"]:\n",
" continue\n",
"\n",
" src = r[\"source\"]\n",
" tgt = r[\"target\"]\n",
" w = int(r[\"weight\"])\n",
"\n",
" G.add_edge(\n",
" src,\n",
" tgt,\n",
" weight=w,\n",
" log_weight=float(np.log1p(w)),\n",
" color=mcolors.to_hex(color_map.get(src)) if color_map.get(src) else \"#999999\",\n",
" label=f\"{src} → {tgt} : {w} mots\",\n",
" )\n",
"\n",
"# 3. Préparation du GraphRenderer Bokeh\n",
"node_names = list(G.nodes())\n",
"node_indices = list(range(len(node_names)))\n",
"name_to_index = {name: i for i, name in enumerate(node_names)}\n",
"\n",
"graph = GraphRenderer()\n",
"\n",
"# ----- NŒUDS -----\n",
"mots_list = [G.nodes[n][\"mots\"] for n in node_names]\n",
"sizes = [G.nodes[n][\"size\"] for n in node_names] # on réutilise la taille déjà calculée\n",
"max_size = max(sizes) if sizes else 1\n",
"\n",
"# radius en coordonnées \"data\" (échelle relative)\n",
"radii = [0.03 + 0.12 * (s / max_size) for s in sizes]\n",
"\n",
"graph.node_renderer.data_source.data = dict(\n",
" index=node_indices,\n",
" name=node_names,\n",
" label=[G.nodes[n][\"label\"] for n in node_names],\n",
" mots=mots_list,\n",
" color=[G.nodes[n][\"color\"] for n in node_names],\n",
" radius=radii,\n",
")\n",
"\n",
"graph.node_renderer.glyph = Circle(\n",
" radius=\"radius\",\n",
" fill_color=\"color\",\n",
" line_width=1.2,\n",
")\n",
"\n",
"# ----- ARÊTES -----\n",
"edge_start = []\n",
"edge_end = []\n",
"edge_weight = []\n",
"edge_log_weight = []\n",
"edge_color = []\n",
"edge_label = []\n",
"\n",
"for u, v in G.edges():\n",
" edge_start.append(name_to_index[u])\n",
" edge_end.append(name_to_index[v])\n",
" edge_weight.append(G[u][v][\"weight\"])\n",
" edge_log_weight.append(G[u][v][\"log_weight\"])\n",
" edge_color.append(G[u][v][\"color\"]) # couleur de la source\n",
" edge_label.append(G[u][v][\"label\"])\n",
"\n",
"# largeur des arêtes normalisée (contraste lisible)\n",
"if edge_log_weight:\n",
" min_log = min(edge_log_weight)\n",
" max_log = max(edge_log_weight)\n",
" if max_log == min_log:\n",
" edge_line_width = [5.0 for _ in edge_log_weight]\n",
" else:\n",
" edge_line_width = [\n",
" 1.0 + 11.0 * (lw - min_log) / (max_log - min_log)\n",
" for lw in edge_log_weight\n",
" ]\n",
"else:\n",
" edge_line_width = []\n",
"\n",
"graph.edge_renderer.data_source.data = dict(\n",
" start=edge_start,\n",
" end=edge_end,\n",
" weight=edge_weight,\n",
" log_weight=edge_log_weight,\n",
" edge_color=edge_color,\n",
" line_width=edge_line_width,\n",
" label=edge_label,\n",
")\n",
"\n",
"graph.edge_renderer.glyph = MultiLine(\n",
" line_color=\"edge_color\",\n",
" line_width=\"line_width\",\n",
" line_alpha=0.25, # « fond » des arêtes\n",
")\n",
"\n",
"# 4. Layout (positions des nœuds)\n",
"\n",
"LAYOUT_SEED = 42 # layout stable d’une exécution à l’autre\n",
"\n",
"pos = nx.spring_layout(\n",
" G,\n",
" k=2,\n",
" iterations=200,\n",
" weight=\"log_weight\",\n",
" seed=LAYOUT_SEED,\n",
")\n",
"\n",
"# Rayons par nœud (en coordonnées data)\n",
"node_radius = dict(zip(node_names, radii))\n",
"\n",
"# Gros personnages = au-dessus du 75e centile\n",
"mots_arr = np.array(mots_list)\n",
"seuil_gros = np.quantile(mots_arr, 0.75) if len(mots_arr) > 0 else 0\n",
"big_nodes = [n for n, m in zip(node_names, mots_list) if m >= seuil_gros]\n",
"\n",
"big_sep_factor = 5.0 # écartement spécifique entre gros nœuds\n",
"\n",
"for _ in range(10):\n",
" moved = False\n",
" for i in range(len(big_nodes)):\n",
" for j in range(i + 1, len(big_nodes)):\n",
" ni = big_nodes[i]\n",
" nj = big_nodes[j]\n",
"\n",
" pi = np.array(pos[ni], dtype=float)\n",
" pj = np.array(pos[nj], dtype=float)\n",
"\n",
" diff = pj - pi\n",
" dist = np.linalg.norm(diff)\n",
" if dist == 0:\n",
" diff = np.random.randn(2)\n",
" dist = np.linalg.norm(diff)\n",
"\n",
" min_dist = big_sep_factor * (node_radius[ni] + node_radius[nj])\n",
"\n",
" if dist < min_dist:\n",
" direction = diff / dist\n",
" center = (pi + pj) / 2.0\n",
" offset = direction * (min_dist / 2.0)\n",
"\n",
" pos[ni] = (center - offset).tolist()\n",
" pos[nj] = (center + offset).tolist()\n",
" moved = True\n",
" if not moved:\n",
" break\n",
"\n",
"graph_layout = {name_to_index[name]: pos[name] for name in node_names}\n",
"graph.layout_provider = StaticLayoutProvider(graph_layout=graph_layout)\n",
"\n",
"xs = [p[0] for p in pos.values()]\n",
"ys = [p[1] for p in pos.values()]\n",
"if xs and ys:\n",
" margin = 0.3 * max(max(xs) - min(xs), max(ys) - min(ys))\n",
" x_min, x_max = min(xs) - margin, max(xs) + margin\n",
" y_min, y_max = min(ys) - margin, max(ys) + margin\n",
"else:\n",
" x_min, x_max, y_min, y_max = -2, 2, -2, 2\n",
"\n",
"# 5. Figure Bokeh + Hover\n",
"\n",
"plot = bkp.figure(\n",
" width=900,\n",
" height=700,\n",
" x_range=(x_min, x_max),\n",
" y_range=(y_min, y_max),\n",
" title=\"Graphe des interlocutions\",\n",
" tools=\"pan,wheel_zoom,box_zoom,reset,save\",\n",
" active_scroll=\"wheel_zoom\",\n",
" background_fill_color=\"#f8f8f8\",\n",
")\n",
"\n",
"plot.renderers.append(graph)\n",
"\n",
"# Hover nœuds\n",
"hover_nodes = HoverTool(\n",
" tooltips=[\n",
" (\"Personnage\", \"@label\"),\n",
" (\"Mots totaux\", \"@mots\"),\n",
" ],\n",
" renderers=[graph.node_renderer],\n",
")\n",
"plot.add_tools(hover_nodes)\n",
"\n",
"# Hover arêtes (corps)\n",
"hover_edges = HoverTool(\n",
" tooltips=[\n",
" (\"Interaction\", \"@label\"),\n",
" (\"Mots\", \"@weight\"),\n",
" ],\n",
" renderers=[graph.edge_renderer],\n",
")\n",
"plot.add_tools(hover_edges)\n",
"\n",
"# 6. Flèches (orientation)\n",
"\n",
"layout_xy = {name: pos[name] for name in node_names}\n",
"\n",
"x_start, y_start, x_end, y_end, lw_arrow, colors = [], [], [], [], [], []\n",
"for (u, v), w, c in zip(G.edges(), edge_line_width, edge_color):\n",
" x0, y0 = layout_xy[u]\n",
" x1, y1 = layout_xy[v]\n",
" x_start.append(x0)\n",
" y_start.append(y0)\n",
" x_end.append(x1)\n",
" y_end.append(y1)\n",
" lw_arrow.append(w)\n",
" colors.append(c) # couleur du personnage source\n",
"\n",
"arrow_source = ColumnDataSource(dict(\n",
" x_start=x_start,\n",
" y_start=y_start,\n",
" x_end=x_end,\n",
" y_end=y_end,\n",
" line_width=lw_arrow,\n",
" color=colors,\n",
"))\n",
"\n",
"arrows = Arrow(\n",
" end=NormalHead(\n",
" size=7,\n",
" fill_color=\"color\",\n",
" line_color=\"color\",\n",
" fill_alpha=1.0,\n",
" line_alpha=1.0,\n",
" ),\n",
" x_start=\"x_start\",\n",
" y_start=\"y_start\",\n",
" x_end=\"x_end\",\n",
" y_end=\"y_end\",\n",
" line_width=\"line_width\",\n",
" line_color=\"color\",\n",
" line_alpha=0.5,\n",
" source=arrow_source,\n",
")\n",
"plot.add_layout(arrows)\n",
"\n",
"# 7. Labels sur les nœuds\n",
"\n",
"node_x = [graph_layout[i][0] for i in node_indices]\n",
"node_y = [graph_layout[i][1] for i in node_indices]\n",
"\n",
"labels_source = ColumnDataSource(dict(\n",
" x=node_x,\n",
" y=node_y,\n",
" label=[G.nodes[n][\"label\"] for n in node_names],\n",
"))\n",
"\n",
"labels = LabelSet(\n",
" x=\"x\",\n",
" y=\"y\",\n",
" text=\"label\",\n",
" source=labels_source,\n",
" text_align=\"center\",\n",
" text_baseline=\"middle\",\n",
" text_font_size=\"9pt\",\n",
" text_color=\"#111111\",\n",
")\n",
"plot.add_layout(labels)\n",
"\n",
"show(plot)\n",
"\n",
"# Version statique\n",
"\n",
"fig, ax = plt.subplots(figsize=(10, 10))\n",
"ax.set_axis_off()\n",
"\n",
"# Nœuds : réutilise la taille et la couleur déjà stockées dans G\n",
"node_sizes = [G.nodes[n][\"size\"] * 20 for n in G.nodes()] # facteur à ajuster visuellement\n",
"node_colors = [G.nodes[n][\"color\"] for n in G.nodes()]\n",
"\n",
"nx.draw_networkx_nodes(\n",
" G,\n",
" pos,\n",
" ax=ax,\n",
" node_size=node_sizes,\n",
" node_color=node_colors,\n",
" linewidths=1.0,\n",
" edgecolors=\"#111111\",\n",
")\n",
"\n",
"# Arêtes : on réutilise largeur et couleur calculées pour Bokeh\n",
"edges = list(G.edges())\n",
"\n",
"nx.draw_networkx_edges(\n",
" G,\n",
" pos,\n",
" ax=ax,\n",
" edgelist=edges,\n",
" width=edge_line_width,\n",
" edge_color=edge_color,\n",
" arrows=True,\n",
" arrowstyle=\"-|>\",\n",
" arrowsize=15,\n",
" alpha=0.5,\n",
")\n",
"\n",
"# Labels : mêmes labels que dans Bokeh\n",
"labels = {n: G.nodes[n][\"label\"] for n in G.nodes()}\n",
"nx.draw_networkx_labels(\n",
" G,\n",
" pos,\n",
" labels=labels,\n",
" ax=ax,\n",
" font_size=9,\n",
" font_color=\"#111111\",\n",
")\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"La pile logicielle employée par l'OBVIL pour son propre graphique repose sur Sigma et ForceAtlas2, des modules `nodejs` que je ne souhaitais pas exploiter dans ce notebook.\n",
"Nous avons choisi d'exploiter la librairie bokeh pour sa popularité et sa maintenance active, ainsi que sa disponibilité dans cette instance de Jupyter Lab."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## Conclusion\n",
"\n",
"Cette étude a révélé que malgré tout le soin que l'on peut apporter au traitement d'un sujet spécifique, il est possible d'introduire involontairement des \"artefacts\" dans les données sur lesquelles on travaille. \n",
"Ici, il s'agit de différences d'orthographes mineures, invisibles lors d'une lecture par un humain, mais qui peuvent rendre une étude assistée par l'informatique plus complexe, voire sujette aux erreurs.\n",
"On le répète assez souvent en informatique (et particulièrement en développement web) : on ne doit jamais avoir confiance dans les entrées...\n",
"Cela a nécessité un travail assez important en amont, et il faut préciser que ce n'est pas infaillible.\n",
"\n",
"Une autre source potentielle d'erreur, que ce soit au niveau de l'analyse ou de l'interprétation, consiste en des définitions ou des méthodologies différentes.\n",
"Ici, nous avons calculé des statistiques divergentes de celles de l'OBVIL, probablement en raison d'une définition différente de ce qu'est une \"ligne\".\n",
"Pour notre étude, nous considérons qu'une ligne est une suite de 60 caractères, dont les espaces surnuméraires ont été supprimés (incluant les sauts de lignes).\n",
"\n",
"Or, ce n'est qu'après une inspection plus poussée du dépôt de l'outil [dramagraph](https://github.com/dramacode/dramagraph/tree/gh-pages) de l'OBLIV révèle la méthode : après \"nettoyage\" des fichiers TEI source par l'emploi d'une [feuille de style XSL](https://github.com/dramacode/dramagraph/blob/gh-pages/naked.xsl), [un autre fichier XSL](https://github.com/dramacode/dramagraph/blob/gh-pages/drama2csv.xsl#L517) est en charge, notamment, du formatage des paragraphes (séparés par des retours de ligne), de la [gestion des accents](https://github.com/dramacode/dramagraph/blob/gh-pages/drama2csv.xsl#L14) et de [la casse](https://github.com/dramacode/dramagraph/blob/gh-pages/drama2csv.xsl#L524), et enfin de la production [des compteurs](https://github.com/dramacode/dramagraph/blob/gh-pages/drama2csv.xsl#L490).\n",
"\n",
"Par conséquent, pour retrouver des statistiques identiques à l'OBVIL, il aurait fallut passer par le même _pipeline_.\n",
"On aurait du choisir d'utiliser le TEI comme fichier source et lui appliquer les mêmes fichiers XSL, ce qui nous aurait donné directement accès aux statistiques, sans avoir besoin de les recalculer nous-même, réduisant considérablement la taille de ce notebook.\n",
"\n",
"On en déduit finalement que :\n",
"\n",
"- l'enthousiasme est parfois un ennemi ! J'aurai du prendre davantage de temps pour examiner comment l'OBVIL a produit ses statistiques, avant de me lancer dans une étude personnelle\n",
"- bien que structuré, HTML est un \"produit transformé\" : les fichiers TEI ont manifestement servi comme base à tous les autres formats proposés par l'OBVIL ; j'aurai du l'identifier comme source idéale"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}