{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Estimating the latency and capacity of a connection from asymmetric measurements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This document addresses Topic 4 of Module 3 of the MOOC \"Recherche reproductible : principes méthodologiques pour une science transparente\" (Reproducible Research: Methodological Principles for a Transparent Science).\n", "\n", "We compare a simple model of the performance of a network connection with real data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import os\n", "from os.path import exists\n", "import requests\n", "import gzip\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "from sklearn.linear_model import LinearRegression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We work with two datasets hosted online. The first step is to download them to the local machine."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Local file names\n", "filenames = [\n", "    \"liglab2.log\",\n", "    \"stackoverflow.log\",\n", "]\n", "# URLs where the files are hosted\n", "urls = [\n", "    \"http://mescal.imag.fr/membres/arnaud.legrand/teaching/2014/RICM4_EP_ping/liglab2.log.gz\",\n", "    \"http://mescal.imag.fr/membres/arnaud.legrand/teaching/2014/RICM4_EP_ping/stackoverflow.log.gz\",\n", "]" ] }, { "cell_type": "raw", "metadata": { "hideOutput": true }, "source": [ "# Optional: delete the files to force a fresh download\n", "# (change this cell's type to \"Code\" to run it)\n", "\n", "for filename in filenames:\n", "    try:\n", "        os.remove(filename)\n", "    except FileNotFoundError:\n", "        pass" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The file liglab2.log already exists; no need to download it.\n", "The file stackoverflow.log already exists; no need to download it.\n" ] } ], "source": [ "# Download each file only if it does not exist locally yet.\n", "\n", "def download_archive(filename, url):\n", "    if not exists(filename):\n", "        # Fetch the archive over HTTP with the requests module\n", "        archive = requests.get(url)\n", "        # Stop on HTTP errors (e.g. 404) instead of trying to decompress an error page\n", "        archive.raise_for_status()\n", "        # The file is a .gz archive; decompress it with the gzip module\n", "        content = gzip.decompress(archive.content)\n", "\n", "        # The context manager guarantees the file is closed after writing\n", "        with open(filename, 'wb') as f:\n", "            f.write(content)\n", "        print(f\"Downloaded {url} and extracted it to {filename}.\")\n", "    else:\n", "        print(f\"The file {filename} already exists; no need to download it.\")\n", "\n", "\n", "for filename, url in zip(filenames, urls):\n", "    download_archive(filename, url)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading the data\n", "We now extract the data produced by the `ping` tool into a `pandas` DataFrame.\n", "\n", "Since the format is fairly simple, this can be done using only Python's built-in string methods.\n",
"\n", "Each line has the following form:\n", "```\n", "[1421761682.052172] 665 bytes from lig-publig.imag.fr (129.88.11.7): icmp_seq=1 ttl=60 time=22.5 ms\n", "```\n", "We extract only the fields we need:\n", "\n", " * the measurement date (in seconds since January 1, 1970, i.e. a Unix timestamp), found in characters 2 through 18, between the square brackets\n", " * the message size (in bytes), which is followed by the substring `\" bytes\"`\n", " * the response time (in milliseconds), which is preceded by `\"time=\"` and followed by `\" ms\"`" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "44036 lines read successfully (0.85% failures)\n" ] }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>msgsize</th><th>time</th><th>timestamp</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>665</td><td>22.50</td><td>2015-01-20 13:48:02.052172</td></tr>\n",
"<tr><th>1</th><td>1373</td><td>21.20</td><td>2015-01-20 13:48:02.277315</td></tr>\n",
"<tr><th>2</th><td>262</td><td>21.20</td><td>2015-01-20 13:48:02.502054</td></tr>\n",
"<tr><th>3</th><td>1107</td><td>23.30</td><td>2015-01-20 13:48:02.729257</td></tr>\n",
"<tr><th>4</th><td>1128</td><td>1.41</td><td>2015-01-20 13:48:02.934648</td></tr>\n",
"<tr><th>5</th><td>489</td><td>21.90</td><td>2015-01-20 13:48:03.160397</td></tr>\n",
"<tr><th>6</th><td>1759</td><td>78.70</td><td>2015-01-20 13:48:03.443055</td></tr>\n",
"<tr><th>7</th><td>1146</td><td>25.10</td><td>2015-01-20 13:48:03.672157</td></tr>\n",
"<tr><th>8</th><td>884</td><td>24.00</td><td>2015-01-20 13:48:03.899933</td></tr>\n",
"<tr><th>9</th><td>1422</td><td>19.50</td><td>2015-01-20 13:48:04.122687</td></tr>\n",
"<tr><th>10</th><td>1180</td><td>18.00</td><td>2015-01-20 13:48:04.344135</td></tr>\n",
"<tr><th>11</th><td>999</td><td>18.80</td><td>2015-01-20 13:48:04.566271</td></tr>\n",
"<tr><th>12</th><td>1020</td><td>24.30</td><td>2015-01-20 13:48:04.998504</td></tr>\n",
"<tr><th>13</th><td>71</td><td>3.45</td><td>2015-01-20 13:48:05.205172</td></tr>\n",
"<tr><th>14</th><td>34</td><td>5.85</td><td>2015-01-20 13:48:05.414106</td></tr>\n",
"<tr><th>15</th><td>1843</td><td>2.31</td><td>2015-01-20 13:48:05.620117</td></tr>\n",
"<tr><th>16</th><td>407</td><td>1.14</td><td>2015-01-20 13:48:05.824949</td></tr>\n",
"<tr><th>17</th><td>356</td><td>1.10</td><td>2015-01-20 13:48:06.029177</td></tr>\n",
"<tr><th>18</th><td>1511</td><td>2.18</td><td>2015-01-20 13:48:06.234464</td></tr>\n",
"<tr><th>19</th><td>587</td><td>1.27</td><td>2015-01-20 13:48:06.438772</td></tr>\n",
"<tr><th>20</th><td>809</td><td>1.33</td><td>2015-01-20 13:48:06.643208</td></tr>\n",
"<tr><th>21</th><td>1364</td><td>1.51</td><td>2015-01-20 13:48:06.848323</td></tr>\n",
"<tr><th>22</th><td>1153</td><td>1.44</td><td>2015-01-20 13:48:07.053400</td></tr>\n",
"<tr><th>23</th><td>853</td><td>1.30</td><td>2015-01-20 13:48:07.257704</td></tr>\n",
"<tr><th>24</th><td>1510</td><td>2.17</td><td>2015-01-20 13:48:07.463275</td></tr>\n",
"<tr><th>25</th><td>123</td><td>1.21</td><td>2015-01-20 13:48:07.668423</td></tr>\n",
"<tr><th>26</th><td>1966</td><td>2.20</td><td>2015-01-20 13:48:07.874230</td></tr>\n",
"<tr><th>27</th><td>933</td><td>1.34</td><td>2015-01-20 13:48:08.078667</td></tr>\n",
"<tr><th>28</th><td>922</td><td>1.42</td><td>2015-01-20 13:48:08.283655</td></tr>\n",
"<tr><th>29</th><td>24</td><td>1.12</td><td>2015-01-20 13:48:08.488688</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>44006</th><td>1772</td><td>28.80</td><td>2015-01-20 16:26:20.743715</td></tr>\n",
"<tr><th>44007</th><td>41</td><td>1.14</td><td>2015-01-20 16:26:20.949053</td></tr>\n",
"<tr><th>44008</th><td>1944</td><td>2.32</td><td>2015-01-20 16:26:21.155685</td></tr>\n",
"<tr><th>44009</th><td>400</td><td>1.98</td><td>2015-01-20 16:26:21.362095</td></tr>\n",
"<tr><th>44010</th><td>226</td><td>3.01</td><td>2015-01-20 16:26:21.569409</td></tr>\n",
"<tr><th>44011</th><td>466</td><td>7.45</td><td>2015-01-20 16:26:21.780805</td></tr>\n",
"<tr><th>44012</th><td>350</td><td>13.50</td><td>2015-01-20 16:26:21.998869</td></tr>\n",
"<tr><th>44013</th><td>1829</td><td>45.90</td><td>2015-01-20 16:26:22.248969</td></tr>\n",
"<tr><th>44014</th><td>1954</td><td>58.50</td><td>2015-01-20 16:26:22.512386</td></tr>\n",
"<tr><th>44015</th><td>1074</td><td>1.45</td><td>2015-01-20 16:26:22.717961</td></tr>\n",
"<tr><th>44016</th><td>46</td><td>1.11</td><td>2015-01-20 16:26:22.923292</td></tr>\n",
"<tr><th>44017</th><td>1844</td><td>2.26</td><td>2015-01-20 16:26:23.129965</td></tr>\n",
"<tr><th>44018</th><td>645</td><td>1.24</td><td>2015-01-20 16:26:23.335449</td></tr>\n",
"<tr><th>44019</th><td>444</td><td>1.25</td><td>2015-01-20 16:26:23.540901</td></tr>\n",
"<tr><th>44020</th><td>1940</td><td>2.46</td><td>2015-01-20 16:26:23.747983</td></tr>\n",
"<tr><th>44021</th><td>1411</td><td>1.47</td><td>2015-01-20 16:26:23.954099</td></tr>\n",
"<tr><th>44022</th><td>49</td><td>1.21</td><td>2015-01-20 16:26:24.159879</td></tr>\n",
"<tr><th>44023</th><td>420</td><td>1.55</td><td>2015-01-20 16:26:24.365815</td></tr>\n",
"<tr><th>44024</th><td>227</td><td>1.22</td><td>2015-01-20 16:26:24.571516</td></tr>\n",
"<tr><th>44025</th><td>947</td><td>1.34</td><td>2015-01-20 16:26:24.777325</td></tr>\n",
"<tr><th>44026</th><td>1960</td><td>2.43</td><td>2015-01-20 16:26:24.983905</td></tr>\n",
"<tr><th>44027</th><td>531</td><td>1.19</td><td>2015-01-20 16:26:25.188976</td></tr>\n",
"<tr><th>44028</th><td>374</td><td>1.14</td><td>2015-01-20 16:26:25.394275</td></tr>\n",
"<tr><th>44029</th><td>1503</td><td>2.19</td><td>2015-01-20 16:26:25.600745</td></tr>\n",
"<tr><th>44030</th><td>572</td><td>1.29</td><td>2015-01-20 16:26:25.805877</td></tr>\n",
"<tr><th>44031</th><td>1338</td><td>1.47</td><td>2015-01-20 16:26:26.011910</td></tr>\n",
"<tr><th>44032</th><td>1515</td><td>7.02</td><td>2015-01-20 16:26:26.222729</td></tr>\n",
"<tr><th>44033</th><td>1875</td><td>2.33</td><td>2015-01-20 16:26:26.429007</td></tr>\n",
"<tr><th>44034</th><td>1006</td><td>1.61</td><td>2015-01-20 16:26:26.634747</td></tr>\n",
"<tr><th>44035</th><td>1273</td><td>1.35</td><td>2015-01-20 16:26:26.840222</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>44036 rows × 3 columns</p>\n", "