{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Estimating the latency and capacity of a connection from asymmetric measurements" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This document corresponds to Subject 4 of Module 3 of the MOOC \"Recherche reproductible : principes méthodologiques pour une science transparente\".\n", "\n", "The goal is to compare a simple model of network connection performance against real data." ] },
{ "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "import os\n", "from os.path import exists\n", "import requests\n", "import gzip" ] },
{ "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading the data" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We work with two datasets hosted online. The first step is to download them locally."
] },
{ "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# Local file names\n", "filenames = [\n", "    \"liglab2.log\",\n", "    \"stackoverflow.log\",\n", "]\n", "# Addresses where the files are hosted\n", "urls = [\n", "    \"http://mescal.imag.fr/membres/arnaud.legrand/teaching/2014/RICM4_EP_ping/liglab2.log.gz\",\n", "    \"http://mescal.imag.fr/membres/arnaud.legrand/teaching/2014/RICM4_EP_ping/stackoverflow.log.gz\",\n", "]" ] },
{ "cell_type": "raw", "metadata": { "hideOutput": true }, "source": [ "# Optional: delete the files to force a fresh download\n", "for filename in filenames:\n", "    try:\n", "        os.remove(filename)\n", "    except FileNotFoundError:\n", "        pass" ] },
{ "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The file liglab2.log already exists, no need to download it.\n", "The file stackoverflow.log already exists, no need to download it.\n" ] } ], "source": [ "# If the files do not exist yet, download them.\n", "\n", "def download_archive(filename, url):\n", "    if not exists(filename):\n", "        # Fetch the data with the requests module\n", "        archive = requests.get(url)\n", "        # Stop on an HTTP error rather than trying to decompress an error page\n", "        archive.raise_for_status()\n", "        # The file is a .gz archive; decompress it with the gzip module\n", "        content = gzip.decompress(archive.content)\n", "\n", "        # Write the decompressed content and close the file properly\n", "        with open(filename, 'wb') as file:\n", "            file.write(content)\n", "        print(f\"Downloaded {url} and extracted it to {filename}.\")\n", "    else:\n", "        print(f\"The file {filename} already exists, no need to download it.\")\n", "\n", "\n", "for filename, url in zip(filenames, urls):\n", "    download_archive(filename, url)\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Reading the data\n", "We now extract the data produced by the `ping` tool into a `pandas` table.\n", "\n", "The format is fairly simple, so this can be done using only Python's basic string functions.\n", "\n", "Each line has the following form:\n", "```\n", "[1421761682.052172] 665 bytes from lig-publig.imag.fr (129.88.11.7): icmp_seq=1 ttl=60 time=22.5 ms\n", "```\n", "We extract only the fields of interest:\n", "\n", " * the measurement date (in seconds since January 1, 1970), from the 2nd to the 18th character\n", " * the message size (in bytes), which is followed by the substring `\" bytes\"`\n", " * the response time (in milliseconds), which is preceded by `\"time=\"` and followed by `\" ms\"`" ] },
{ "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "44036 lines read successfully (0.85% failures)\n" ] }, { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>msgsize</th><th>time</th><th>timestamp</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>0</th><td>665</td><td>22.50</td><td>2015-01-20 13:48:02.052172</td></tr>\n",
 "    <tr><th>1</th><td>1373</td><td>21.20</td><td>2015-01-20 13:48:02.277315</td></tr>\n",
 "    <tr><th>2</th><td>262</td><td>21.20</td><td>2015-01-20 13:48:02.502054</td></tr>\n",
 "    <tr><th>3</th><td>1107</td><td>23.30</td><td>2015-01-20 13:48:02.729257</td></tr>\n",
 "    <tr><th>4</th><td>1128</td><td>1.41</td><td>2015-01-20 13:48:02.934648</td></tr>\n",
 "    <tr><th>5</th><td>489</td><td>21.90</td><td>2015-01-20 13:48:03.160397</td></tr>\n",
 "    <tr><th>6</th><td>1759</td><td>78.70</td><td>2015-01-20 13:48:03.443055</td></tr>\n",
 "    <tr><th>7</th><td>1146</td><td>25.10</td><td>2015-01-20 13:48:03.672157</td></tr>\n",
 "    <tr><th>8</th><td>884</td><td>24.00</td><td>2015-01-20 13:48:03.899933</td></tr>\n",
 "    <tr><th>9</th><td>1422</td><td>19.50</td><td>2015-01-20 13:48:04.122687</td></tr>\n",
 "    <tr><th>10</th><td>1180</td><td>18.00</td><td>2015-01-20 13:48:04.344135</td></tr>\n",
 "    <tr><th>11</th><td>999</td><td>18.80</td><td>2015-01-20 13:48:04.566271</td></tr>\n",
 "    <tr><th>12</th><td>1020</td><td>24.30</td><td>2015-01-20 13:48:04.998504</td></tr>\n",
 "    <tr><th>13</th><td>71</td><td>3.45</td><td>2015-01-20 13:48:05.205172</td></tr>\n",
 "    <tr><th>14</th><td>34</td><td>5.85</td><td>2015-01-20 13:48:05.414106</td></tr>\n",
 "    <tr><th>15</th><td>1843</td><td>2.31</td><td>2015-01-20 13:48:05.620117</td></tr>\n",
 "    <tr><th>16</th><td>407</td><td>1.14</td><td>2015-01-20 13:48:05.824949</td></tr>\n",
 "    <tr><th>17</th><td>356</td><td>1.10</td><td>2015-01-20 13:48:06.029177</td></tr>\n",
 "    <tr><th>18</th><td>1511</td><td>2.18</td><td>2015-01-20 13:48:06.234464</td></tr>\n",
 "    <tr><th>19</th><td>587</td><td>1.27</td><td>2015-01-20 13:48:06.438772</td></tr>\n",
 "    <tr><th>20</th><td>809</td><td>1.33</td><td>2015-01-20 13:48:06.643208</td></tr>\n",
 "    <tr><th>21</th><td>1364</td><td>1.51</td><td>2015-01-20 13:48:06.848323</td></tr>\n",
 "    <tr><th>22</th><td>1153</td><td>1.44</td><td>2015-01-20 13:48:07.053400</td></tr>\n",
 "    <tr><th>23</th><td>853</td><td>1.30</td><td>2015-01-20 13:48:07.257704</td></tr>\n",
 "    <tr><th>24</th><td>1510</td><td>2.17</td><td>2015-01-20 13:48:07.463275</td></tr>\n",
 "    <tr><th>25</th><td>123</td><td>1.21</td><td>2015-01-20 13:48:07.668423</td></tr>\n",
 "    <tr><th>26</th><td>1966</td><td>2.20</td><td>2015-01-20 13:48:07.874230</td></tr>\n",
 "    <tr><th>27</th><td>933</td><td>1.34</td><td>2015-01-20 13:48:08.078667</td></tr>\n",
 "    <tr><th>28</th><td>922</td><td>1.42</td><td>2015-01-20 13:48:08.283655</td></tr>\n",
 "    <tr><th>29</th><td>24</td><td>1.12</td><td>2015-01-20 13:48:08.488688</td></tr>\n",
 "    <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
 "    <tr><th>44006</th><td>1772</td><td>28.80</td><td>2015-01-20 16:26:20.743715</td></tr>\n",
 "    <tr><th>44007</th><td>41</td><td>1.14</td><td>2015-01-20 16:26:20.949053</td></tr>\n",
 "    <tr><th>44008</th><td>1944</td><td>2.32</td><td>2015-01-20 16:26:21.155685</td></tr>\n",
 "    <tr><th>44009</th><td>400</td><td>1.98</td><td>2015-01-20 16:26:21.362095</td></tr>\n",
 "    <tr><th>44010</th><td>226</td><td>3.01</td><td>2015-01-20 16:26:21.569409</td></tr>\n",
 "    <tr><th>44011</th><td>466</td><td>7.45</td><td>2015-01-20 16:26:21.780805</td></tr>\n",
 "    <tr><th>44012</th><td>350</td><td>13.50</td><td>2015-01-20 16:26:21.998869</td></tr>\n",
 "    <tr><th>44013</th><td>1829</td><td>45.90</td><td>2015-01-20 16:26:22.248969</td></tr>\n",
 "    <tr><th>44014</th><td>1954</td><td>58.50</td><td>2015-01-20 16:26:22.512386</td></tr>\n",
 "    <tr><th>44015</th><td>1074</td><td>1.45</td><td>2015-01-20 16:26:22.717961</td></tr>\n",
 "    <tr><th>44016</th><td>46</td><td>1.11</td><td>2015-01-20 16:26:22.923292</td></tr>\n",
 "    <tr><th>44017</th><td>1844</td><td>2.26</td><td>2015-01-20 16:26:23.129965</td></tr>\n",
 "    <tr><th>44018</th><td>645</td><td>1.24</td><td>2015-01-20 16:26:23.335449</td></tr>\n",
 "    <tr><th>44019</th><td>444</td><td>1.25</td><td>2015-01-20 16:26:23.540901</td></tr>\n",
 "    <tr><th>44020</th><td>1940</td><td>2.46</td><td>2015-01-20 16:26:23.747983</td></tr>\n",
 "    <tr><th>44021</th><td>1411</td><td>1.47</td><td>2015-01-20 16:26:23.954099</td></tr>\n",
 "    <tr><th>44022</th><td>49</td><td>1.21</td><td>2015-01-20 16:26:24.159879</td></tr>\n",
 "    <tr><th>44023</th><td>420</td><td>1.55</td><td>2015-01-20 16:26:24.365815</td></tr>\n",
 "    <tr><th>44024</th><td>227</td><td>1.22</td><td>2015-01-20 16:26:24.571516</td></tr>\n",
 "    <tr><th>44025</th><td>947</td><td>1.34</td><td>2015-01-20 16:26:24.777325</td></tr>\n",
 "    <tr><th>44026</th><td>1960</td><td>2.43</td><td>2015-01-20 16:26:24.983905</td></tr>\n",
 "    <tr><th>44027</th><td>531</td><td>1.19</td><td>2015-01-20 16:26:25.188976</td></tr>\n",
 "    <tr><th>44028</th><td>374</td><td>1.14</td><td>2015-01-20 16:26:25.394275</td></tr>\n",
 "    <tr><th>44029</th><td>1503</td><td>2.19</td><td>2015-01-20 16:26:25.600745</td></tr>\n",
 "    <tr><th>44030</th><td>572</td><td>1.29</td><td>2015-01-20 16:26:25.805877</td></tr>\n",
 "    <tr><th>44031</th><td>1338</td><td>1.47</td><td>2015-01-20 16:26:26.011910</td></tr>\n",
 "    <tr><th>44032</th><td>1515</td><td>7.02</td><td>2015-01-20 16:26:26.222729</td></tr>\n",
 "    <tr><th>44033</th><td>1875</td><td>2.33</td><td>2015-01-20 16:26:26.429007</td></tr>\n",
 "    <tr><th>44034</th><td>1006</td><td>1.61</td><td>2015-01-20 16:26:26.634747</td></tr>\n",
 "    <tr><th>44035</th><td>1273</td><td>1.35</td><td>2015-01-20 16:26:26.840222</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "<p>44036 rows × 3 columns</p>\n",
 "</div>
" ], "text/plain": [ " msgsize time timestamp\n", "0 665 22.50 2015-01-20 13:48:02.052172\n", "1 1373 21.20 2015-01-20 13:48:02.277315\n", "2 262 21.20 2015-01-20 13:48:02.502054\n", "3 1107 23.30 2015-01-20 13:48:02.729257\n", "4 1128 1.41 2015-01-20 13:48:02.934648\n", "5 489 21.90 2015-01-20 13:48:03.160397\n", "6 1759 78.70 2015-01-20 13:48:03.443055\n", "7 1146 25.10 2015-01-20 13:48:03.672157\n", "8 884 24.00 2015-01-20 13:48:03.899933\n", "9 1422 19.50 2015-01-20 13:48:04.122687\n", "10 1180 18.00 2015-01-20 13:48:04.344135\n", "11 999 18.80 2015-01-20 13:48:04.566271\n", "12 1020 24.30 2015-01-20 13:48:04.998504\n", "13 71 3.45 2015-01-20 13:48:05.205172\n", "14 34 5.85 2015-01-20 13:48:05.414106\n", "15 1843 2.31 2015-01-20 13:48:05.620117\n", "16 407 1.14 2015-01-20 13:48:05.824949\n", "17 356 1.10 2015-01-20 13:48:06.029177\n", "18 1511 2.18 2015-01-20 13:48:06.234464\n", "19 587 1.27 2015-01-20 13:48:06.438772\n", "20 809 1.33 2015-01-20 13:48:06.643208\n", "21 1364 1.51 2015-01-20 13:48:06.848323\n", "22 1153 1.44 2015-01-20 13:48:07.053400\n", "23 853 1.30 2015-01-20 13:48:07.257704\n", "24 1510 2.17 2015-01-20 13:48:07.463275\n", "25 123 1.21 2015-01-20 13:48:07.668423\n", "26 1966 2.20 2015-01-20 13:48:07.874230\n", "27 933 1.34 2015-01-20 13:48:08.078667\n", "28 922 1.42 2015-01-20 13:48:08.283655\n", "29 24 1.12 2015-01-20 13:48:08.488688\n", "... ... ... 
...\n", "44006 1772 28.80 2015-01-20 16:26:20.743715\n", "44007 41 1.14 2015-01-20 16:26:20.949053\n", "44008 1944 2.32 2015-01-20 16:26:21.155685\n", "44009 400 1.98 2015-01-20 16:26:21.362095\n", "44010 226 3.01 2015-01-20 16:26:21.569409\n", "44011 466 7.45 2015-01-20 16:26:21.780805\n", "44012 350 13.50 2015-01-20 16:26:21.998869\n", "44013 1829 45.90 2015-01-20 16:26:22.248969\n", "44014 1954 58.50 2015-01-20 16:26:22.512386\n", "44015 1074 1.45 2015-01-20 16:26:22.717961\n", "44016 46 1.11 2015-01-20 16:26:22.923292\n", "44017 1844 2.26 2015-01-20 16:26:23.129965\n", "44018 645 1.24 2015-01-20 16:26:23.335449\n", "44019 444 1.25 2015-01-20 16:26:23.540901\n", "44020 1940 2.46 2015-01-20 16:26:23.747983\n", "44021 1411 1.47 2015-01-20 16:26:23.954099\n", "44022 49 1.21 2015-01-20 16:26:24.159879\n", "44023 420 1.55 2015-01-20 16:26:24.365815\n", "44024 227 1.22 2015-01-20 16:26:24.571516\n", "44025 947 1.34 2015-01-20 16:26:24.777325\n", "44026 1960 2.43 2015-01-20 16:26:24.983905\n", "44027 531 1.19 2015-01-20 16:26:25.188976\n", "44028 374 1.14 2015-01-20 16:26:25.394275\n", "44029 1503 2.19 2015-01-20 16:26:25.600745\n", "44030 572 1.29 2015-01-20 16:26:25.805877\n", "44031 1338 1.47 2015-01-20 16:26:26.011910\n", "44032 1515 7.02 2015-01-20 16:26:26.222729\n", "44033 1875 2.33 2015-01-20 16:26:26.429007\n", "44034 1006 1.61 2015-01-20 16:26:26.634747\n", "44035 1273 1.35 2015-01-20 16:26:26.840222\n", "\n", "[44036 rows x 3 columns]" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ],
"source": [ "def extract_data(filename):\n", "    timestamps = []\n", "    msgsizes = []\n", "    times = []\n", "    err_count = 0\n", "    success_count = 0\n", "\n", "    with open(filename, 'r') as file:\n", "        for line in file:\n", "            try:\n", "                # Measurement date (timestamp):\n", "                # characters 1 through 17 inclusive (numbered from 0)\n", "                ts_str = line[1:18]\n", "                ts_float = float(ts_str)\n", "                # Convert to a pandas date\n", "                ts = pd.Timestamp(ts_float, unit='s')\n", "\n", "                # Message size\n", "                ms_str = line[20:line.index(\" bytes\")]\n", "                ms_int = int(ms_str)\n", "\n", "                # Duration of the exchange (time)\n", "                time_str = line[line.index('time=')+5:line.rindex(\" ms\")]\n", "                time_float = float(time_str)\n", "\n", "                # Once all values are found, append them to the lists\n", "                timestamps.append(ts)\n", "                msgsizes.append(ms_int)\n", "                times.append(time_float)\n", "                success_count += 1\n", "\n", "            except ValueError:\n", "                # When one of the values is missing, skip the line\n", "                err_count += 1\n", "\n", "    total_count = success_count + err_count\n", "    print(f\"{success_count} lines read successfully ({100*err_count/total_count:.2f}% failures)\")\n", "    return pd.DataFrame({\"timestamp\":timestamps, \"msgsize\":msgsizes, \"time\":times})\n", "\n", "\n", "extract_data(filenames[0])" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ],
"metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }