{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Subject 4: Estimating the latency and capacity of a connection from asymmetric measurements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by importing the libraries we use:\n", "\n", "Note: `urllib.request` is not in this list because it is only needed when the data has to be downloaded." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import re\n", "import gzip\n", "import time\n", "import pandas\n", "import io\n", "import os\n", "import datetime" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Short connection within a campus" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by retrieving the data to study:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Les données sont déjà présentes en local.\n" ] } ], "source": [ "def telecharger_fichier_si_necessaire(data_url):\n", "\n", "    # Extract the file name from the URL (everything after the last \"/\")\n", "    data_file = data_url[(data_url.rindex(\"/\")+1):]\n", "\n", "    # Check the extension\n", "    if not data_file.endswith(\".log.gz\"):\n", "        raise Exception(\"Le nom de fichier \"+data_file+\" ne finit pas par \\\".log.gz\\\" !\")\n", "\n", "    # Download only if no readable local copy exists\n", "    if not os.access(data_file, os.R_OK):\n", "        import urllib.request\n", "        print(\"Les données n'existent pas en local, on les télécharge.\")\n", "        urllib.request.urlretrieve(data_url, data_file)\n", "        if os.access(data_file, os.R_OK):\n", "            print(\"Fichier récupéré.\")\n", "        else:\n", "            raise Exception(\"Le fichier n'a pas pu être récupéré !\")\n", "    else:\n", "        print(\"Les données sont déjà présentes en local.\")\n", "\n", "telecharger_fichier_si_necessaire(\"http://mescal.imag.fr/membres/arnaud.legrand/teaching/2014/RICM4_EP_ping/liglab2.log.gz\")" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "On définit la fonction qui va lire chaque ligne pour en extraire les données. La ligne retournée sera formatée en CSV.\n", "\n", "Comme ce qui nous intéresse est le temps mis pour latence (ou \"ping\") il faut impérativement que celle ci soit présente pour que la ligne soit reconnue, pour les lignes dans ce cas on retournera `Ǹone`.\n", "\n", "Si la ligne est totalement illisible on soulèvera une exception afin d'avertir l'utilisateur qu'il y a des lignes dont le format est illisible par le programme. Ceci est préférable au fait de retourner `Ǹone` car si c'était le cas on risquerait de masquer des données utiles, par exemple si `ping` avait retourné des données en secondes plutôt qu'en millisecondes." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1421761682.052172,665,22.5\n", "\n", "None\n", "\n", "Exception (attendue) : La ligne \"[1421761682.052172] 665 bytes from lig-publig.imag.fr (129.88.11.7): icmp_seq=1 ttl=60 time=22.5 s\" n'est pas dans le format attendu.\n" ] } ], "source": [ "extractDataFromLineRegExp = re.compile(\"^\\[([0-9\\.]+)\\] ([0-9]+) bytes[^:]*: icmp_seq=[0-9]+ ttl=[0-9]+( time=([0-9\\.]+) ms)?$\")\n", "def extractDataFromLine(line):\n", " match = extractDataFromLineRegExp.match(line)\n", " if match and match[4]:\n", " return match[1]+\",\"+match[2]+\",\"+match[4]+\"\\n\"\n", " elif match:\n", " return None\n", " else:\n", " raise Exception(\"La ligne \\\"\"+line+\"\\\" n'est pas dans le format attendu.\")\n", "\n", "# Quelques essais\n", "print(extractDataFromLine(\"[1421761682.052172] 665 bytes from lig-publig.imag.fr (129.88.11.7): icmp_seq=1 ttl=60 time=22.5 ms\")) # Le retour à la ligne est inclus dans ce qui est retourné\n", "print(extractDataFromLine(\"[1421773281.582445] 13 bytes from stackoverflow.com (198.252.206.140): icmp_seq=1 ttl=50\"))\n", "print()\n", "try:\n", " 
    print(extractDataFromLine(\"[1421761682.052172] 665 bytes from lig-publig.imag.fr (129.88.11.7): icmp_seq=1 ttl=60 time=22.5 s\"))\n", "    print(\"On devrait avoir une exception ici.\")\n", "except Exception as e:\n", "    print(\"Exception (attendue) : \"+e.args[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We read the data file and use the `extractDataFromLine` function defined above to extract the data into a variable, `csv_data`, which holds it in CSV format.\n", "\n", "I first tried to skip the intermediate variable and append the rows directly to the DataFrame, but that was extremely slow. Going through an intermediate file would also have been possible, and would have been necessary for a larger dataset." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lu 44413 lignes en 0.187 sec\n" ] } ], "source": [ "def convertit_fichier_en_csv(data_file):\n", "    nb = 0\n", "    start_time = time.time()\n", "    data = '\"date\",\"size\",\"time\"\\n' # The first CSV line holds the field names\n", "    with gzip.open(data_file, 'rb') as file:\n", "        for line in file:\n", "            line_data = extractDataFromLine(line.decode('utf-8').strip())\n", "            if line_data:\n", "                data += line_data\n", "            nb += 1 # Counts every line read, including those without a time\n", "\n", "    print(\"Lu %d lignes en %.3f sec\" % (nb, time.time() - start_time))\n", "    return data\n", "\n", "csv_data = convertit_fichier_en_csv(\"liglab2.log.gz\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We convert the CSV data into a pandas DataFrame. Since we read from the contents of a variable, we use `io.StringIO`, which lets us read a string as if it were a file." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| | date | size | time |\n",
"|---|---|---|---|\n",
"| 0 | 1.421762e+09 | 665 | 22.50 |\n",
"| 1 | 1.421762e+09 | 1373 | 21.20 |\n",
"| 2 | 1.421762e+09 | 262 | 21.20 |\n",
"| 3 | 1.421762e+09 | 1107 | 23.30 |\n",
"| 4 | 1.421762e+09 | 1128 | 1.41 |\n",
"| 5 | 1.421762e+09 | 489 | 21.90 |\n",
"| 6 | 1.421762e+09 | 1759 | 78.70 |\n",
"| 7 | 1.421762e+09 | 1146 | 25.10 |\n",
"| 8 | 1.421762e+09 | 884 | 24.00 |\n",
"| 9 | 1.421762e+09 | 1422 | 19.50 |\n",
"| 10 | 1.421762e+09 | 1180 | 18.00 |\n",
"| 11 | 1.421762e+09 | 999 | 18.80 |\n",
"| 12 | 1.421762e+09 | 1020 | 24.30 |\n",
"| 13 | 1.421762e+09 | 71 | 3.45 |\n",
"| 14 | 1.421762e+09 | 34 | 5.85 |\n",
"| 15 | 1.421762e+09 | 1843 | 2.31 |\n",
"| 16 | 1.421762e+09 | 407 | 1.14 |\n",
"| 17 | 1.421762e+09 | 356 | 1.10 |\n",
"| 18 | 1.421762e+09 | 1511 | 2.18 |\n",
"| 19 | 1.421762e+09 | 587 | 1.27 |\n",
"| 20 | 1.421762e+09 | 809 | 1.33 |\n",
"| 21 | 1.421762e+09 | 1364 | 1.51 |\n",
"| 22 | 1.421762e+09 | 1153 | 1.44 |\n",
"| 23 | 1.421762e+09 | 853 | 1.30 |\n",
"| 24 | 1.421762e+09 | 1510 | 2.17 |\n",
"| 25 | 1.421762e+09 | 123 | 1.21 |\n",
"| 26 | 1.421762e+09 | 1966 | 2.20 |\n",
"| 27 | 1.421762e+09 | 933 | 1.34 |\n",
"| 28 | 1.421762e+09 | 922 | 1.42 |\n",
"| 29 | 1.421762e+09 | 24 | 1.12 |\n",
"| ... | ... | ... | ... |\n",
"| 44006 | 1.421771e+09 | 1772 | 28.80 |\n",
"| 44007 | 1.421771e+09 | 41 | 1.14 |\n",
"| 44008 | 1.421771e+09 | 1944 | 2.32 |\n",
"| 44009 | 1.421771e+09 | 400 | 1.98 |\n",
"| 44010 | 1.421771e+09 | 226 | 3.01 |\n",
"| 44011 | 1.421771e+09 | 466 | 7.45 |\n",
"| 44012 | 1.421771e+09 | 350 | 13.50 |\n",
"| 44013 | 1.421771e+09 | 1829 | 45.90 |\n",
"| 44014 | 1.421771e+09 | 1954 | 58.50 |\n",
"| 44015 | 1.421771e+09 | 1074 | 1.45 |\n",
"| 44016 | 1.421771e+09 | 46 | 1.11 |\n",
"| 44017 | 1.421771e+09 | 1844 | 2.26 |\n",
"| 44018 | 1.421771e+09 | 645 | 1.24 |\n",
"| 44019 | 1.421771e+09 | 444 | 1.25 |\n",
"| 44020 | 1.421771e+09 | 1940 | 2.46 |\n",
"| 44021 | 1.421771e+09 | 1411 | 1.47 |\n",
"| 44022 | 1.421771e+09 | 49 | 1.21 |\n",
"| 44023 | 1.421771e+09 | 420 | 1.55 |\n",
"| 44024 | 1.421771e+09 | 227 | 1.22 |\n",
"| 44025 | 1.421771e+09 | 947 | 1.34 |\n",
"| 44026 | 1.421771e+09 | 1960 | 2.43 |\n",
"| 44027 | 1.421771e+09 | 531 | 1.19 |\n",
"| 44028 | 1.421771e+09 | 374 | 1.14 |\n",
"| 44029 | 1.421771e+09 | 1503 | 2.19 |\n",
"| 44030 | 1.421771e+09 | 572 | 1.29 |\n",
"| 44031 | 1.421771e+09 | 1338 | 1.47 |\n",
"| 44032 | 1.421771e+09 | 1515 | 7.02 |\n",
"| 44033 | 1.421771e+09 | 1875 | 2.33 |\n",
"| 44034 | 1.421771e+09 | 1006 | 1.61 |\n",
"| 44035 | 1.421771e+09 | 1273 | 1.35 |\n",
"\n",
"44036 rows × 3 columns\n", "