{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Goal: catalogue a few websites that provide datasets on autonomous vehicles\n", "\n", "- [x] Extract a summary and key information into a CSV file (datasets.csv)\n", "- [x] Read and display the data from the CSV file for verification\n", "- [x] Extract keywords (tags) describing these datasets (tags column)\n", "- [x] Compute some basic statistics on these datasets\n", "- [x] Create a graphical representation of these datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Read and display the data from the CSV file" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<thead>\n", "<tr><th></th><th>name</th><th>description</th><th>website</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>KITTI Vision Benchmark Suite</td><td>We take advantage of our autonomous driving pl...</td><td>http://www.cvlibs.net/datasets/kitti/</td></tr>\n",
"<tr><th>1</th><td>Audi Autonomous Driving Dataset</td><td>We have published the Audi Autonomous Driving ...</td><td>https://www.a2d2.audi/a2d2/en.html</td></tr>\n",
"<tr><th>2</th><td>ApolloScape Dataset</td><td>Trajectory dataset, 3D Perception Lidar Object...</td><td>http://apolloscape.auto/</td></tr>\n",
"<tr><th>3</th><td>Velodyne SLAM</td><td>Here, you can find two challenging datasets re...</td><td>http://www.mrt.kit.edu/z/publ/download/velodyn...</td></tr>\n",
"<tr><th>4</th><td>Daimler Urban Segmentation Dataset</td><td>The Daimler Urban Segmentation Dataset consist...</td><td>http://www.6d-vision.com/scene-labeling</td></tr>\n",
"<tr><th>5</th><td>nuScenes dataset</td><td>The nuScenes dataset is a public large-scale d...</td><td>https://www.nuscenes.org/</td></tr>\n",
"</tbody>\n", "</table>
" ], "text/plain": [ " name \\\n", "0 KITTI Vision Benchmark Suite \n", "1 Audi Autonomous Driving Dataset \n", "2 ApolloScape Dataset \n", "3 Velodyne SLAM \n", "4 Daimler Urban Segmentation Dataset \n", "5 nuScenes dataset \n", "\n", " description \\\n", "0 We take advantage of our autonomous driving pl... \n", "1 We have published the Audi Autonomous Driving ... \n", "2 Trajectory dataset, 3D Perception Lidar Object... \n", "3 Here, you can find two challenging datasets re... \n", "4 The Daimler Urban Segmentation Dataset consist... \n", "5 The nuScenes dataset is a public large-scale d... \n", "\n", " website \n", "0 http://www.cvlibs.net/datasets/kitti/ \n", "1 https://www.a2d2.audi/a2d2/en.html \n", "2 http://apolloscape.auto/ \n", "3 http://www.mrt.kit.edu/z/publ/download/velodyn... \n", "4 http://www.6d-vision.com/scene-labeling \n", "5 https://www.nuscenes.org/ " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The datasets.csv file is located at module2/exo4/datasets.csv\n", "# Header/data layout: name;description;website;tags (fields are separated by semicolons)\n", "\n", "# https://pandas.pydata.org\n", "# pandas version 0.22.0 (December 29, 2017) on this Jupyter instance\n", "import pandas as pd\n", "\n", "# print(pd.__version__)\n", "# pd.show_versions()  # versions of all installed dependencies\n", "\n", "# Display the main columns (name, description, website)\n", "datasets = pd.read_csv('datasets.csv', delimiter=';', usecols=[0, 1, 2])\n", "datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Extract keywords (tags) describing these datasets" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<thead>\n", "<tr><th></th><th>tags</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>stereo,flow,odometry,tracking,detection,road,m...</td></tr>\n",
"<tr><th>1</th><td>semantic,cloud,segmentation,detection,road,map...</td></tr>\n",
"<tr><th>2</th><td>stereo,flow,semantic,cloud,segmentation,detect...</td></tr>\n",
"<tr><th>3</th><td>detection,images,city</td></tr>\n",
"<tr><th>4</th><td>stereo,labelling,detection,road,maps,city</td></tr>\n",
"<tr><th>5</th><td>labelling,detection,road,maps,city</td></tr>\n",
"</tbody>\n", "</table>
" ], "text/plain": [ " tags\n", "0 stereo,flow,odometry,tracking,detection,road,m...\n", "1 semantic,cloud,segmentation,detection,road,map...\n", "2 stereo,flow,semantic,cloud,segmentation,detect...\n", "3 detection,images,city\n", "4 stereo,labelling,detection,road,maps,city\n", "5 labelling,detection,road,maps,city" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "# Display only the tags column\n", "tags = pd.read_csv('datasets.csv', delimiter=';', usecols=[3])\n", "tags" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Compute some basic statistics on these datasets" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<thead>\n", "<tr><th></th><th>tags</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>count</th><td>6</td></tr>\n",
"<tr><th>unique</th><td>6</td></tr>\n",
"<tr><th>top</th><td>labelling,detection,road,maps,city</td></tr>\n",
"<tr><th>freq</th><td>1</td></tr>\n",
"</tbody>\n", "</table>
" ], "text/plain": [ " tags\n", "count 6\n", "unique 6\n", "top labelling,detection,road,maps,city\n", "freq 1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "# Summarize the tags column with .describe()\n", "tags = pd.read_csv('datasets.csv', delimiter=';', usecols=[3])\n", "tags.describe()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "city : 6 \n", "cloud : 2 \n", "detection : 6 \n", "flow : 2 \n", "images : 1 \n", "labelling : 2 \n", "maps : 5 \n", "odometry : 1 \n", "road : 5 \n", "segmentation : 2 \n", "semantic : 2 \n", "stereo : 3 \n", "tracking : 1 \n" ] } ], "source": [ "import pandas as pd\n", "# https://docs.python.org/3/library/re.html\n", "# https://www.w3schools.com/python/python_regex.asp\n", "import re\n", "\n", "all_tags = {}\n", "# Load the tags column\n", "tags = pd.read_csv('datasets.csv', delimiter=';', usecols=[3])\n", "\n", "for t in tags.values:\n", "    # each t is a numpy.ndarray holding one row\n", "    split = re.split(',', str(t[0]).lower())\n", "    # Count the occurrences of each tag\n", "    for s in split:\n", "        if all_tags.get(s):\n", "            all_tags[s] = all_tags[s] + 1\n", "        else:\n", "            all_tags[s] = 1\n", "\n", "# Sort alphabetically by tag name into a new dictionary\n", "sorted_all_tags = {key: value for key, value in sorted(all_tags.items(), key=lambda item: item[0])}\n", "\n", "# Formatted output\n", "for sat in sorted_all_tags:\n", "    print(\"{:<15} : {:<5}\".format(sat, sorted_all_tags[sat]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create a graphical representation of these datasets" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# https://matplotlib.org/api/pyplot_api.html\n", "# matplotlib version 2.2.3 on this Jupyter instance\n", "import matplotlib.pyplot as plt\n", "\n", "# Customize the chart\n", "# https://www.tutorialgateway.org/python-matplotlib-bar-chart/\n", "# https://python-graph-gallery.com/\n", "# https://www.tutorialspoint.com/matplotlib/matplotlib_bar_plot.htm\n", "plt.style.use('seaborn-whitegrid')\n", "plt.suptitle('All tags from the datasets CSV file', fontsize=14)\n", "plt.title('Feel free to add new tags...')\n", "plt.xlabel('Tag names')\n", "plt.ylabel('Tag numbers')\n", "\n", "# One bar per tag, in the alphabetical order computed in the previous cell\n", "plt.bar(range(len(sorted_all_tags)), list(sorted_all_tags.values()), width=0.8)\n", "plt.xticks(range(len(sorted_all_tags)), list(sorted_all_tags.keys()))\n", "plt.xticks(rotation=45, horizontalalignment='right')\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }