{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Goal: list a few websites that provide datasets on autonomous vehicles\n",
    "\n",
    "- [x] Extract a summary and key information into a CSV file (datasets.csv)\n",
    "- [x] Read and display the CSV data to check it\n",
    "- [x] Extract keywords (tags) describing these datasets (tags column)\n",
    "- [ ] Compute some basic statistics on these datasets\n",
    "- [ ] Create a graphical representation of these datasets\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Read and display the CSV data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>description</th>\n",
       "      <th>website</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KITTI Vision Benchmark Suite</td>\n",
       "      <td>We take advantage of our autonomous driving pl...</td>\n",
       "      <td>http://www.cvlibs.net/datasets/kitti/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Audi Autonomous Driving Dataset</td>\n",
       "      <td>We have published the Audi Autonomous Driving ...</td>\n",
       "      <td>https://www.a2d2.audi/a2d2/en.html</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ApolloScape Dataset</td>\n",
       "      <td>Trajectory dataset, 3D Perception Lidar Object...</td>\n",
       "      <td>http://apolloscape.auto/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Velodyne SLAM</td>\n",
       "      <td>Here, you can find two challenging datasets re...</td>\n",
       "      <td>http://www.mrt.kit.edu/z/publ/download/velodyn...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Daimler Urban Segmentation Dataset</td>\n",
       "      <td>The Daimler Urban Segmentation Dataset consist...</td>\n",
       "      <td>http://www.6d-vision.com/scene-labeling</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>nuScenes dataset</td>\n",
       "      <td>The nuScenes dataset is a public large-scale d...</td>\n",
       "      <td>https://www.nuscenes.org/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 name  \\\n",
       "0        KITTI Vision Benchmark Suite   \n",
       "1     Audi Autonomous Driving Dataset   \n",
       "2                 ApolloScape Dataset   \n",
       "3                       Velodyne SLAM   \n",
       "4  Daimler Urban Segmentation Dataset   \n",
       "5                    nuScenes dataset   \n",
       "\n",
       "                                         description  \\\n",
       "0  We take advantage of our autonomous driving pl...   \n",
       "1  We have published the Audi Autonomous Driving ...   \n",
       "2  Trajectory dataset, 3D Perception Lidar Object...   \n",
       "3  Here, you can find two challenging datasets re...   \n",
       "4  The Daimler Urban Segmentation Dataset consist...   \n",
       "5  The nuScenes dataset is a public large-scale d...   \n",
       "\n",
       "                                             website  \n",
       "0              http://www.cvlibs.net/datasets/kitti/  \n",
       "1                 https://www.a2d2.audi/a2d2/en.html  \n",
       "2                           http://apolloscape.auto/  \n",
       "3  http://www.mrt.kit.edu/z/publ/download/velodyn...  \n",
       "4            http://www.6d-vision.com/scene-labeling  \n",
       "5                          https://www.nuscenes.org/  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The datasets.csv file is located at module2/exo4/datasets.csv\n",
    "# Header/data layout: name;description;website;tags (fields are separated by semicolons)\n",
    "\n",
    "# https://pandas.pydata.org\n",
    "# This Jupyter instance runs pandas 0.22.0 (December 29, 2017)\n",
    "import pandas as pd\n",
    "\n",
    "# print(pd.__version__)\n",
    "# pd.show_versions()  # Versions of pandas and its dependencies\n",
    "\n",
    "# Display the main columns (name, description, website)\n",
    "datasets = pd.read_csv('datasets.csv', delimiter=';', usecols=[0, 1, 2])\n",
    "datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Extract keywords (tags) describing these datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tags</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>stereo,flow,odometry,tracking,detection,road,m...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>semantic,cloud,segmentation,detection,road,map...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>stereo,flow,semantic,cloud,segmentation,detect...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>detection,images,city</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>stereo,labelling,detection,road,maps,city</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>labelling,detection,road,maps,city</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                tags\n",
       "0  stereo,flow,odometry,tracking,detection,road,m...\n",
       "1  semantic,cloud,segmentation,detection,road,map...\n",
       "2  stereo,flow,semantic,cloud,segmentation,detect...\n",
       "3                              detection,images,city\n",
       "4          stereo,labelling,detection,road,maps,city\n",
       "5                 labelling,detection,road,maps,city"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Check the keywords: load only the tags column (index 3)\n",
    "tags = pd.read_csv('datasets.csv', delimiter=';', usecols=[3])\n",
    "tags"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Compute some basic statistics on these datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tags</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>stereo,labelling,detection,road,maps,city</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             tags\n",
       "count                                           6\n",
       "unique                                          6\n",
       "top     stereo,labelling,detection,road,maps,city\n",
       "freq                                            1"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Summary statistics for the tags column (count, unique, top, freq)\n",
    "tags = pd.read_csv('datasets.csv', delimiter=';', usecols=[3])\n",
    "tags.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['stereo,flow,odometry,tracking,detection,road,maps,city']\n",
      "['semantic,cloud,segmentation,detection,road,maps,city']\n",
      "['stereo,flow,semantic,cloud,segmentation,detection,road,maps,city']\n",
      "['detection,images,city']\n",
      "['stereo,labelling,detection,road,maps,city']\n",
      "['labelling,detection,road,maps,city']\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Extract the keywords\n",
    "tags = pd.read_csv('datasets.csv', delimiter=';', usecols=[3])\n",
    "\n",
    "# Each element of tags.values is a one-element array holding the comma-separated tag string\n",
    "for t in tags.values:\n",
    "    print(t)\n",
    "\n",
    "# Draft (not run yet): split each tag string into a list of keywords\n",
    "#     keywords = t[0].split(',')\n",
    "#     print(keywords)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}