{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Keyword analysis of my journal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Journal format\n", "Each entry is titled \"Date:DD/MM/YYYY\".\n", "\n", "The first line of each entry contains [Mots Clés:Xxxx,Yyyy,Zzzz], where Xxxx, Yyyy and Zzzz are keywords." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data extraction" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "NomFichentree = \"../../journal/Readme.md\"\n", "NomFichsortie = \"./Info_Journal.csv\"\n", "\n", "# Write one \"date;keyword\" line to the CSV file for each keyword of each journal entry.\n", "with open(NomFichsortie, \"w\", encoding='utf-8') as sortie:\n", "    with open(NomFichentree, \"r\", encoding='utf-8') as entree:\n", "        for ligne in entree:\n", "            if \"Date:\" in ligne:\n", "                # Entry title, e.g. a heading such as \"## Date:DD/MM/YYYY\": keep what follows the first colon.\n", "                date = ligne.split(':', 1)[1].strip()\n", "            if \"Mots Clés\" in ligne:\n", "                # Split on the first colon; str.strip('Mots Clés') would remove a set of characters, not a prefix.\n", "                mots = ligne.split(':', 1)[1]\n", "                listemots = mots.strip().split(',')\n", "                for mot in listemots:\n", "                    sortie.write(f\"{date};{mot}\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data analysis\n", "We can first look at what the data look like." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 1\n", "0 12/07/2020 Mooc\n", "1 12/07/2020 Installation\n", "2 12/07/2020 Python\n", "3 12/07/2020 Jupiter\n", "4 12/07/2020 R \n", "...\n", " 0 1\n", "33 22/08/2020 Réunion\n", "34 22/08/2020 Python\n", "35 28/08/2020 Lecture\n", "36 28/08/2020 Python\n", "37 28/08/2020 GUI\n" ] } ], "source": [ "NomFichentree = \"./Info_Journal.csv\"\n", "\n", "import pandas as pd\n", "\n", "# The file has no header row. Dates are kept as strings: no parse_dates is requested.\n", "donnees = pd.read_csv(NomFichentree, sep=';', header=None)\n", "\n", "print(donnees.head(), '\\n...\\n', donnees.tail())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we can look at:\n", "- the total number of values (equal to the number of rows),\n", "- the number of unique values (for dates and for keywords),\n", "- the most frequent date and the most frequent keyword, with their respective frequencies." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 1\n", "count 38 38\n", "unique 13 15\n", "top 12/07/2020 Mooc\n", "freq 5 7\n" ] } ], "source": [ "print(donnees.describe())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also look at the frequency of each date and of each keyword." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "12/07/2020 5\n", "17/07/2020 5\n", "28/08/2020 3\n", "13/08/2020 3\n", "23/07/2020 3\n", "08/08/2020 3\n", "17/08/2020 3\n", "10/08/2020 3\n", "21/07/2020 2\n", "20/07/2020 2\n", "25/07/2020 2\n", "02/08/2020 2\n", "22/08/2020 2\n", "Name: 0, dtype: int64\n", "Mooc 7\n", "Python 5\n", "Lecture 3\n", "Module 2 3\n", "Réunion 3\n", "GUI 3\n", "Exercices 3\n", "Module 1 2\n", "R 2\n", "Jupiter 2\n", "Git 1\n", "Pannes oscillatoires 1\n", "Génération ACS 1\n", "Configuration 1\n", "Installation 1\n", "Name: 1, dtype: int64\n" ] } ], "source": [ "# Column 0 holds the dates, column 1 the keywords.\n", "print(donnees[0].value_counts())\n", "print(donnees[1].value_counts())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }