From 4aeb65d07eeea12b058ce0527c1f859eb61d8e5e Mon Sep 17 00:00:00 2001 From: 8645139112c21501d9914ade8e4a034f <8645139112c21501d9914ade8e4a034f@app-learninglab.inria.fr> Date: Fri, 10 Apr 2020 07:52:30 +0000 Subject: [PATCH] First --- module3/exo3/exercice.ipynb | 595 +++++++++++++++++++++++++++++++++++- 1 file changed, 592 insertions(+), 3 deletions(-) diff --git a/module3/exo3/exercice.ipynb b/module3/exo3/exercice.ipynb index 0bbbe37..5b2bd71 100644 --- a/module3/exo3/exercice.ipynb +++ b/module3/exo3/exercice.ipynb @@ -1,6 +1,596 @@ { - "cells": [], + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "hideCode": true, + "hidePrompt": true + }, + "source": [ + "# Practical assignment with peer evaluation\n", + "## Subject 1: CO2 concentration in the atmosphere since 1958\n", + "### Author: William Dethier (william.dethier@univ-grenoble-alpes.fr)\n", + "\n", + "## Instructions:\n", + "In 1958, Charles David Keeling began measuring the concentration of $CO_2$ in the atmosphere at the Mauna Loa observatory, Hawaii, United States, and the measurements continue to this day. The initial goal was to study the seasonal variation, but interest later shifted to the rising trend in the context of climate change. In honor of Keeling, this dataset is often called the \"Keeling Curve\" (see https://en.wikipedia.org/wiki/Keeling_Curve for the history and importance of these data).\n", + "\n", + "The data are available on the [Scripps Institution website](https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html). Use the file with the weekly observations. Note that this file is updated regularly with new observations. Record the download date carefully, and keep a local copy of the exact version you analyze. Also pay attention to missing data.\n", + "\n", + "Your mission, should you choose to accept it:\n", + "1. Produce a plot that shows a periodic oscillation superimposed on a slower systematic trend.\n", + "2. Separate these two phenomena. Characterize the periodic oscillation. Propose a simple model of the slow contribution, estimate its parameters, and attempt an extrapolation to 2025 (so that the model can be validated against future observations).\n", + "3. Submit your result on FUN\n", + "\n", + "## Downloading the data:\n", + "\n", + "We go to the **Scripps Institution** website at the given URL: https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html.\n", + "\n", + "On this site we select the weekly data collected from 1958 to the present. The resulting file is named *weekly_in_situ_co2_mlo.csv*. The data were downloaded on 10 April 2020 at 08:38. \n", + "\n", + "The description in the file states that it contains two columns giving the date and the $CO_2$ concentration in micromoles of $CO_2$ per mole (ppm, parts per million as a mole fraction; [see the *Wikipedia* page ](https://www.google.be/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&cad=rja&uact=8&ved=2ahUKEwjCn-T8qN3oAhXKwKQKHW0XAfMQFjACegQICxAF&url=https%3A%2F%2Ffr.wikipedia.org%2Fwiki%2FPartie_par_million&usg=AOvVaw17FszDa5Y_l-nQSsHYMHmC)for a detailed explanation).\n", + "\n", + "## Data preprocessing:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "hideCode": true, + "hidePrompt": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "import isoweek" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "hideCode": true, + "hidePrompt": true + }, + "source": [ + "After visual inspection, the first lines of the CSV file are a comment, which we skip by specifying **skiprows=44**.\n", + "\n", + "**Note: we modified the source file by simply adding the column names, so that no part of the comment appears in the display and the output is clearer. This does not change the data. We wrote one line between the end of the comment and the start of the data, as follows: Date, Concentration .\n", + "We therefore use a file named *weekly_in_situ_co2_mlomodified.csv* that includes this modification, but to preserve the original data, the source file *weekly_in_situ_co2_mlo.csv* is still kept in the GitLab repository.**\n", + "\n", + "Next, we display the raw data." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "hideCode": true, + "hidePrompt": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "(HTML rendering of the DataFrame: 3156 rows × 2 columns, Date and Concentration; same rows as the text/plain output)\n", + "</div>
" + ], + "text/plain": [ + " Date Concentration\n", + "0 1958-03-29 316.19\n", + "1 1958-04-05 317.31\n", + "2 1958-04-12 317.69\n", + "3 1958-04-19 317.58\n", + "4 1958-04-26 316.48\n", + "5 1958-05-03 316.95\n", + "6 1958-05-17 317.56\n", + "7 1958-05-24 317.99\n", + "8 1958-07-05 315.85\n", + "9 1958-07-12 315.85\n", + "10 1958-07-19 315.46\n", + "11 1958-07-26 315.59\n", + "12 1958-08-02 315.64\n", + "13 1958-08-09 315.10\n", + "14 1958-08-16 315.09\n", + "15 1958-08-30 314.14\n", + "16 1958-09-06 313.54\n", + "17 1958-11-08 313.05\n", + "18 1958-11-15 313.26\n", + "19 1958-11-22 313.57\n", + "20 1958-11-29 314.01\n", + "21 1958-12-06 314.56\n", + "22 1958-12-13 314.41\n", + "23 1958-12-20 314.77\n", + "24 1958-12-27 315.21\n", + "25 1959-01-03 315.24\n", + "26 1959-01-10 315.50\n", + "27 1959-01-17 315.69\n", + "28 1959-01-24 315.86\n", + "29 1959-01-31 315.42\n", + "... ... ...\n", + "3126 2019-07-06 412.69\n", + "3127 2019-07-13 412.30\n", + "3128 2019-07-20 411.76\n", + "3129 2019-07-27 410.32\n", + "3130 2019-08-03 410.50\n", + "3131 2019-08-10 410.48\n", + "3132 2019-08-17 410.05\n", + "3133 2019-08-24 409.52\n", + "3134 2019-08-31 409.32\n", + "3135 2019-09-07 408.80\n", + "3136 2019-09-14 408.61\n", + "3137 2019-09-21 408.50\n", + "3138 2019-09-28 408.28\n", + "3139 2019-10-05 407.99\n", + "3140 2019-10-12 408.61\n", + "3141 2019-10-19 408.77\n", + "3142 2019-10-26 408.68\n", + "3143 2019-11-02 409.86\n", + "3144 2019-11-09 410.15\n", + "3145 2019-11-16 410.22\n", + "3146 2019-11-23 410.48\n", + "3147 2019-11-30 410.92\n", + "3148 2019-12-07 411.27\n", + "3149 2019-12-14 411.67\n", + "3150 2019-12-21 412.30\n", + "3151 2019-12-28 412.59\n", + "3152 2020-01-04 413.19\n", + "3153 2020-01-11 413.39\n", + "3154 2020-01-25 413.36\n", + "3155 2020-02-01 413.99\n", + "\n", + "[3156 rows x 2 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"# Modified copy (header line added); the original download is weekly_in_situ_co2_mlo.csv\n", + "data_file = \"weekly_in_situ_co2_mlomodified.csv\"\n", + "\n", + "raw_data = pd.read_csv(data_file, skiprows=44)\n", + "raw_data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "hideCode": true, + "hidePrompt": true + }, + "source": [ + "Are there any missing points in this dataset?" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "hideCode": true, + "hidePrompt": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "(HTML rendering of an empty DataFrame with columns Date and Concentration)\n", + "</div>
" + ], + "text/plain": [ + "Empty DataFrame\n", + "Columns: [Date, Concentration]\n", + "Index: []" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "raw_data[raw_data.isnull().any(axis=1)]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "hideCode": true, + "hidePrompt": true + }, + "source": [ + "No row contains a missing value, so we continue the analysis. Note that this check only detects empty cells: some weeks are absent from the file altogether (for example, there is no row between 1958-05-03 and 1958-05-17)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "hideCode": true, + "hidePrompt": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "hideCode": true, + "hideOutput": true, + "hidePrompt": true + }, + "outputs": [], + "source": [] + } + ], "metadata": { + "hide_code_all_hidden": true, "kernelspec": { "display_name": "Python 3", "language": "python", @@ -16,10 +606,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.3" + "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 } - -- 2.18.1
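
The notebook's check, `raw_data.isnull().any(axis=1)`, only finds rows with an empty cell. It does not find weeks with no row at all, which the displayed output suggests exist. The sketch below is one possible way to look for such gaps. It is not part of the patch. It assumes the data rows are comma-separated `date, value` pairs, and the helpers `load_weekly` and `find_gaps` are hypothetical names. The sample rows are the first nine rows of the notebook output, and the two comment lines stand in for the real, longer comment block. Passing `names=` to `read_csv` also avoids editing the downloaded file to add a header.

```python
import io

import pandas as pd

# Hypothetical miniature of the weekly file: two comment lines (the real file
# has a much longer comment block), then header-less "date, value" rows.
SAMPLE = """\
"comment line 1"
"comment line 2"
1958-03-29, 316.19
1958-04-05, 317.31
1958-04-12, 317.69
1958-04-19, 317.58
1958-04-26, 316.48
1958-05-03, 316.95
1958-05-17, 317.56
1958-05-24, 317.99
1958-07-05, 315.85
"""


def load_weekly(source, n_comment_lines):
    """Read the data rows, naming the columns here instead of editing the file."""
    return pd.read_csv(
        source,
        skiprows=n_comment_lines,      # skip the leading comment block
        header=None,                   # the data rows carry no header line
        names=["Date", "Concentration"],
        parse_dates=["Date"],
        skipinitialspace=True,         # tolerate the space after each comma
    )


def find_gaps(data, expected=pd.Timedelta(days=7)):
    """Return one row per place where consecutive dates are more than a week apart."""
    step = data["Date"].diff()         # time elapsed since the previous row
    is_gap = step > expected           # NaT (first row) compares as False
    return pd.DataFrame({
        "previous": data["Date"].shift(1)[is_gap],
        "next": data["Date"][is_gap],
        "missing_weeks": step[is_gap] // expected - 1,
    })


data = load_weekly(io.StringIO(SAMPLE), n_comment_lines=2)
print("rows with an empty cell:", int(data.isnull().any(axis=1).sum()))
print(find_gaps(data))
```

On these nine rows the null check finds nothing, while `find_gaps` reports one missing week before 1958-05-17 and five before 1958-07-05. How to treat such gaps (ignore, interpolate, or drop the affected periods) is a modelling choice that this sketch does not make.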