{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Subject 1: CO2 concentration in the atmosphere since 1958" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Charles David Keeling started a campaign to measure the concentration of CO2 in the atmosphere. He installed his instruments at the Mauna Loa Observatory, Hawaii, United States. Measurements have been recorded continuously since 1958.\n", "\n", "The initial study was meant to examine the seasonal variations of the concentration, but with global warming, its focus has shifted to the growth of the concentration.\n", "\n", "Using the weekly data available on the [Scripps Institution website](https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html), we want to reproduce the analysis of the evolution of the atmospheric CO2 concentration in order to build a predictive model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working environment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define a few functions to make it easier to display the version numbers of our system and of our modules." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hidePrompt": false }, "outputs": [], "source": [ "def print_imported_modules():\n", "    # List every loaded module that exposes a __version__ attribute\n", "    import sys\n", "    print(\"Imported modules\")\n", "    for name, val in sorted(sys.modules.items()):\n", "        if hasattr(val, '__version__'):\n", "            print(\"\\t\",val.__name__, val.__version__)\n", "\n", "def print_sys_info():\n", "    # Show the Python version and the platform description\n", "    import sys\n", "    import platform\n", "    print(\"System Info\")\n", "    print(\"\\t\",sys.version)\n", "    print(\"\\t\",platform.uname())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the usual Python 3 data processing modules, as of *6 April 2020*: numpy, pandas, seaborn, matplotlib, statsmodels, ...
" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import statsmodels.api as sm\n", "import seaborn as sns\n", "import isoweek" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is an overview of our execution environment, for anyone who would like to reproduce this work on their own machine." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "System Info\n", "\t 3.6.4 |Anaconda, Inc.| (default, Mar 13 2018, 01:15:57) \n", "[GCC 7.2.0]\n", "\t uname_result(system='Linux', node='c17990f19315', release='4.4.0-164-generic', version='#192-Ubuntu SMP Fri Sep 13 12:02:50 UTC 2019', machine='x86_64', processor='x86_64')\n", "Imported modules\n", "\t IPython 7.12.0\n", "\t IPython.core.release 7.12.0\n", "\t PIL 7.0.0\n", "\t PIL.Image 7.0.0\n", "\t PIL._version 7.0.0\n", "\t _csv 1.0\n", "\t _ctypes 1.1.0\n", "\t _curses b'2.2'\n", "\t decimal 1.70\n", "\t argparse 1.1\n", "\t backcall 0.1.0\n", "\t cffi 1.13.2\n", "\t csv 1.0\n", "\t ctypes 1.1.0\n", "\t cycler 0.10.0\n", "\t dateutil 2.8.1\n", "\t decimal 1.70\n", "\t decorator 4.4.1\n", "\t distutils 3.6.4\n", "\t ipaddress 1.0\n", "\t ipykernel 5.1.4\n", "\t ipykernel._version 5.1.4\n", "\t ipython_genutils 0.2.0\n", "\t ipython_genutils._version 0.2.0\n", "\t ipywidgets 7.2.1\n", "\t ipywidgets._version 7.2.1\n", "\t jedi 0.16.0\n", "\t json 2.0.9\n", "\t jupyter_client 6.0.0\n", "\t jupyter_client._version 6.0.0\n", "\t jupyter_core 4.6.3\n", "\t jupyter_core.version 4.6.3\n", "\t kiwisolver 1.1.0\n", "\t logging 0.5.1.2\n", "\t matplotlib 2.2.3\n", "\t matplotlib.backends.backend_agg 2.2.3\n", "\t numpy 1.15.2\n", "\t numpy.core 1.15.2\n", "\t numpy.core.multiarray 3.1\n", "\t numpy.lib 1.15.2\n", "\t numpy.linalg._umath_linalg b'0.1.5'\n", "\t 
numpy.matlib 1.15.2\n", "\t optparse 1.5.3\n", "\t pandas 0.22.0\n", "\t _libjson 1.33\n", "\t parso 0.6.0\n", "\t patsy 0.5.1\n", "\t patsy.version 0.5.1\n", "\t pexpect 4.8.0\n", "\t pickleshare 0.7.5\n", "\t platform 1.0.8\n", "\t prompt_toolkit 3.0.3\n", "\t ptyprocess 0.6.0\n", "\t pygments 2.5.2\n", "\t pyparsing 2.4.6\n", "\t pytz 2019.3\n", "\t re 2.2.1\n", "\t scipy 1.1.0\n", "\t scipy._lib.decorator 4.0.5\n", "\t scipy._lib.six 1.2.0\n", "\t scipy.fftpack._fftpack b'$Revision: $'\n", "\t scipy.fftpack.convolve b'$Revision: $'\n", "\t scipy.integrate._dop b'$Revision: $'\n", "\t scipy.integrate._ode $Id$\n", "\t scipy.integrate._odepack 1.9 \n", "\t scipy.integrate._quadpack 1.13 \n", "\t scipy.integrate.lsoda b'$Revision: $'\n", "\t scipy.integrate.vode b'$Revision: $'\n", "\t scipy.interpolate._fitpack 1.7 \n", "\t scipy.interpolate.dfitpack b'$Revision: $'\n", "\t scipy.linalg 0.4.9\n", "\t scipy.linalg._fblas b'$Revision: $'\n", "\t scipy.linalg._flapack b'$Revision: $'\n", "\t scipy.linalg._flinalg b'$Revision: $'\n", "\t scipy.ndimage 2.0\n", "\t scipy.optimize._cobyla b'$Revision: $'\n", "\t scipy.optimize._lbfgsb b'$Revision: $'\n", "\t scipy.optimize._minpack 1.10 \n", "\t scipy.optimize._nnls b'$Revision: $'\n", "\t scipy.optimize._slsqp b'$Revision: $'\n", "\t scipy.optimize.minpack2 b'$Revision: $'\n", "\t scipy.signal.spline 0.2\n", "\t scipy.sparse.linalg.eigen.arpack._arpack b'$Revision: $'\n", "\t scipy.sparse.linalg.isolve._iterative b'$Revision: $'\n", "\t scipy.special.specfun b'$Revision: $'\n", "\t scipy.stats.mvn b'$Revision: $'\n", "\t scipy.stats.statlib b'$Revision: $'\n", "\t seaborn 0.8.1\n", "\t seaborn.external.husl 2.1.0\n", "\t seaborn.external.six 1.10.0\n", "\t six 1.14.0\n", "\t statsmodels 0.9.0\n", "\t statsmodels.__init__ 0.9.0\n", "\t traitlets 4.3.3\n", "\t traitlets._version 4.3.3\n", "\t urllib.request 3.6\n", "\t zlib 1.0\n", "\t zmq 17.1.2\n", "\t zmq.sugar 17.1.2\n", "\t zmq.sugar.version 17.1.2\n" ] } ], 
"source": [ "print_sys_info()\n", "print_imported_modules()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading and inspecting the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We retrieved the weekly data on *6 April 2020* from the following link: [https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/weekly/weekly_in_situ_co2_mlo.csv](https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/weekly/weekly_in_situ_co2_mlo.csv)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "filename = \"./weekly_in_situ_co2_mlo.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We print the first lines of the file to spot any lines that should be skipped." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ligne 0 : \"-------------------------------------------------------------------------------------------\"\n", "Ligne 1 : \" Atmospheric CO2 concentrations (ppm) derived from in situ air measurements \"\n", "Ligne 2 : \" at Mauna Loa, Observatory, Hawaii: Latitude 19.5°N Longitude 155.6°W Elevation 3397m \"\n", "Ligne 3 : \" \"\n", "Ligne 4 : \" Source: R. F. Keeling, S. J. Walker, S. C. Piper and A. F. Bollenbacher \"\n" ] } ], "source": [ "def head(filename,n):\n", "    # Print the first n lines of the file, each prefixed with its index\n", "    with open(filename,\"r\") as f:\n", "        lignes = f.readlines()\n", "    n = min(n,len(lignes))\n", "    for i,ligne in enumerate(lignes[:n]):\n", "        print(\"Ligne\",i,\":\",ligne,end=\"\")\n", "head(filename,5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The file appears to be consistently formatted:\n", "* Comment/metadata lines start with \"\n", "* Data lines do not start with \"\n", "\n", "Let us therefore find the first data line." 
] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "44" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def find_num_first_dataline(filename):\n", "    # Return the index of the first line that does not start with a double quote\n", "    with open(filename,\"r\") as f:\n", "        lignes = f.readlines()\n", "    for i,ligne in enumerate(lignes):\n", "        if ligne[0] != '\"':\n", "            return i\n", "    raise Exception(\"No dataline found\")\n", "find_num_first_dataline(filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A visual inspection also confirmed that the data starts at line 44.\n", "During this inspection, we noted the following:\n", "1. The first column contains the acquisition dates\n", "2. The data are centered on 12:00 each day\n", "3. The second column contains the measured concentrations\n", "4. The concentration is the mean atmospheric CO2 concentration over the day" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | Date | Concentration |\n", "
|---|---|---|\n", "
| 0 | 1958-03-29 | 316.19 |\n", "
| 1 | 1958-04-05 | 317.31 |\n", "
| 2 | 1958-04-12 | 317.69 |\n", "
| 3 | 1958-04-19 | 317.58 |\n", "
| 4 | 1958-04-26 | 316.48 |\n", "
| 5 | 1958-05-03 | 316.95 |\n", "
| 6 | 1958-05-17 | 317.56 |\n", "
| 7 | 1958-05-24 | 317.99 |\n", "
| 8 | 1958-07-05 | 315.85 |\n", "
| 9 | 1958-07-12 | 315.85 |\n", "
| 10 | 1958-07-19 | 315.46 |\n", "
| 11 | 1958-07-26 | 315.59 |\n", "
| 12 | 1958-08-02 | 315.64 |\n", "
| 13 | 1958-08-09 | 315.10 |\n", "
| 14 | 1958-08-16 | 315.09 |\n", "
| 15 | 1958-08-30 | 314.14 |\n", "
| 16 | 1958-09-06 | 313.54 |\n", "
| 17 | 1958-11-08 | 313.05 |\n", "
| 18 | 1958-11-15 | 313.26 |\n", "
| 19 | 1958-11-22 | 313.57 |\n", "
| 20 | 1958-11-29 | 314.01 |\n", "
| 21 | 1958-12-06 | 314.56 |\n", "
| 22 | 1958-12-13 | 314.41 |\n", "
| 23 | 1958-12-20 | 314.77 |\n", "
| 24 | 1958-12-27 | 315.21 |\n", "
| 25 | 1959-01-03 | 315.24 |\n", "
| 26 | 1959-01-10 | 315.50 |\n", "
| 27 | 1959-01-17 | 315.69 |\n", "
| 28 | 1959-01-24 | 315.86 |\n", "
| 29 | 1959-01-31 | 315.42 |\n", "
| ... | ... | ... |\n", "
| 3126 | 2019-07-06 | 412.69 |\n", "
| 3127 | 2019-07-13 | 412.30 |\n", "
| 3128 | 2019-07-20 | 411.76 |\n", "
| 3129 | 2019-07-27 | 410.32 |\n", "
| 3130 | 2019-08-03 | 410.50 |\n", "
| 3131 | 2019-08-10 | 410.48 |\n", "
| 3132 | 2019-08-17 | 410.05 |\n", "
| 3133 | 2019-08-24 | 409.52 |\n", "
| 3134 | 2019-08-31 | 409.32 |\n", "
| 3135 | 2019-09-07 | 408.80 |\n", "
| 3136 | 2019-09-14 | 408.61 |\n", "
| 3137 | 2019-09-21 | 408.50 |\n", "
| 3138 | 2019-09-28 | 408.28 |\n", "
| 3139 | 2019-10-05 | 407.99 |\n", "
| 3140 | 2019-10-12 | 408.61 |\n", "
| 3141 | 2019-10-19 | 408.77 |\n", "
| 3142 | 2019-10-26 | 408.68 |\n", "
| 3143 | 2019-11-02 | 409.86 |\n", "
| 3144 | 2019-11-09 | 410.15 |\n", "
| 3145 | 2019-11-16 | 410.22 |\n", "
| 3146 | 2019-11-23 | 410.48 |\n", "
| 3147 | 2019-11-30 | 410.92 |\n", "
| 3148 | 2019-12-07 | 411.27 |\n", "
| 3149 | 2019-12-14 | 411.67 |\n", "
| 3150 | 2019-12-21 | 412.30 |\n", "
| 3151 | 2019-12-28 | 412.59 |\n", "
| 3152 | 2020-01-04 | 413.19 |\n", "
| 3153 | 2020-01-11 | 413.39 |\n", "
| 3154 | 2020-01-25 | 413.36 |\n", "
| 3155 | 2020-02-01 | 413.99 |\n", "
3156 rows × 2 columns
\n", "