{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 3: Atmospheric CO$_{\\textbf{2}}$ concentration since 1958" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import os\n", "from urllib.request import urlretrieve\n", "import datetime\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing and formatting the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data are available on the [Scripps Institution website](https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html). We retrieve them as a CSV file in which each line corresponds to one week of the requested period (from 1958-03-29 up to today, 2024-12-09). The file is downloaded into the local folder from [this URL](https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/weekly/weekly_in_situ_co2_mlo.csv) with the `urllib.request` library. If it has already been downloaded, we load the local copy instead." 
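, "\n", "\n", "A minimal sketch, assuming the file layout described further down (a fixed number of comment lines followed by two unnamed columns, date and concentration), that wraps the download-once logic and the parsing into one reusable function. The name `load_weekly_co2` is hypothetical and not part of any library; the usage part runs on a small local sample through a `file://` URL, so no network access is needed.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd


def load_weekly_co2(url, path, skiprows=44):
    # Hypothetical helper: download the file only if no local copy exists
    downloaded = not os.path.exists(path)
    if downloaded:
        urlretrieve(url, path)
    # Skip the comment lines; the file has no header row, so we name the columns
    raw = pd.read_csv(path, skiprows=skiprows, header=None,
                      names=['Date', 'CO2 (ppm)'], parse_dates=['Date'])
    # Return the concentrations as a Series indexed by date
    return raw.set_index('Date')['CO2 (ppm)'], downloaded


# Small sample mimicking the layout (2 comment lines instead of 44)
sample = '''# comment line 1
# comment line 2
1958-03-29,316.19
1958-04-05,317.31
1958-04-12,317.69
'''
workdir = Path(tempfile.mkdtemp())
source = workdir / 'sample.csv'
source.write_text(sample)
local_copy = str(workdir / 'local_copy.csv')

series, first = load_weekly_co2(source.as_uri(), local_copy, skiprows=2)
series, second = load_weekly_co2(source.as_uri(), local_copy, skiprows=2)
print(first, second)  # the second call reuses the local copy
print(series)
```

The second call finds the local copy and skips the download, which mirrors the behavior of the download cell that follows." 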
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File data_keeling.csv found at /home/jovyan/work/module3/exo3/data_keeling.csv\n" ] } ], "source": [ "data_url = \"https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/weekly/weekly_in_situ_co2_mlo.csv\"\n", "data_file = \"data_keeling.csv\"\n", "\n", "if not os.path.exists(data_file):\n", "    urlretrieve(data_url, data_file)\n", "    print(f\"File downloaded and saved as {data_file}\")\n", "else:\n", "    print(f\"File {data_file} found at {os.path.abspath(data_file)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The file starts with 44 comment lines describing its content and the measurement methods; we skip them with `skiprows=44`. The data then come in two columns:\n", "- Date (first day of the one-week period)\n", "- CO$_2$ concentration (ppm)\n", "\n", "No line gives the column names, so we pass `header=None` and name the columns afterwards." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Date Concentration en CO2 (ppm)\n", "0 1958-03-29 316.19\n", "1 1958-04-05 317.31\n", "2 1958-04-12 317.69\n", "3 1958-04-19 317.58\n", "4 1958-04-26 316.48\n", "5 1958-05-03 316.95\n", "6 1958-05-17 317.56\n", "7 1958-05-24 317.99\n", "8 1958-07-05 315.85\n", "9 1958-07-12 315.85\n", "10 1958-07-19 315.46\n", "11 1958-07-26 315.59\n", "12 1958-08-02 315.64\n", "13 1958-08-09 315.10\n", "14 1958-08-16 315.09\n", "15 1958-08-30 314.14\n", "16 1958-09-06 313.54\n", "17 1958-11-08 313.05\n", "18 1958-11-15 313.26\n", "19 1958-11-22 313.57\n", "20 1958-11-29 314.01\n", "21 1958-12-06 314.56\n", "22 1958-12-13 314.41\n", "23 1958-12-20 314.77\n", "24 1958-12-27 315.21\n", "25 1959-01-03 315.24\n", "26 1959-01-10 315.50\n", "27 1959-01-17 315.69\n", "28 1959-01-24 315.86\n", "29 1959-01-31 315.42\n", "... ... ...\n", "3373 2024-04-20 426.91\n", "3374 2024-04-27 427.13\n", "3375 2024-05-04 426.51\n", "3376 2024-05-11 427.20\n", "3377 2024-05-18 426.26\n", "3378 2024-05-25 426.68\n", "3379 2024-06-01 426.78\n", "3380 2024-06-08 427.01\n", "3381 2024-06-15 427.10\n", "3382 2024-06-22 426.54\n", "3383 2024-06-29 425.41\n", "3384 2024-07-06 425.73\n", "3385 2024-07-13 426.10\n", "3386 2024-07-20 424.36\n", "3387 2024-07-27 424.72\n", "3388 2024-08-03 424.42\n", "3389 2024-08-10 422.50\n", "3390 2024-08-17 422.80\n", "3391 2024-08-24 421.45\n", "3392 2024-08-31 421.57\n", "3393 2024-09-07 421.81\n", "3394 2024-09-14 421.39\n", "3395 2024-09-21 421.77\n", "3396 2024-09-28 421.51\n", "3397 2024-10-05 421.86\n", "3398 2024-10-12 422.13\n", "3399 2024-10-19 422.16\n", "3400 2024-10-26 422.36\n", "3401 2024-11-02 423.15\n", "3402 2024-11-09 423.18\n", "\n", "[3403 rows x 2 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data = pd.read_csv(data_file, skiprows=44, header=None)\n", "raw_data.columns = [\"Date\", \"Concentration en CO2 (ppm)\"]\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "Let us check whether the dataset contains rows with missing values." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Date, Concentration en CO2 (ppm)]\n", "Index: []" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[raw_data.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No row has missing values. Let us now convert the \"Date\" column to the pandas datetime type and use it as the index of the series." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date\n", "1958-03-29 316.19\n", "1958-04-05 317.31\n", "1958-04-12 317.69\n", "1958-04-19 317.58\n", "1958-04-26 316.48\n", "1958-05-03 316.95\n", "1958-05-17 317.56\n", "1958-05-24 317.99\n", "1958-07-05 315.85\n", "1958-07-12 315.85\n", "1958-07-19 315.46\n", "1958-07-26 315.59\n", "1958-08-02 315.64\n", "1958-08-09 315.10\n", "1958-08-16 315.09\n", "1958-08-30 314.14\n", "1958-09-06 313.54\n", "1958-11-08 313.05\n", "1958-11-15 313.26\n", "1958-11-22 313.57\n", "1958-11-29 314.01\n", "1958-12-06 314.56\n", "1958-12-13 314.41\n", "1958-12-20 314.77\n", "1958-12-27 315.21\n", "1959-01-03 315.24\n", "1959-01-10 315.50\n", "1959-01-17 315.69\n", "1959-01-24 315.86\n", "1959-01-31 315.42\n", " ... 
\n", "2024-04-20 426.91\n", "2024-04-27 427.13\n", "2024-05-04 426.51\n", "2024-05-11 427.20\n", "2024-05-18 426.26\n", "2024-05-25 426.68\n", "2024-06-01 426.78\n", "2024-06-08 427.01\n", "2024-06-15 427.10\n", "2024-06-22 426.54\n", "2024-06-29 425.41\n", "2024-07-06 425.73\n", "2024-07-13 426.10\n", "2024-07-20 424.36\n", "2024-07-27 424.72\n", "2024-08-03 424.42\n", "2024-08-10 422.50\n", "2024-08-17 422.80\n", "2024-08-24 421.45\n", "2024-08-31 421.57\n", "2024-09-07 421.81\n", "2024-09-14 421.39\n", "2024-09-21 421.77\n", "2024-09-28 421.51\n", "2024-10-05 421.86\n", "2024-10-12 422.13\n", "2024-10-19 422.16\n", "2024-10-26 422.36\n", "2024-11-02 423.15\n", "2024-11-09 423.18\n", "Length: 3403, dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data[\"Date\"] = pd.to_datetime(raw_data[\"Date\"])\n", "data = pd.Series(data=raw_data[\"Concentration en CO2 (ppm)\"].tolist(), index=raw_data[\"Date\"])\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us check whether some data are missing, _i.e._ whether two consecutive dates are more than one week apart." 
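, "\n", "\n", "The loop below compares consecutive dates one pair at a time. A vectorized sketch of the same check with `Series.diff()`, run here on a few dates copied from the series above, also counts the weeks missing in each gap:

```python
import pandas as pd

# A few dates copied from the series above: weekly sampling with two gaps
dates = pd.DatetimeIndex(['1958-04-26', '1958-05-03', '1958-05-17',
                          '1958-05-24', '1958-07-05', '1958-07-12'])

# Time elapsed since the previous date (NaT for the first one)
step = dates.to_series().diff()

# A gap is any step longer than one week; NaT compares as False
is_gap = step > pd.Timedelta(weeks=1)

for end, gap in step[is_gap].items():
    # Same pair of dates as the loop prints, plus the number of missing weeks
    print(end - gap, end, gap // pd.Timedelta(weeks=1) - 1)

missing_weeks = int((step[is_gap] // pd.Timedelta(weeks=1) - 1).sum())
print('total missing weeks:', missing_weeks)
```

Applied to `data.index`, the missing weeks should add up to 74, the difference between the 3477 dates of the complete weekly grid built further down and the 3403 measurements." 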
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1958-05-03 00:00:00 1958-05-17 00:00:00\n", "1958-05-24 00:00:00 1958-07-05 00:00:00\n", "1958-08-16 00:00:00 1958-08-30 00:00:00\n", "1958-09-06 00:00:00 1958-11-08 00:00:00\n", "1959-01-31 00:00:00 1959-02-14 00:00:00\n", "1959-03-07 00:00:00 1959-03-21 00:00:00\n", "1959-05-23 00:00:00 1959-06-06 00:00:00\n", "1959-08-08 00:00:00 1959-08-22 00:00:00\n", "1962-08-18 00:00:00 1962-09-15 00:00:00\n", "1962-12-22 00:00:00 1963-01-05 00:00:00\n", "1963-02-09 00:00:00 1963-02-23 00:00:00\n", "1963-04-27 00:00:00 1963-05-11 00:00:00\n", "1963-11-16 00:00:00 1963-11-30 00:00:00\n", "1964-01-18 00:00:00 1964-05-30 00:00:00\n", "1964-06-06 00:00:00 1964-06-27 00:00:00\n", "1964-08-01 00:00:00 1964-08-15 00:00:00\n", "1966-07-09 00:00:00 1966-08-06 00:00:00\n", "1966-10-29 00:00:00 1966-11-12 00:00:00\n", "1967-01-14 00:00:00 1967-02-04 00:00:00\n", "1976-06-19 00:00:00 1976-07-03 00:00:00\n", "1984-03-24 00:00:00 1984-04-28 00:00:00\n", "1985-07-27 00:00:00 1985-08-10 00:00:00\n", "2003-06-07 00:00:00 2003-06-21 00:00:00\n", "2003-10-04 00:00:00 2003-10-25 00:00:00\n", "2005-02-19 00:00:00 2005-03-26 00:00:00\n", "2006-02-04 00:00:00 2006-02-25 00:00:00\n", "2007-01-20 00:00:00 2007-02-03 00:00:00\n", "2012-09-29 00:00:00 2012-10-20 00:00:00\n", "2020-01-11 00:00:00 2020-01-25 00:00:00\n", "2022-11-26 00:00:00 2022-12-17 00:00:00\n" ] } ], "source": [ "dates = data.index\n", "for d1, d2 in zip(dates[:-1], dates[1:]):\n", "    delta = d2 - d1\n", "    if delta - pd.Timedelta(1,'W') > pd.Timedelta('1s'):\n", "        print(d1, d2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some data are missing, but the 30 gaps listed above amount to only 74 missing weeks, about 2% of the 3477 weeks between the first and the last measurement, and they become sparse after the 1960s. The error introduced by the missing data can therefore be considered negligible.\n", "\n", "Nevertheless, we can fill these gaps with rolling means to make the plots easier to read, while keeping track of the interpolated dates so that they can be excluded from the quantitative analysis. The `interpolated_marks` series records them (1 for an interpolated week, 0 otherwise):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1958-03-29 0\n", "1958-04-05 0\n", "1958-04-12 0\n", "1958-04-19 0\n", "1958-04-26 0\n", "1958-05-03 0\n", "1958-05-10 1\n", "1958-05-17 0\n", "1958-05-24 0\n", "1958-05-31 1\n", "1958-06-07 1\n", "1958-06-14 1\n", "1958-06-21 1\n", "1958-06-28 1\n", "1958-07-05 0\n", "1958-07-12 0\n", "1958-07-19 0\n", "1958-07-26 0\n", "1958-08-02 0\n", "1958-08-09 0\n", "1958-08-16 0\n", "1958-08-23 1\n", "1958-08-30 0\n", "1958-09-06 0\n", "1958-09-13 1\n", "1958-09-20 1\n", "1958-09-27 1\n", "1958-10-04 1\n", "1958-10-11 1\n", "1958-10-18 1\n", " ..\n", "2024-04-20 0\n", "2024-04-27 0\n", "2024-05-04 0\n", "2024-05-11 0\n", "2024-05-18 0\n", "2024-05-25 0\n", "2024-06-01 0\n", "2024-06-08 0\n", "2024-06-15 0\n", "2024-06-22 0\n", "2024-06-29 0\n", "2024-07-06 0\n", "2024-07-13 0\n", "2024-07-20 0\n", "2024-07-27 0\n", "2024-08-03 0\n", "2024-08-10 0\n", "2024-08-17 0\n", "2024-08-24 0\n", "2024-08-31 0\n", "2024-09-07 0\n", "2024-09-14 0\n", "2024-09-21 0\n", "2024-09-28 0\n", "2024-10-05 0\n", "2024-10-12 0\n", "2024-10-19 0\n", "2024-10-26 0\n", "2024-11-02 0\n", "2024-11-09 0\n", "Freq: 7D, Length: 3477, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "full_index = pd.date_range(start=data.index[0], end=data.index[-1], freq='7D')\n", "full_data = data.reindex(full_index)\n", "# Fill the gaps iteratively: each pass fills the missing weeks that have at least 3 known values in the 5-week window ending at them\n", "while full_data.isna().any():\n", "    rolling_mean = full_data.rolling(window=5, min_periods=3).mean()\n", "    full_data[full_data.isna()] = rolling_mean[full_data.isna()]\n", "# 1 for a week absent from the original data (interpolated), 0 otherwise\n", "interpolated_marks = pd.Series(data=np.where(data.reindex(full_index).isna(), 1, 0), index=full_index)\n", "interpolated_marks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpreting the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us start by plotting the evolution of the CO$_2$ concentration since 1958." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "full_data.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe a fast oscillation around a slower trend. We can therefore model the time evolution as:\n", "\n", "$$C(t)=f(t)+A\\cos\\Big(\\frac{2\\pi}{T}(t-t_0)\\Big)\\ ,$$\n", "\n", "where $C(t)$ is the CO$_2$ concentration, $t$ is time, $f(t)$ is a slowly increasing monotonic function, $A$ is the amplitude of the oscillations around $f$, $T$ is their period and $t_0$ is their time origin." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Characterizing the oscillations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us look at the last few years to better characterize the fast oscillations." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data[-200:].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The oscillations appear to have a period of one year (seasonal variations). The concentration peaks in late spring to early summer, around May–June. To refine the analysis, we can assume that over a single year $f$ is close to a straight line (its tangent at the middle of that year).\n", "\n", "Let us focus on the last year, from autumn 2023 to autumn 2024. We isolate the data between early September 2023 and late September 2024 to make sure the whole annual cycle is included." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "period1 = pd.Period(pd.Timestamp(2023, 9, 1), 'W')\n", "period2 = pd.Period(pd.Timestamp(2024, 9, 30), 'W')\n", "# full_data has exactly one sample per week, so this keeps the sample of the first week through the sample of the last week\n", "data_last_year = full_data[period1.start_time:period2.end_time]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us check that the whole intended year is covered:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data_last_year.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Good. Now let us find the two minima (at the beginning and at the end of the oscillation, respectively) in order to estimate the slope of $f$. Since $f$ is increasing, the first minimum is the overall minimum of the selected year. Since the values of early 2024 are higher than the second minimum, we can find the latter by restricting the search to `year == 2024`. We then check on the plot that the minima found are the expected ones.\n", "\n", "We can then subtract this slow contribution to obtain the bare oscillation and measure its amplitude.\n", "\n", "__Note__: we are actually doing signal processing by hand; a high-pass filter with a carefully chosen cutoff frequency would give a comparable result. Since this document is meant to be accessible to everyone, we prefer to avoid overly technical methods." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023-09-23 417.77\n", "Freq: 7D, dtype: float64\n", "2024-09-14 421.39\n", "Freq: 7D, dtype: float64\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "min_2023 = data_last_year.where(data_last_year == min(data_last_year)).dropna()\n", "data_last_year_2024 = data_last_year[data_last_year.index.year == 2024]\n", "min_2024 = data_last_year_2024.where(data_last_year_2024 == min(data_last_year_2024)).dropna()\n", "print(min_2023)\n", "print(min_2024)\n", "\n", "# To define the straight line approximating f(t), it is simpler to switch to integer positions (week numbers)\n", "data_last_year_filtered = pd.Series(data=data_last_year.values, index=range(len(data_last_year)))\n", "dy = min_2024.values[0] - min_2023.values[0]\n", "min_2023_filtered = data_last_year_filtered.where(data_last_year_filtered == min(data_last_year_filtered)).dropna()\n", "data_last_year_filtered_2024 = data_last_year_filtered[data_last_year.index.year == 2024]\n", "min_2024_filtered = data_last_year_filtered_2024.where(data_last_year_filtered_2024 == min(data_last_year_filtered_2024)).dropna()\n", "dx = min_2024_filtered.index[0] - min_2023_filtered.index[0]\n", "# np.arange (rather than range) makes the subtraction element-wise\n", "data_f = min_2023.values[0] + (dy / dx) * (np.arange(len(data_last_year)) - min_2023_filtered.index[0])\n", "f = pd.Series(data=data_f, index=data_last_year.index)\n", "data_last_year_filtered.index = data_last_year.index\n", "\n", "# Remove the linear trend: the remainder is zero at both minima, so its maximum approximates the peak-to-trough height 2A\n", "data_last_year_filtered -= f\n", "amp = data_last_year_filtered.max() / 2\n", "data_last_year_filtered.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modeling the background trend" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now isolate the slow background trend $f(t)$ by averaging the values over one year, which amounts to a crude low-pass filter. Interpolated weeks are left out of the average:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1958-03-29 316.118235\n", "1958-04-05 316.118235\n", "1958-04-12 316.118235\n", "1958-04-19 316.118235\n", "1958-04-26 316.118235\n", "1958-05-03 316.118235\n", "1958-05-10 315.947778\n", "1958-05-17 315.806316\n", "1958-05-24 315.694500\n", "1958-05-31 315.614286\n", "1958-06-07 315.566364\n", "1958-06-14 315.516087\n", "1958-06-21 315.485000\n", "1958-06-28 315.474000\n", "1958-07-05 315.465000\n", "1958-07-12 315.466296\n", "1958-07-19 315.474286\n", "1958-07-26 315.487586\n", "1958-08-02 315.485333\n", "1958-08-09 315.485333\n", "1958-08-16 315.532258\n", "1958-08-23 315.565938\n", "1958-08-30 315.597879\n", "1958-09-06 315.633529\n", "1958-09-13 315.633529\n", "1958-09-20 315.664857\n", "1958-09-27 315.693889\n", "1958-10-04 315.736389\n", "1958-10-11 315.731111\n", "1958-10-18 315.729722\n", " ... \n", "2024-04-20 423.620943\n", "2024-04-27 423.687170\n", "2024-05-04 423.772642\n", "2024-05-11 423.850189\n", "2024-05-18 NaN\n", "2024-05-25 NaN\n", "2024-06-01 NaN\n", "2024-06-08 NaN\n", "2024-06-15 NaN\n", "2024-06-22 NaN\n", "2024-06-29 NaN\n", "2024-07-06 NaN\n", "2024-07-13 NaN\n", "2024-07-20 NaN\n", "2024-07-27 NaN\n", "2024-08-03 NaN\n", "2024-08-10 NaN\n", "2024-08-17 NaN\n", "2024-08-24 NaN\n", "2024-08-31 NaN\n", "2024-09-07 NaN\n", "2024-09-14 NaN\n", "2024-09-21 NaN\n", "2024-09-28 NaN\n", "2024-10-05 NaN\n", "2024-10-12 NaN\n", "2024-10-19 NaN\n", "2024-10-26 NaN\n", "2024-11-02 NaN\n", "2024-11-09 NaN\n", "Freq: 7D, Length: 3477, dtype: float64\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "rolling_base = full_data.copy()\n", "rolling_base[interpolated_marks == 1] = np.nan\n", "rolling_mean_with_offset = rolling_base.rolling(window='365D', min_periods=1).mean()\n", "# A time-based rolling window is trailing: each value is the mean over the 365 days ending at that date,\n", "# not over the six months before and after it. Shifting the index back by 182 days (26 weeks) centers the window.\n", "rolling_mean = rolling_mean_with_offset.shift(periods=-182, freq='D')\n", "# The last 26 weeks have no centered yearly mean and appear as NaN after reindexing\n", "print(rolling_mean.reindex(full_index))\n", "rolling_mean.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }