{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#
Analyse de la concentration de CO2,
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import datetime\n", "from scipy import interpolate\n", "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"last_expr\" \n", "# ‘all’|’last’|’last_expr’|’none’\n", "#https://jupyter-console.readthedocs.io/en/4.0.1/config_options.html\n", "pd.options.display.max_rows = 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nous récupérons les données les plus récentes sur le site en pointant sur un fichier au format .csv, si cette récupération est possible nous enregistrons une copie de ce fichier. Si pour une raison quelconque nous n'arrivons pas a faire ce téléchargement, nous travaillons sur le dernières données téléchargé." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", " raw_data = pd.read_csv(\"https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/\"\n", " \"in_situ_co2/monthly/monthly_in_situ_co2_mlo.csv\",skiprows=56)\n", "except OSError as err:\n", " print(\"Erreur lors du téléchargement : {0}\".format(err))\n", " print(\"Nous téléchargeons les dernières données enregistrer sur notre PC\")\n", " raw_data = pd.read_csv(\"monthly_in_situ_co2_mlo\")\n", "except:\n", " print(\"Unexpected error:\", sys.exc_info()[0])\n", " raise\n", "else:\n", " raw_data.to_csv('monthly_in_situ_co2_mlo',index=True)\n", "\n", "print(raw_data.shape)\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nous voyons que les noms des colonnes ne sont pas très représentatives, nous modifions les noms des colonnes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "raw_data_1 = raw_data.copy()\n", "print(raw_data.columns)\n", "raw_data_1.columns = ['Yr','Mn','Date 1','Date 2','s1','s2','s3','s4','s5','s6']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "raw_data_1.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Les données vide sont représentés par la valeur -99.99, nous remplaçons cette valeur par une valeur plus adéquate NaN dans une autre DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "date = datetime.date.today()\n", "data = raw_data_1.copy()\n", "data = data.drop(data[(data.Yr == date.year) & (data.Mn > date.month)].index)\n", "data = data.replace(-99.99,np.NaN);\n", "d = data[(data.Yr == date.year)].index\n", "i,k = d[0], d[-1]\n", "\n", "while k>=i:\n", " if data.loc[k].isnull().any():\n", " data = data.drop(k)\n", " else:\n", " break\n", " k = k - 1\n", " \n", "annee, mois = data.Yr[0], data.Mn[0]\n", "d = data[(data.Yr == annee)].index\n", "i,k = d[0], d[-1]\n", "\n", "while i<=k:\n", " if data.loc[i].isnull().any():\n", " data = data.drop(i)\n", " else:\n", " break\n", " i = i + 1\n", "InteractiveShell.ast_node_interactivity = \"all\" \n", "data.head(3)\n", "data.tail(3)\n", "InteractiveShell.ast_node_interactivity = \"last_expr\" " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On visualise les lignes dont une donnée colonne est manquante." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data[data.isnull().any(axis=1)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On ajoute un index 'périod' à la DataFrame, cet index représente la période de mesure. \n", "Cette date est mise dans au format compréhensible par pandas. On visualise toutes les lignes qui seront supprimées." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "df = data.dropna().copy()\n", "df = df.reset_index().copy()\n", "#print(data.shape)\n", "period = [datetime.date(y,m,1) for y,m in zip(df['Yr'],df['Mn'])]\n", "period = pd.Series(period,name = 'period')\n", "#print(period.shape)\n", "df = pd.concat([df,period],axis=1)\n", "df = df.set_index('period') \n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Représentation graphique de la concentration de CO2 de 1958 à nos jours" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['s1'].plot(title = 'concentration de CO2',);\n", "df['s1'].plot(figsize=(15, 10),).grid(linestyle='--', linewidth=1);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nous allons approximé la concentration de CO2 avec une droite $a*x+b$, puis faire la différence pour \n", "n'obtenir que les variations de la concentration de CO2." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy import stats\n", "\n", "a, b, r_value, p_value, std_err = stats.linregress(df['Date 2'], df['s1'])\n", "def predict(x):\n", " return a*x+b\n", "\n", "data_lineaire = df.copy()\n", "data_lineaire['reg_lineaire'] = predict(data_lineaire['Date 2'])\n", "\n", "fig = plt.figure(figsize=(18, 5))\n", "ax1 = fig.add_subplot(121)\n", "ax2 = fig.add_subplot(122)\n", "ax1.grid(linestyle='--', linewidth=1)\n", "ax2.grid(linestyle='--', linewidth=1)\n", "#plot(figsize=(8, 5)) .plot(figsize=(15, 10), grid=True).grid(linestyle='--', linewidth=1);\n", "\n", "ax1.set(title = 'Concentration CO2',xlabel='Période',ylabel='Concentration (ppm)')\n", "ax1.plot(data_lineaire['s1'])\n", "#data1['s1'].plot()\n", "ax1.plot(data_lineaire['reg_lineaire']) \n", "data_lineaire['co2'] = data_lineaire['reg_lineaire']-data_lineaire['s1']\n", "ax2.set(title = 'Variation de la concentration CO2',xlabel='Période',ylabel='Concentration (ppm)')\n", "ax2.plot(data_lineaire['co2']); " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Le résultat n'est pas satisfaisant ...\n", "Nous pouvons faire une optimisation avec une fonction de la forme $a*(x-b)^2+c$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy.optimize import curve_fit\n", "\n", "def func_cube(x,a,b,c):\n", " return a*(x-b)**(2)+c\n", "\n", "data_cube = df.copy()\n", "popt, pcov = curve_fit(func_cube,data_cube['Date 2'],data_cube['s1'])\n", "\n", "def fcube(x):\n", " return popt[0]*(x- popt[1])**(2)+popt[2]\n", "\n", "data_cube['reg_cube'] = fcube(data_cube['Date 2'])\n", "\n", "fig = plt.figure(figsize=(18, 5))\n", "ax1 = fig.add_subplot(121)\n", "ax2 = fig.add_subplot(122)\n", "ax1.grid(linestyle='--', linewidth=1)\n", "ax2.grid(linestyle='--', linewidth=1)\n", "\n", "ax1.set(title = 'Concentration CO2',xlabel='Période',ylabel='Concentration (ppm)')\n", "ax1.plot(data_cube['s1'])\n", "ax1.plot(data_cube['reg_cube']) \n", "data_cube['co2'] = data_cube['reg_cube']-data_cube['s1']\n", "ax2.set(title = 'Variation de la concentration CO2',xlabel='Période',ylabel='Concentration (ppm)')\n", "ax2.plot(data_cube['co2']); " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nous recherchons l'oscillation lente, pour cela nous allons opéré en 2 etapes :\n", "- Recherche de la fréquence lente, par FFT. Pour cela nous devons faire une interpolation pour les quelque points manquant. Ce qui nous donnera une gamme de fréquence possible (échantillonne de la FFT)\n", "- Pour la gamme de fréquence spécifiées, nous estimons tous les paramètres pour une approximation sinusoïdale (moyenne, amplitude et phase) par les moindres carrés ordinaires \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = data.reset_index().copy()\n", "df['period'] = pd.Series([datetime.date(y,m,1) for y,m in zip(df['Yr'],df['Mn'])])\n", "df = df.set_index('period') \n", "df.head()\n", "d = df[df.isnull().any(axis=1)]\n", "d" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pour les valeur manquantes, nous allons faire une interpolation linéaire afin que la FFT dispose de données prisent à échantillonnage de 1 mois." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = df.interpolate(method='linear', limit_direction='forward',limit=3)\n", "df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = df[(df.Yr >= 1964) & (df.Yr < 1965)]\n", "d" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_cube = df.copy()\n", "popt, pcov = curve_fit(func_cube,data_cube['Date 2'],data_cube['s1'])\n", "data_cube['reg_cube'] = fcube(data_cube['Date 2'])\n", "data_cube['co2'] = data_cube['reg_cube']-data_cube['s1']\n", "\n", "nb = len(data_cube['Date 2'])\n", "ecart = []\n", "for i in np.arange(0,nb-1):\n", " ecart.append(data_cube['Date 2'][i+1] - data_cube['Date 2'][i])\n", "dt = np.mean(ecart)\n", "dt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Utilisation de la fonction np.fft.fft pour transformer le signal temporel en signal fréquentiel à l’aide de la transformée de Fourier rapide. Le résultat s1_fft est un tableau de nombres complexes. La densité spectrale de puissance est calculée à l’aide de la formule Γx=|X|2T. Les fréquences sont ensuite calculées à l’aide de la fonction np.fft.fftfreq." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s1 = data_cube.co2 - data_cube.co2.mean()\n", "\n", "# calcul de la transformee de Fourier et des frequences\n", "s1_fft = np.fft.fft(s1)\n", "\n", "n = s1.size\n", "dt = 1/12\n", "pds_s1 = (dt/n) * np.abs(s1_fft)**2 # densité spectral de puissance\n", "freq = np.fft.fftfreq(n, d=dt) # fréquences associées\n", "\n", "freq = freq[:n/2]\n", "pds_s1 = pds_s1[:n/2]\n", "# affichage de la transformee de Fourier\n", "plt.figure(figsize=(18, 10))\n", "plt.stem(freq, pds_s1, label=\"PSD\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finalement, seules les « moitiés droites » des tableaux sont conservées car les gauches contiennent les fréquences négatives. Ces opérations sont réalisées àl’aide duslicingpython (tranchage)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.stem(freq[0:100], abs(s1_fft.real[0:100]), label=\"real\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(s1_fft.real)\n", "N = (len(s1_fft.real)+1)/2\n", "N = int(N)\n", "indide_f = np.where(s1_fft.real[0:N] > 500)\n", "f_lente = indide_f[0][0] * (1/dt)\n", "\n", "f_rapide = indide_f[0][-1] * (1/dt)\n", "print(f' Estimation de la frequence lente : {f_lente} \\n Esitmation de la fréquence rapide : {f_rapide}')\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }