{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Atmospheric CO2 concentration since 1958" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "The goal of this study is to analyze how the atmospheric CO2 concentration has evolved, while putting reproducible research tools into practice." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import isoweek\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#raw_data = pd.read_csv(\"https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/monthly/monthly_in_situ_co2_mlo.csv\", skiprows = 54, sep=r'\\s*,\\s*', engine='python')\n", "raw_data = pd.read_csv(\"monthly_in_situ_co2_mlo.csv\", skiprows = 54, sep=r'\\s*,\\s*', engine='python')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data were retrieved on 11 May 2020. We work with a local copy, but the commented-out line downloads the data directly from the source. \n", "The first 54 lines of the file are text: the references to cite, explanations of the data format, and so on. We skip them so that Pandas can read the data as a table. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
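<table>
<tr><th></th><th>Yr</th><th>Mn</th><th>Date</th><th>Date.1</th><th>CO2</th><th>seasonally</th><th>fit</th><th>seasonally.1</th><th>CO2.1</th><th>seasonally.2</th></tr>
<tr><th>0</th><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>adjusted</td><td>NaN</td><td>adjusted fit</td><td>filled</td><td>adjusted filled</td></tr>
<tr><th>1</th><td>NaN</td><td>NaN</td><td>Excel</td><td>NaN</td><td>[ppm]</td><td>[ppm]</td><td>[ppm]</td><td>[ppm]</td><td>[ppm]</td><td>[ppm]</td></tr>
<tr><th>2</th><td>1958.0</td><td>1.0</td><td>21200</td><td>1958.0411</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
<tr><th>3</th><td>1958.0</td><td>2.0</td><td>21231</td><td>1958.1260</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
<tr><th>4</th><td>1958.0</td><td>3.0</td><td>21259</td><td>1958.2027</td><td>315.70</td><td>314.44</td><td>316.18</td><td>314.90</td><td>315.70</td><td>314.44</td></tr>
<tr><th>5</th><td>1958.0</td><td>4.0</td><td>21290</td><td>1958.2877</td><td>317.46</td><td>315.16</td><td>317.29</td><td>314.98</td><td>317.46</td><td>315.16</td></tr>
<tr><th>6</th><td>1958.0</td><td>5.0</td><td>21320</td><td>1958.3699</td><td>317.51</td><td>314.71</td><td>317.86</td><td>315.06</td><td>317.51</td><td>314.71</td></tr>
<tr><th>7</th><td>1958.0</td><td>6.0</td><td>21351</td><td>1958.4548</td><td>-99.99</td><td>-99.99</td><td>317.24</td><td>315.14</td><td>317.24</td><td>315.14</td></tr>
<tr><th>8</th><td>1958.0</td><td>7.0</td><td>21381</td><td>1958.5370</td><td>315.86</td><td>315.19</td><td>315.86</td><td>315.21</td><td>315.86</td><td>315.19</td></tr>
<tr><th>9</th><td>1958.0</td><td>8.0</td><td>21412</td><td>1958.6219</td><td>314.93</td><td>316.19</td><td>313.99</td><td>315.28</td><td>314.93</td><td>316.19</td></tr>
<tr><th>10</th><td>1958.0</td><td>9.0</td><td>21443</td><td>1958.7068</td><td>313.21</td><td>316.08</td><td>312.45</td><td>315.35</td><td>313.21</td><td>316.08</td></tr>
<tr><th>11</th><td>1958.0</td><td>10.0</td><td>21473</td><td>1958.7890</td><td>-99.99</td><td>-99.99</td><td>312.43</td><td>315.40</td><td>312.43</td><td>315.40</td></tr>
<tr><th>12</th><td>1958.0</td><td>11.0</td><td>21504</td><td>1958.8740</td><td>313.33</td><td>315.20</td><td>313.61</td><td>315.46</td><td>313.33</td><td>315.20</td></tr>
<tr><th>13</th><td>1958.0</td><td>12.0</td><td>21534</td><td>1958.9562</td><td>314.67</td><td>315.43</td><td>314.76</td><td>315.51</td><td>314.67</td><td>315.43</td></tr>
<tr><th>14</th><td>1959.0</td><td>1.0</td><td>21565</td><td>1959.0411</td><td>315.58</td><td>315.54</td><td>315.62</td><td>315.57</td><td>315.58</td><td>315.54</td></tr>
<tr><th>15</th><td>1959.0</td><td>2.0</td><td>21596</td><td>1959.1260</td><td>316.49</td><td>315.86</td><td>316.27</td><td>315.63</td><td>316.49</td><td>315.86</td></tr>
<tr><th>16</th><td>1959.0</td><td>3.0</td><td>21624</td><td>1959.2027</td><td>316.65</td><td>315.38</td><td>316.98</td><td>315.69</td><td>316.65</td><td>315.38</td></tr>
<tr><th>17</th><td>1959.0</td><td>4.0</td><td>21655</td><td>1959.2877</td><td>317.72</td><td>315.42</td><td>318.09</td><td>315.77</td><td>317.72</td><td>315.42</td></tr>
<tr><th>18</th><td>1959.0</td><td>5.0</td><td>21685</td><td>1959.3699</td><td>318.29</td><td>315.49</td><td>318.65</td><td>315.85</td><td>318.29</td><td>315.49</td></tr>
<tr><th>19</th><td>1959.0</td><td>6.0</td><td>21716</td><td>1959.4548</td><td>318.15</td><td>316.03</td><td>318.04</td><td>315.94</td><td>318.15</td><td>316.03</td></tr>
<tr><th>20</th><td>1959.0</td><td>7.0</td><td>21746</td><td>1959.5370</td><td>316.54</td><td>315.86</td><td>316.67</td><td>316.03</td><td>316.54</td><td>315.86</td></tr>
<tr><th>21</th><td>1959.0</td><td>8.0</td><td>21777</td><td>1959.6219</td><td>314.80</td><td>316.06</td><td>314.82</td><td>316.12</td><td>314.80</td><td>316.06</td></tr>
<tr><th>22</th><td>1959.0</td><td>9.0</td><td>21808</td><td>1959.7068</td><td>313.84</td><td>316.73</td><td>313.31</td><td>316.22</td><td>313.84</td><td>316.73</td></tr>
<tr><th>23</th><td>1959.0</td><td>10.0</td><td>21838</td><td>1959.7890</td><td>313.33</td><td>316.33</td><td>313.32</td><td>316.30</td><td>313.33</td><td>316.33</td></tr>
<tr><th>24</th><td>1959.0</td><td>11.0</td><td>21869</td><td>1959.8740</td><td>314.81</td><td>316.68</td><td>314.54</td><td>316.39</td><td>314.81</td><td>316.68</td></tr>
<tr><th>25</th><td>1959.0</td><td>12.0</td><td>21899</td><td>1959.9562</td><td>315.58</td><td>316.35</td><td>315.72</td><td>316.47</td><td>315.58</td><td>316.35</td></tr>
<tr><th>26</th><td>1960.0</td><td>1.0</td><td>21930</td><td>1960.0410</td><td>316.43</td><td>316.39</td><td>316.61</td><td>316.56</td><td>316.43</td><td>316.39</td></tr>
<tr><th>27</th><td>1960.0</td><td>2.0</td><td>21961</td><td>1960.1257</td><td>316.98</td><td>316.35</td><td>317.27</td><td>316.64</td><td>316.98</td><td>316.35</td></tr>
<tr><th>28</th><td>1960.0</td><td>3.0</td><td>21990</td><td>1960.2049</td><td>317.58</td><td>316.28</td><td>318.03</td><td>316.71</td><td>317.58</td><td>316.28</td></tr>
<tr><th>29</th><td>1960.0</td><td>4.0</td><td>22021</td><td>1960.2896</td><td>319.03</td><td>316.70</td><td>319.14</td><td>316.79</td><td>319.03</td><td>316.70</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>728</th><td>2018.0</td><td>7.0</td><td>43296</td><td>2018.5370</td><td>408.90</td><td>408.08</td><td>409.44</td><td>408.65</td><td>408.90</td><td>408.08</td></tr>
<tr><th>729</th><td>2018.0</td><td>8.0</td><td>43327</td><td>2018.6219</td><td>407.10</td><td>408.63</td><td>407.34</td><td>408.91</td><td>407.10</td><td>408.63</td></tr>
<tr><th>730</th><td>2018.0</td><td>9.0</td><td>43358</td><td>2018.7068</td><td>405.59</td><td>409.08</td><td>405.67</td><td>409.19</td><td>405.59</td><td>409.08</td></tr>
<tr><th>731</th><td>2018.0</td><td>10.0</td><td>43388</td><td>2018.7890</td><td>405.99</td><td>409.61</td><td>405.85</td><td>409.45</td><td>405.99</td><td>409.61</td></tr>
<tr><th>732</th><td>2018.0</td><td>11.0</td><td>43419</td><td>2018.8740</td><td>408.12</td><td>410.38</td><td>407.49</td><td>409.73</td><td>408.12</td><td>410.38</td></tr>
<tr><th>733</th><td>2018.0</td><td>12.0</td><td>43449</td><td>2018.9562</td><td>409.23</td><td>410.15</td><td>409.08</td><td>409.99</td><td>409.23</td><td>410.15</td></tr>
<tr><th>734</th><td>2019.0</td><td>1.0</td><td>43480</td><td>2019.0411</td><td>410.92</td><td>410.87</td><td>410.31</td><td>410.25</td><td>410.92</td><td>410.87</td></tr>
<tr><th>735</th><td>2019.0</td><td>2.0</td><td>43511</td><td>2019.1260</td><td>411.66</td><td>410.90</td><td>411.26</td><td>410.49</td><td>411.66</td><td>410.90</td></tr>
<tr><th>736</th><td>2019.0</td><td>3.0</td><td>43539</td><td>2019.2027</td><td>412.00</td><td>410.46</td><td>412.26</td><td>410.70</td><td>412.00</td><td>410.46</td></tr>
<tr><th>737</th><td>2019.0</td><td>4.0</td><td>43570</td><td>2019.2877</td><td>413.52</td><td>410.72</td><td>413.75</td><td>410.93</td><td>413.52</td><td>410.72</td></tr>
<tr><th>738</th><td>2019.0</td><td>5.0</td><td>43600</td><td>2019.3699</td><td>414.83</td><td>411.42</td><td>414.55</td><td>411.15</td><td>414.83</td><td>411.42</td></tr>
<tr><th>739</th><td>2019.0</td><td>6.0</td><td>43631</td><td>2019.4548</td><td>413.96</td><td>411.38</td><td>413.92</td><td>411.37</td><td>413.96</td><td>411.38</td></tr>
<tr><th>740</th><td>2019.0</td><td>7.0</td><td>43661</td><td>2019.5370</td><td>411.85</td><td>411.03</td><td>412.37</td><td>411.58</td><td>411.85</td><td>411.03</td></tr>
<tr><th>741</th><td>2019.0</td><td>8.0</td><td>43692</td><td>2019.6219</td><td>410.08</td><td>411.62</td><td>410.23</td><td>411.80</td><td>410.08</td><td>411.62</td></tr>
<tr><th>742</th><td>2019.0</td><td>9.0</td><td>43723</td><td>2019.7068</td><td>408.55</td><td>412.06</td><td>408.50</td><td>412.03</td><td>408.55</td><td>412.06</td></tr>
<tr><th>743</th><td>2019.0</td><td>10.0</td><td>43753</td><td>2019.7890</td><td>408.43</td><td>412.06</td><td>408.63</td><td>412.24</td><td>408.43</td><td>412.06</td></tr>
<tr><th>744</th><td>2019.0</td><td>11.0</td><td>43784</td><td>2019.8740</td><td>410.29</td><td>412.56</td><td>410.22</td><td>412.47</td><td>410.29</td><td>412.56</td></tr>
<tr><th>745</th><td>2019.0</td><td>12.0</td><td>43814</td><td>2019.9562</td><td>411.85</td><td>412.78</td><td>411.77</td><td>412.68</td><td>411.85</td><td>412.78</td></tr>
<tr><th>746</th><td>2020.0</td><td>1.0</td><td>43845</td><td>2020.0410</td><td>413.37</td><td>413.32</td><td>412.96</td><td>412.89</td><td>413.37</td><td>413.32</td></tr>
<tr><th>747</th><td>2020.0</td><td>2.0</td><td>43876</td><td>2020.1257</td><td>414.09</td><td>413.33</td><td>413.87</td><td>413.10</td><td>414.09</td><td>413.33</td></tr>
<tr><th>748</th><td>2020.0</td><td>3.0</td><td>43905</td><td>2020.2049</td><td>414.51</td><td>412.94</td><td>414.88</td><td>413.29</td><td>414.51</td><td>412.94</td></tr>
<tr><th>749</th><td>2020.0</td><td>4.0</td><td>43936</td><td>2020.2896</td><td>416.18</td><td>413.35</td><td>-99.99</td><td>-99.99</td><td>416.18</td><td>413.35</td></tr>
<tr><th>750</th><td>2020.0</td><td>5.0</td><td>43966</td><td>2020.3716</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
<tr><th>751</th><td>2020.0</td><td>6.0</td><td>43997</td><td>2020.4563</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
<tr><th>752</th><td>2020.0</td><td>7.0</td><td>44027</td><td>2020.5383</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
<tr><th>753</th><td>2020.0</td><td>8.0</td><td>44058</td><td>2020.6230</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
<tr><th>754</th><td>2020.0</td><td>9.0</td><td>44089</td><td>2020.7077</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
<tr><th>755</th><td>2020.0</td><td>10.0</td><td>44119</td><td>2020.7896</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
<tr><th>756</th><td>2020.0</td><td>11.0</td><td>44150</td><td>2020.8743</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
<tr><th>757</th><td>2020.0</td><td>12.0</td><td>44180</td><td>2020.9563</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td><td>-99.99</td></tr>
</table>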
\n", "

758 rows × 10 columns

\n", "
" ], "text/plain": [ " Yr Mn Date Date.1 CO2 seasonally fit seasonally.1 \\\n", "0 NaN NaN NaN NaN NaN adjusted NaN adjusted fit \n", "1 NaN NaN Excel NaN [ppm] [ppm] [ppm] [ppm] \n", "2 1958.0 1.0 21200 1958.0411 -99.99 -99.99 -99.99 -99.99 \n", "3 1958.0 2.0 21231 1958.1260 -99.99 -99.99 -99.99 -99.99 \n", "4 1958.0 3.0 21259 1958.2027 315.70 314.44 316.18 314.90 \n", "5 1958.0 4.0 21290 1958.2877 317.46 315.16 317.29 314.98 \n", "6 1958.0 5.0 21320 1958.3699 317.51 314.71 317.86 315.06 \n", "7 1958.0 6.0 21351 1958.4548 -99.99 -99.99 317.24 315.14 \n", "8 1958.0 7.0 21381 1958.5370 315.86 315.19 315.86 315.21 \n", "9 1958.0 8.0 21412 1958.6219 314.93 316.19 313.99 315.28 \n", "10 1958.0 9.0 21443 1958.7068 313.21 316.08 312.45 315.35 \n", "11 1958.0 10.0 21473 1958.7890 -99.99 -99.99 312.43 315.40 \n", "12 1958.0 11.0 21504 1958.8740 313.33 315.20 313.61 315.46 \n", "13 1958.0 12.0 21534 1958.9562 314.67 315.43 314.76 315.51 \n", "14 1959.0 1.0 21565 1959.0411 315.58 315.54 315.62 315.57 \n", "15 1959.0 2.0 21596 1959.1260 316.49 315.86 316.27 315.63 \n", "16 1959.0 3.0 21624 1959.2027 316.65 315.38 316.98 315.69 \n", "17 1959.0 4.0 21655 1959.2877 317.72 315.42 318.09 315.77 \n", "18 1959.0 5.0 21685 1959.3699 318.29 315.49 318.65 315.85 \n", "19 1959.0 6.0 21716 1959.4548 318.15 316.03 318.04 315.94 \n", "20 1959.0 7.0 21746 1959.5370 316.54 315.86 316.67 316.03 \n", "21 1959.0 8.0 21777 1959.6219 314.80 316.06 314.82 316.12 \n", "22 1959.0 9.0 21808 1959.7068 313.84 316.73 313.31 316.22 \n", "23 1959.0 10.0 21838 1959.7890 313.33 316.33 313.32 316.30 \n", "24 1959.0 11.0 21869 1959.8740 314.81 316.68 314.54 316.39 \n", "25 1959.0 12.0 21899 1959.9562 315.58 316.35 315.72 316.47 \n", "26 1960.0 1.0 21930 1960.0410 316.43 316.39 316.61 316.56 \n", "27 1960.0 2.0 21961 1960.1257 316.98 316.35 317.27 316.64 \n", "28 1960.0 3.0 21990 1960.2049 317.58 316.28 318.03 316.71 \n", "29 1960.0 4.0 22021 1960.2896 319.03 316.70 319.14 316.79 \n", ".. ... ... ... ... ... 
... ... ... \n", "728 2018.0 7.0 43296 2018.5370 408.90 408.08 409.44 408.65 \n", "729 2018.0 8.0 43327 2018.6219 407.10 408.63 407.34 408.91 \n", "730 2018.0 9.0 43358 2018.7068 405.59 409.08 405.67 409.19 \n", "731 2018.0 10.0 43388 2018.7890 405.99 409.61 405.85 409.45 \n", "732 2018.0 11.0 43419 2018.8740 408.12 410.38 407.49 409.73 \n", "733 2018.0 12.0 43449 2018.9562 409.23 410.15 409.08 409.99 \n", "734 2019.0 1.0 43480 2019.0411 410.92 410.87 410.31 410.25 \n", "735 2019.0 2.0 43511 2019.1260 411.66 410.90 411.26 410.49 \n", "736 2019.0 3.0 43539 2019.2027 412.00 410.46 412.26 410.70 \n", "737 2019.0 4.0 43570 2019.2877 413.52 410.72 413.75 410.93 \n", "738 2019.0 5.0 43600 2019.3699 414.83 411.42 414.55 411.15 \n", "739 2019.0 6.0 43631 2019.4548 413.96 411.38 413.92 411.37 \n", "740 2019.0 7.0 43661 2019.5370 411.85 411.03 412.37 411.58 \n", "741 2019.0 8.0 43692 2019.6219 410.08 411.62 410.23 411.80 \n", "742 2019.0 9.0 43723 2019.7068 408.55 412.06 408.50 412.03 \n", "743 2019.0 10.0 43753 2019.7890 408.43 412.06 408.63 412.24 \n", "744 2019.0 11.0 43784 2019.8740 410.29 412.56 410.22 412.47 \n", "745 2019.0 12.0 43814 2019.9562 411.85 412.78 411.77 412.68 \n", "746 2020.0 1.0 43845 2020.0410 413.37 413.32 412.96 412.89 \n", "747 2020.0 2.0 43876 2020.1257 414.09 413.33 413.87 413.10 \n", "748 2020.0 3.0 43905 2020.2049 414.51 412.94 414.88 413.29 \n", "749 2020.0 4.0 43936 2020.2896 416.18 413.35 -99.99 -99.99 \n", "750 2020.0 5.0 43966 2020.3716 -99.99 -99.99 -99.99 -99.99 \n", "751 2020.0 6.0 43997 2020.4563 -99.99 -99.99 -99.99 -99.99 \n", "752 2020.0 7.0 44027 2020.5383 -99.99 -99.99 -99.99 -99.99 \n", "753 2020.0 8.0 44058 2020.6230 -99.99 -99.99 -99.99 -99.99 \n", "754 2020.0 9.0 44089 2020.7077 -99.99 -99.99 -99.99 -99.99 \n", "755 2020.0 10.0 44119 2020.7896 -99.99 -99.99 -99.99 -99.99 \n", "756 2020.0 11.0 44150 2020.8743 -99.99 -99.99 -99.99 -99.99 \n", "757 2020.0 12.0 44180 2020.9563 -99.99 -99.99 -99.99 -99.99 \n", "\n", " CO2.1 
seasonally.2 \n", "0 filled adjusted filled \n", "1 [ppm] [ppm] \n", "2 -99.99 -99.99 \n", "3 -99.99 -99.99 \n", "4 315.70 314.44 \n", "5 317.46 315.16 \n", "6 317.51 314.71 \n", "7 317.24 315.14 \n", "8 315.86 315.19 \n", "9 314.93 316.19 \n", "10 313.21 316.08 \n", "11 312.43 315.40 \n", "12 313.33 315.20 \n", "13 314.67 315.43 \n", "14 315.58 315.54 \n", "15 316.49 315.86 \n", "16 316.65 315.38 \n", "17 317.72 315.42 \n", "18 318.29 315.49 \n", "19 318.15 316.03 \n", "20 316.54 315.86 \n", "21 314.80 316.06 \n", "22 313.84 316.73 \n", "23 313.33 316.33 \n", "24 314.81 316.68 \n", "25 315.58 316.35 \n", "26 316.43 316.39 \n", "27 316.98 316.35 \n", "28 317.58 316.28 \n", "29 319.03 316.70 \n", ".. ... ... \n", "728 408.90 408.08 \n", "729 407.10 408.63 \n", "730 405.59 409.08 \n", "731 405.99 409.61 \n", "732 408.12 410.38 \n", "733 409.23 410.15 \n", "734 410.92 410.87 \n", "735 411.66 410.90 \n", "736 412.00 410.46 \n", "737 413.52 410.72 \n", "738 414.83 411.42 \n", "739 413.96 411.38 \n", "740 411.85 411.03 \n", "741 410.08 411.62 \n", "742 408.55 412.06 \n", "743 408.43 412.06 \n", "744 410.29 412.56 \n", "745 411.85 412.78 \n", "746 413.37 413.32 \n", "747 414.09 413.33 \n", "748 414.51 412.94 \n", "749 416.18 413.35 \n", "750 -99.99 -99.99 \n", "751 -99.99 -99.99 \n", "752 -99.99 -99.99 \n", "753 -99.99 -99.99 \n", "754 -99.99 -99.99 \n", "755 -99.99 -99.99 \n", "756 -99.99 -99.99 \n", "757 -99.99 -99.99 \n", "\n", "[758 rows x 10 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first two rows contain header fragments and units rather than values, so we remove them from the table for now."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "data = raw_data.iloc[2:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this dataset the first four columns are dates, and only the fifth column contains raw measurements. We keep only the year, the month, and the raw measured value." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Keep columns 0 (Yr), 1 (Mn) and 4 (CO2); copy so later assignments do not touch raw_data\n", "useful_data = data.iloc[:, [0, 1, 4]].copy()\n", "#useful_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check that the data have an appropriate type." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.float64'> 1958.0\n", "<class 'numpy.float64'> 2.0\n", "<class 'str'> -99.99\n" ] } ], "source": [ "print(type(useful_data['Yr'][3]), useful_data['Yr'][3])\n", "print(type(useful_data['Mn'][3]), useful_data['Mn'][3])\n", "print(type(useful_data['CO2'][3]), useful_data['CO2'][3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The third column was not parsed as numeric: its values are strings. The most likely cause is the units row ('[ppm]') that was read as part of the column, rather than the '-' sign. We convert the column explicitly." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "useful_data['CO2'] = useful_data['CO2'].astype(float)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The notes included with the file state that missing values are encoded as -99.99. We therefore drop every row containing this value."
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2, 3, 7, 11, 75, 76, 77, 750, 751, 752, 753, 754, 755, 756, 757]\n" ] } ], "source": [ "# Row labels whose CO2 value is the missing-data sentinel -99.99\n", "liste = useful_data.index[useful_data['CO2'] == -99.99].tolist()\n", "print(liste)\n", "useful_data.drop(liste, inplace=True)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Keep an independent copy of the cleaned data for later use\n", "useful_data_copie = useful_data.copy(deep=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now want to convert the year and month into a format better suited to Pandas and use it as the index. One possible method is shown here: combine the two values into a single number, then apply a function that converts it to a Pandas period." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "useful_data['period'] = useful_data['Yr']*100 + useful_data['Mn']" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "useful_data['period'] = useful_data['period'].astype(int)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Keep only the CO2 and period columns\n", "useful_data = useful_data.iloc[:, [2, 3]]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
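<table>
<tr><th>period</th><th>CO2</th></tr>
<tr><th>1958-03</th><td>315.70</td></tr>
<tr><th>1958-04</th><td>317.46</td></tr>
<tr><th>1958-05</th><td>317.51</td></tr>
<tr><th>1958-07</th><td>315.86</td></tr>
<tr><th>1958-08</th><td>314.93</td></tr>
<tr><th>1958-09</th><td>313.21</td></tr>
<tr><th>1958-11</th><td>313.33</td></tr>
<tr><th>1958-12</th><td>314.67</td></tr>
<tr><th>1959-01</th><td>315.58</td></tr>
<tr><th>1959-02</th><td>316.49</td></tr>
<tr><th>1959-03</th><td>316.65</td></tr>
<tr><th>1959-04</th><td>317.72</td></tr>
<tr><th>1959-05</th><td>318.29</td></tr>
<tr><th>1959-06</th><td>318.15</td></tr>
<tr><th>1959-07</th><td>316.54</td></tr>
<tr><th>1959-08</th><td>314.80</td></tr>
<tr><th>1959-09</th><td>313.84</td></tr>
<tr><th>1959-10</th><td>313.33</td></tr>
<tr><th>1959-11</th><td>314.81</td></tr>
<tr><th>1959-12</th><td>315.58</td></tr>
<tr><th>1960-01</th><td>316.43</td></tr>
<tr><th>1960-02</th><td>316.98</td></tr>
<tr><th>1960-03</th><td>317.58</td></tr>
<tr><th>1960-04</th><td>319.03</td></tr>
<tr><th>1960-05</th><td>320.04</td></tr>
<tr><th>1960-06</th><td>319.58</td></tr>
<tr><th>1960-07</th><td>318.18</td></tr>
<tr><th>1960-08</th><td>315.90</td></tr>
<tr><th>1960-09</th><td>314.17</td></tr>
<tr><th>1960-10</th><td>313.83</td></tr>
<tr><th>...</th><td>...</td></tr>
<tr><th>2017-11</th><td>405.17</td></tr>
<tr><th>2017-12</th><td>406.75</td></tr>
<tr><th>2018-01</th><td>408.05</td></tr>
<tr><th>2018-02</th><td>408.34</td></tr>
<tr><th>2018-03</th><td>409.25</td></tr>
<tr><th>2018-04</th><td>410.30</td></tr>
<tr><th>2018-05</th><td>411.30</td></tr>
<tr><th>2018-06</th><td>410.88</td></tr>
<tr><th>2018-07</th><td>408.90</td></tr>
<tr><th>2018-08</th><td>407.10</td></tr>
<tr><th>2018-09</th><td>405.59</td></tr>
<tr><th>2018-10</th><td>405.99</td></tr>
<tr><th>2018-11</th><td>408.12</td></tr>
<tr><th>2018-12</th><td>409.23</td></tr>
<tr><th>2019-01</th><td>410.92</td></tr>
<tr><th>2019-02</th><td>411.66</td></tr>
<tr><th>2019-03</th><td>412.00</td></tr>
<tr><th>2019-04</th><td>413.52</td></tr>
<tr><th>2019-05</th><td>414.83</td></tr>
<tr><th>2019-06</th><td>413.96</td></tr>
<tr><th>2019-07</th><td>411.85</td></tr>
<tr><th>2019-08</th><td>410.08</td></tr>
<tr><th>2019-09</th><td>408.55</td></tr>
<tr><th>2019-10</th><td>408.43</td></tr>
<tr><th>2019-11</th><td>410.29</td></tr>
<tr><th>2019-12</th><td>411.85</td></tr>
<tr><th>2020-01</th><td>413.37</td></tr>
<tr><th>2020-02</th><td>414.09</td></tr>
<tr><th>2020-03</th><td>414.51</td></tr>
<tr><th>2020-04</th><td>416.18</td></tr>
</table>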
\n", "

741 rows × 1 columns

\n", "
" ], "text/plain": [ " CO2\n", "period \n", "1958-03 315.70\n", "1958-04 317.46\n", "1958-05 317.51\n", "1958-07 315.86\n", "1958-08 314.93\n", "1958-09 313.21\n", "1958-11 313.33\n", "1958-12 314.67\n", "1959-01 315.58\n", "1959-02 316.49\n", "1959-03 316.65\n", "1959-04 317.72\n", "1959-05 318.29\n", "1959-06 318.15\n", "1959-07 316.54\n", "1959-08 314.80\n", "1959-09 313.84\n", "1959-10 313.33\n", "1959-11 314.81\n", "1959-12 315.58\n", "1960-01 316.43\n", "1960-02 316.98\n", "1960-03 317.58\n", "1960-04 319.03\n", "1960-05 320.04\n", "1960-06 319.58\n", "1960-07 318.18\n", "1960-08 315.90\n", "1960-09 314.17\n", "1960-10 313.83\n", "... ...\n", "2017-11 405.17\n", "2017-12 406.75\n", "2018-01 408.05\n", "2018-02 408.34\n", "2018-03 409.25\n", "2018-04 410.30\n", "2018-05 411.30\n", "2018-06 410.88\n", "2018-07 408.90\n", "2018-08 407.10\n", "2018-09 405.59\n", "2018-10 405.99\n", "2018-11 408.12\n", "2018-12 409.23\n", "2019-01 410.92\n", "2019-02 411.66\n", "2019-03 412.00\n", "2019-04 413.52\n", "2019-05 414.83\n", "2019-06 413.96\n", "2019-07 411.85\n", "2019-08 410.08\n", "2019-09 408.55\n", "2019-10 408.43\n", "2019-11 410.29\n", "2019-12 411.85\n", "2020-01 413.37\n", "2020-02 414.09\n", "2020-03 414.51\n", "2020-04 416.18\n", "\n", "[741 rows x 1 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def convertIntoPeriod(anneeEtMois):\n", " y = (int)(anneeEtMois/100)\n", " m = (int)(anneeEtMois%100)\n", " return pd.Period(pd.Timestamp(y,m,1), 'M')\n", "useful_data['period'] = [convertIntoPeriod(date) for date in useful_data['period']]\n", "useful_data.set_index('period')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "useful_data['CO2'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This gives a first look at the data, but the x-axis scale is not what we want (the plot uses the row labels, because the result of `set_index` in the previous cell was not assigned back to useful_data). Moreover, with missing months it will be hard to work cleanly with these indices. We therefore start again from a copy of useful_data and encode the date as the number of months counted from the start of 1958. January 1959 is thus referenced by \"13\", and so on." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
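<table>
<tr><th></th><th>Yr</th><th>Mn</th><th>CO2</th><th>IndexMois</th></tr>
<tr><th>4</th><td>1958.0</td><td>3.0</td><td>315.70</td><td>3</td></tr>
<tr><th>5</th><td>1958.0</td><td>4.0</td><td>317.46</td><td>4</td></tr>
<tr><th>6</th><td>1958.0</td><td>5.0</td><td>317.51</td><td>5</td></tr>
<tr><th>8</th><td>1958.0</td><td>7.0</td><td>315.86</td><td>7</td></tr>
<tr><th>9</th><td>1958.0</td><td>8.0</td><td>314.93</td><td>8</td></tr>
<tr><th>10</th><td>1958.0</td><td>9.0</td><td>313.21</td><td>9</td></tr>
<tr><th>12</th><td>1958.0</td><td>11.0</td><td>313.33</td><td>11</td></tr>
<tr><th>13</th><td>1958.0</td><td>12.0</td><td>314.67</td><td>12</td></tr>
<tr><th>14</th><td>1959.0</td><td>1.0</td><td>315.58</td><td>13</td></tr>
<tr><th>15</th><td>1959.0</td><td>2.0</td><td>316.49</td><td>14</td></tr>
<tr><th>16</th><td>1959.0</td><td>3.0</td><td>316.65</td><td>15</td></tr>
<tr><th>17</th><td>1959.0</td><td>4.0</td><td>317.72</td><td>16</td></tr>
<tr><th>18</th><td>1959.0</td><td>5.0</td><td>318.29</td><td>17</td></tr>
<tr><th>19</th><td>1959.0</td><td>6.0</td><td>318.15</td><td>18</td></tr>
<tr><th>20</th><td>1959.0</td><td>7.0</td><td>316.54</td><td>19</td></tr>
<tr><th>21</th><td>1959.0</td><td>8.0</td><td>314.80</td><td>20</td></tr>
<tr><th>22</th><td>1959.0</td><td>9.0</td><td>313.84</td><td>21</td></tr>
<tr><th>23</th><td>1959.0</td><td>10.0</td><td>313.33</td><td>22</td></tr>
<tr><th>24</th><td>1959.0</td><td>11.0</td><td>314.81</td><td>23</td></tr>
<tr><th>25</th><td>1959.0</td><td>12.0</td><td>315.58</td><td>24</td></tr>
<tr><th>26</th><td>1960.0</td><td>1.0</td><td>316.43</td><td>25</td></tr>
<tr><th>27</th><td>1960.0</td><td>2.0</td><td>316.98</td><td>26</td></tr>
<tr><th>28</th><td>1960.0</td><td>3.0</td><td>317.58</td><td>27</td></tr>
<tr><th>29</th><td>1960.0</td><td>4.0</td><td>319.03</td><td>28</td></tr>
<tr><th>30</th><td>1960.0</td><td>5.0</td><td>320.04</td><td>29</td></tr>
<tr><th>31</th><td>1960.0</td><td>6.0</td><td>319.58</td><td>30</td></tr>
<tr><th>32</th><td>1960.0</td><td>7.0</td><td>318.18</td><td>31</td></tr>
<tr><th>33</th><td>1960.0</td><td>8.0</td><td>315.90</td><td>32</td></tr>
<tr><th>34</th><td>1960.0</td><td>9.0</td><td>314.17</td><td>33</td></tr>
<tr><th>35</th><td>1960.0</td><td>10.0</td><td>313.83</td><td>34</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>720</th><td>2017.0</td><td>11.0</td><td>405.17</td><td>719</td></tr>
<tr><th>721</th><td>2017.0</td><td>12.0</td><td>406.75</td><td>720</td></tr>
<tr><th>722</th><td>2018.0</td><td>1.0</td><td>408.05</td><td>721</td></tr>
<tr><th>723</th><td>2018.0</td><td>2.0</td><td>408.34</td><td>722</td></tr>
<tr><th>724</th><td>2018.0</td><td>3.0</td><td>409.25</td><td>723</td></tr>
<tr><th>725</th><td>2018.0</td><td>4.0</td><td>410.30</td><td>724</td></tr>
<tr><th>726</th><td>2018.0</td><td>5.0</td><td>411.30</td><td>725</td></tr>
<tr><th>727</th><td>2018.0</td><td>6.0</td><td>410.88</td><td>726</td></tr>
<tr><th>728</th><td>2018.0</td><td>7.0</td><td>408.90</td><td>727</td></tr>
<tr><th>729</th><td>2018.0</td><td>8.0</td><td>407.10</td><td>728</td></tr>
<tr><th>730</th><td>2018.0</td><td>9.0</td><td>405.59</td><td>729</td></tr>
<tr><th>731</th><td>2018.0</td><td>10.0</td><td>405.99</td><td>730</td></tr>
<tr><th>732</th><td>2018.0</td><td>11.0</td><td>408.12</td><td>731</td></tr>
<tr><th>733</th><td>2018.0</td><td>12.0</td><td>409.23</td><td>732</td></tr>
<tr><th>734</th><td>2019.0</td><td>1.0</td><td>410.92</td><td>733</td></tr>
<tr><th>735</th><td>2019.0</td><td>2.0</td><td>411.66</td><td>734</td></tr>
<tr><th>736</th><td>2019.0</td><td>3.0</td><td>412.00</td><td>735</td></tr>
<tr><th>737</th><td>2019.0</td><td>4.0</td><td>413.52</td><td>736</td></tr>
<tr><th>738</th><td>2019.0</td><td>5.0</td><td>414.83</td><td>737</td></tr>
<tr><th>739</th><td>2019.0</td><td>6.0</td><td>413.96</td><td>738</td></tr>
<tr><th>740</th><td>2019.0</td><td>7.0</td><td>411.85</td><td>739</td></tr>
<tr><th>741</th><td>2019.0</td><td>8.0</td><td>410.08</td><td>740</td></tr>
<tr><th>742</th><td>2019.0</td><td>9.0</td><td>408.55</td><td>741</td></tr>
<tr><th>743</th><td>2019.0</td><td>10.0</td><td>408.43</td><td>742</td></tr>
<tr><th>744</th><td>2019.0</td><td>11.0</td><td>410.29</td><td>743</td></tr>
<tr><th>745</th><td>2019.0</td><td>12.0</td><td>411.85</td><td>744</td></tr>
<tr><th>746</th><td>2020.0</td><td>1.0</td><td>413.37</td><td>745</td></tr>
<tr><th>747</th><td>2020.0</td><td>2.0</td><td>414.09</td><td>746</td></tr>
<tr><th>748</th><td>2020.0</td><td>3.0</td><td>414.51</td><td>747</td></tr>
<tr><th>749</th><td>2020.0</td><td>4.0</td><td>416.18</td><td>748</td></tr>
</table>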
\n", "

741 rows × 4 columns

\n", "
" ], "text/plain": [ " Yr Mn CO2 IndexMois\n", "4 1958.0 3.0 315.70 3\n", "5 1958.0 4.0 317.46 4\n", "6 1958.0 5.0 317.51 5\n", "8 1958.0 7.0 315.86 7\n", "9 1958.0 8.0 314.93 8\n", "10 1958.0 9.0 313.21 9\n", "12 1958.0 11.0 313.33 11\n", "13 1958.0 12.0 314.67 12\n", "14 1959.0 1.0 315.58 13\n", "15 1959.0 2.0 316.49 14\n", "16 1959.0 3.0 316.65 15\n", "17 1959.0 4.0 317.72 16\n", "18 1959.0 5.0 318.29 17\n", "19 1959.0 6.0 318.15 18\n", "20 1959.0 7.0 316.54 19\n", "21 1959.0 8.0 314.80 20\n", "22 1959.0 9.0 313.84 21\n", "23 1959.0 10.0 313.33 22\n", "24 1959.0 11.0 314.81 23\n", "25 1959.0 12.0 315.58 24\n", "26 1960.0 1.0 316.43 25\n", "27 1960.0 2.0 316.98 26\n", "28 1960.0 3.0 317.58 27\n", "29 1960.0 4.0 319.03 28\n", "30 1960.0 5.0 320.04 29\n", "31 1960.0 6.0 319.58 30\n", "32 1960.0 7.0 318.18 31\n", "33 1960.0 8.0 315.90 32\n", "34 1960.0 9.0 314.17 33\n", "35 1960.0 10.0 313.83 34\n", ".. ... ... ... ...\n", "720 2017.0 11.0 405.17 719\n", "721 2017.0 12.0 406.75 720\n", "722 2018.0 1.0 408.05 721\n", "723 2018.0 2.0 408.34 722\n", "724 2018.0 3.0 409.25 723\n", "725 2018.0 4.0 410.30 724\n", "726 2018.0 5.0 411.30 725\n", "727 2018.0 6.0 410.88 726\n", "728 2018.0 7.0 408.90 727\n", "729 2018.0 8.0 407.10 728\n", "730 2018.0 9.0 405.59 729\n", "731 2018.0 10.0 405.99 730\n", "732 2018.0 11.0 408.12 731\n", "733 2018.0 12.0 409.23 732\n", "734 2019.0 1.0 410.92 733\n", "735 2019.0 2.0 411.66 734\n", "736 2019.0 3.0 412.00 735\n", "737 2019.0 4.0 413.52 736\n", "738 2019.0 5.0 414.83 737\n", "739 2019.0 6.0 413.96 738\n", "740 2019.0 7.0 411.85 739\n", "741 2019.0 8.0 410.08 740\n", "742 2019.0 9.0 408.55 741\n", "743 2019.0 10.0 408.43 742\n", "744 2019.0 11.0 410.29 743\n", "745 2019.0 12.0 411.85 744\n", "746 2020.0 1.0 413.37 745\n", "747 2020.0 2.0 414.09 746\n", "748 2020.0 3.0 414.51 747\n", "749 2020.0 4.0 416.18 748\n", "\n", "[741 rows x 4 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"udc = useful_data_copie\n", "# Month index counted from January 1958 (= 1)\n", "udc['IndexMois'] = (udc['Mn'] + (udc['Yr'] - 1958)*12).astype(int)\n", "udc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the last row (April 2020) to check that the index is correct:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "748" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "testIndex = (2020 - 1958)*12 + 4\n", "testIndex" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the new column as the index and drop the others." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CO2 IndexMois\n", "4 315.70 3\n", "5 317.46 4\n", "6 317.51 5\n", "8 315.86 7\n", "9 314.93 8\n", "10 313.21 9\n", "12 313.33 11\n", "13 314.67 12\n", "14 315.58 13\n", "15 316.49 14\n", "16 316.65 15\n", "17 317.72 16\n", "18 318.29 17\n", "19 318.15 18\n", "20 316.54 19\n", "21 314.80 20\n", "22 313.84 21\n", "23 313.33 22\n", "24 314.81 23\n", "25 315.58 24\n", "26 316.43 25\n", "27 316.98 26\n", "28 317.58 27\n", "29 319.03 28\n", "30 320.04 29\n", "31 319.58 30\n", "32 318.18 31\n", "33 315.90 32\n", "34 314.17 33\n", "35 313.83 34\n", ".. ... ...\n", "720 405.17 719\n", "721 406.75 720\n", "722 408.05 721\n", "723 408.34 722\n", "724 409.25 723\n", "725 410.30 724\n", "726 411.30 725\n", "727 410.88 726\n", "728 408.90 727\n", "729 407.10 728\n", "730 405.59 729\n", "731 405.99 730\n", "732 408.12 731\n", "733 409.23 732\n", "734 410.92 733\n", "735 411.66 734\n", "736 412.00 735\n", "737 413.52 736\n", "738 414.83 737\n", "739 413.96 738\n", "740 411.85 739\n", "741 410.08 740\n", "742 408.55 741\n", "743 408.43 742\n", "744 410.29 743\n", "745 411.85 744\n", "746 413.37 745\n", "747 414.09 746\n", "748 414.51 747\n", "749 416.18 748\n", "\n", "[741 rows x 2 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "del udc['Yr']\n", "del udc['Mn']\n", "udc.reset_index()\n", "udc.set_index('IndexMois')\n", "udc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pour une raison quelconque Pandas refuse d'indexer correctement le tableau ... Tant pis. On utilisera la colonne IndexMois en guise d'abscisses pour les plots." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(udc['IndexMois'], udc['CO2'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On vérifie la façon dont les données manquantes sont gérées." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(udc['IndexMois'][0:10], udc['CO2'][0:10])" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4 3\n", "5 4\n", "6 5\n", "8 7\n", "9 8\n", "10 9\n", "12 11\n", "13 12\n", "14 13\n", "15 14\n", "Name: IndexMois, dtype: int64 4 315.70\n", "5 317.46\n", "6 317.51\n", "8 315.86\n", "9 314.93\n", "10 313.21\n", "12 313.33\n", "13 314.67\n", "14 315.58\n", "15 316.49\n", "Name: CO2, dtype: float64\n" ] } ], "source": [ "print(udc['IndexMois'][0:10], udc['CO2'][0:10])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On voit que la valeur 6 en abscisse n'a pas d'ordonnée, et que la droite est tracée entre les points 5 et 7. Il n'y a pas de problème. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On peut s'intéresser maintenant aux résultats. On voit une croissance globale, et des oscillations locales. On peut zoomer sur trois années pour voir ce qu'il se passe localement par exemple." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(udc['IndexMois'][24:60], udc['CO2'][24:60])" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "29 5 / 30 6 / 31 7 / 32 8 / 33 9 / 34 10 / 35 11 / 36 0 / 37 1 / 38 2 / 39 3 / 40 4 / 41 5 / 42 6 / 43 7 / 44 8 / 45 9 / 46 10 / 47 11 / 48 0 / 49 1 / 50 2 / 51 3 / 52 4 / 53 5 / 54 6 / 55 7 / 56 8 / 57 9 / 58 10 / 59 11 / 60 0 / " ] } ], "source": [ "for i in range(29,61):\n", " print(i, i%12, end=' / ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On voit des minima locaux aux abscisses 34, 45, 58 qui correspondent aux mois Octobre, Septembre, Octobre. Il semble donc que la concentration en CO2 soit périodiquement minimale à cette période de l'année. De même, on voit des maxima locaux aux abscisses 41 et 54, soit en mai et juin. On peut faire une autre vérification par précaution." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "400 4 / 401 5 / 402 6 / 403 7 / 404 8 / 405 9 / 406 10 / 407 11 / 408 0 / 409 1 / 410 2 / 411 3 / 412 4 / 413 5 / 414 6 / 415 7 / 416 8 / 417 9 / 418 10 / 419 11 / 420 0 / 421 1 / 422 2 / 423 3 / 424 4 / 425 5 / 426 6 / 427 7 / 428 8 / 429 9 / 430 10 / 431 11 / 432 0 / 433 1 / 434 2 / 435 3 / 436 4 / " ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(udc['IndexMois'][400:436], udc['CO2'][400:436])\n", "for i in range(400,437):\n", " print(i, i%12, end=' / ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On voit les maxima en 413 (mai), 425 (mai), 437(mai) et des les minima en 417 (septembre), 429 (septembre), 441 (septembre). L'hypothèse se tient." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pour caractériser la croissance et faire des prévisions pour les années à venir, on souhaite joindre une courbe de tendance et son équation à ces données. On s'intéressera juste aux moyennes annuelles ici. \n", "On présente ici 3 \"fit\", respectivement linéaire, polynomial de degré 2 et exponentiel. On choisira par la suite celui qui nous semble le plus en adéquation." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#Fit lineaire avec numpy\n", "a, b = np.polyfit(udc['IndexMois'], udc['CO2'], 1)\n", "yLin = a*udc['IndexMois'] + b\n", "\n", "#Fit de degré 2\n", "a2, b2, c2 = np.polyfit(udc['IndexMois'], udc['CO2'], 2)\n", "yCarre = a2*udc['IndexMois']**2 + b2*udc['IndexMois'] + c2\n", "\n", "#Fit exponentiel\n", "aExp, bExp = np.polyfit(udc['IndexMois'], [np.log(y) for y in udc['CO2']], 1)\n", "yExp = np.exp(bExp)*np.exp(aExp*udc['IndexMois'])\n", "\n", "plt.plot(udc['IndexMois'], udc['CO2'], label='data')\n", "plt.plot(udc['IndexMois'], yLin, label='lin')\n", "plt.plot(udc['IndexMois'], yCarre, label='deg2')\n", "plt.plot(udc['IndexMois'], yExp, label='exp')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Le polynôme de degré 2 est plus adapté à nos données ici. Pour faire des extrapolations sur les années à suivre, il suffit de tracer la courbe en étendant sur la plage de valeurs des abscisses qui nous intéresse. Pour obtenir des valeurs annuelles moyennes, il suffit d'intégrer la fonction sur 12 mois." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#on va afficher les valeurs de 2010 à 2025, sachant que nos données s'arrêtent à Avril 2020 (748)\n", "borneJanvier2010 = (2010-1958)*12 + 1\n", "borneDecembre2025 = (2025-1958)*12 +12\n", "x1 = [x for x in range(borneJanvier2010, 749)]\n", "x2 = [x for x in range(borneJanvier2010, borneDecembre2025+1)]\n", "\n", "y1 = udc['CO2'][-(749-borneJanvier2010):]\n", "y2 = a2*np.asarray(x2)**2 + b2*np.asarray(x2) + c2\n", "\n", "plt.plot(x1, y1)\n", "plt.plot(x2, y2)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "valeur moyenne pour l'annee 2020 : [CO2] = 412.69 ppm\n", "valeur moyenne pour l'annee 2021 : [CO2] = 415.09 ppm\n", "valeur moyenne pour l'annee 2022 : [CO2] = 417.52 ppm\n", "valeur moyenne pour l'annee 2023 : [CO2] = 419.97 ppm\n", "valeur moyenne pour l'annee 2024 : [CO2] = 422.45 ppm\n", "valeur moyenne pour l'annee 2025 : [CO2] = 424.96 ppm\n" ] } ], "source": [ "#Valeurs moyennes : on peut se contenter d'une intégration manuelle ici\n", "for x in range(2020, 2026):\n", " borneInf = (x-1958)*12\n", " borneSup = borneInf + 12\n", " Y2 = (a2*borneSup**3)/3 + (b2*borneSup**2)/2 + c2*borneSup\n", " Y1 = (a2*borneInf**3)/3 + (b2*borneInf**2)/2 + c2*borneInf\n", " meanValue = (Y2-Y1)/(borneSup-borneInf)\n", " print(\"valeur moyenne pour l'annee \", x, \" : [CO2] = \", format(meanValue, '0.2f'), \" ppm\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pour finir, on peut vérifier nos résultats en comparant le fit avec les données réelles pour une année entièrement renseignée, 2019 par exemple." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "733 734 735 736 737 738 739 740 741 742 743 744 \n", "733 744\n", "[CO2] (moyenne réelle) : \t 411.28\n", "[CO2] (moyenne fit) : \t 410.31\n" ] } ], "source": [ "#calcul avec data\n", "newBorneInf = (2019-1958)*12+1 \n", "newBorneSup = newBorneInf + 11\n", "somme = 0\n", "for x in range(newBorneInf, newBorneSup+1):\n", " somme += udc['CO2'][x]\n", " print(x, end = ' ')\n", "print(\"\")\n", "newMean = somme/12\n", "\n", "#calcul avec fit\n", "newY2 = (a2*newBorneSup**3)/3 + (b2*newBorneSup**2)/2 + c2*newBorneSup\n", "newY1 = (a2*(newBorneInf-1)**3)/3 + (b2*(newBorneInf-1)**2)/2 + c2*(newBorneInf-1) \n", "# on ne peut pas utiliser la même borne pour une intégration ou une somme ici sans omettre une valeur, d'où le -1\n", "newMeanValue = (newY2-newY1)/12\n", "\n", "print(newBorneInf, newBorneSup)\n", "print(\"[CO2] (moyenne réelle) : \\t\", format(newMean, '0.2f'))\n", "print(\"[CO2] (moyenne fit) : \\t\", format(newMeanValue, '0.2f'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Le résultat est acceptable, on peut calculer l'erreur relative." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.0023505910290639647" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "erreurRelative = (newMeanValue - newMean)/newMean\n", "erreurRelative" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "Le but de cette étude était de produire une analyse reproductible de l'évolution de la concentration en CO2 dans l'atmosphère. 
The source data cover 1958 to 2020, apart from a few missing months.\n", "Using the Pandas and NumPy libraries to process the data, we observed an overall year-on-year increase in CO2 concentration, combined with an annual oscillation with maxima around May and minima around September. According to [this page](https://en.wikipedia.org/wiki/Keeling_Curve), the overall increase is attributed to the use of fossil fuels, and the annual oscillation to photosynthesis by land vegetation.\n", "We then tried three different functions to find a reasonable trend curve for our data set. We chose a degree-2 polynomial, which is a good compromise: it follows the data closely without being too complex to handle.\n", "This fit was then used to extrapolate the mean CO2 concentrations for the years 2020 to 2025. A check on 2019, for which all the data are available, suggests a relative error below one percent (0.2% in the computation above), which seems acceptable.\n", "To extend this study, one possible next step would be to characterize the within-year oscillation of the CO2 concentration, for example with a sinusoidal function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }