# Topic 1: Atmospheric CO2 concentration since 1958

In 1958, Charles David Keeling began measuring the atmospheric CO2 concentration at the Mauna Loa Observatory, Hawaii, United States, and the measurements continue to this day. The initial goal was to study the seasonal variation, but interest later shifted to the increasing trend in the context of climate change. In honor of Keeling, this dataset is often called the "Keeling Curve" (see [the Wikipedia article](https://en.wikipedia.org/wiki/Keeling_Curve) for the history and significance of these data).

In [1]:
# common imports
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import isoweek
import os
import urllib.request
from datetime import date

# Today's date, printed when the data file is downloaded
today = date.today()

In [2]:
data_url = "https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/monthly/monthly_in_situ_co2_mlo.csv"

The data are downloaded from the [scrippsco2 website](https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html).

In [3]:
data_file = "monthly_in_situ_co2_mlo.csv"

# Download the file only if no local copy exists yet
if not os.path.exists(data_file):
    urllib.request.urlretrieve(data_url, data_file)
    print('file downloaded on', today)

In [4]:
# First attempt: skip the 54 comment lines at the top of the file
raw_data = pd.read_csv(data_file, skiprows=54)
raw_data

Unnamed: 0,Yr,Mn,Date,Date.1,CO2,seasonally,fit,seasonally.1,CO2.1,seasonally.2
0,,,,,,adjusted,,adjusted fit,filled,adjusted filled
1,,,Excel,,[ppm],[ppm],[ppm],[ppm],[ppm],[ppm]
2,1958,01,21200,1958.0411,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
3,1958,02,21231,1958.1260,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
4,1958,03,21259,1958.2027,315.70,314.44,316.19,314.91,315.70,314.44
5,1958,04,21290,1958.2877,317.45,315.16,317.30,314.99,317.45,315.16
6,1958,05,21320,1958.3699,317.51,314.71,317.86,315.06,317.51,314.71
7,1958,06,21351,1958.4548,-99.99,-99.99,317.24,315.14,317.24,315.14
8,1958,07,21381,1958.5370,315.86,315.19,315.86,315.22,315.86,315.19
9,1958,08,21412,1958.6219,314.93,316.19,314.00,315.29,314.93,316.19


Reading the raw data this way causes a few small problems: rows 0 and 1 are in fact continuations of the column headers, which span three lines of the file.

In [5]:
# Read the three header lines as a 3-level column index
raw_data = pd.read_csv(data_file, skiprows=54, header=[0, 1, 2])
raw_data

Unnamed: 0_level_0,Yr,Mn,Date,Date,CO2,seasonally,fit,seasonally,CO2,seasonally
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,adjusted,Unnamed: 7_level_1,adjusted fit,filled,adjusted filled
Unnamed: 0_level_2,Unnamed: 1_level_2,Unnamed: 2_level_2,Excel,Unnamed: 4_level_2,[ppm],[ppm],[ppm],[ppm],[ppm],[ppm]
0,1958,1,21200,1958.0411,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
1,1958,2,21231,1958.1260,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
2,1958,3,21259,1958.2027,315.70,314.44,316.19,314.91,315.70,314.44
3,1958,4,21290,1958.2877,317.45,315.16,317.30,314.99,317.45,315.16
4,1958,5,21320,1958.3699,317.51,314.71,317.86,315.06,317.51,314.71
5,1958,6,21351,1958.4548,-99.99,-99.99,317.24,315.14,317.24,315.14
6,1958,7,21381,1958.5370,315.86,315.19,315.86,315.22,315.86,315.19
7,1958,8,21412,1958.6219,314.93,316.19,314.00,315.29,314.93,316.19
8,1958,9,21443,1958.7068,313.21,316.08,312.46,315.35,313.21,316.08
9,1958,10,21473,1958.7890,-99.99,-99.99,312.44,315.40,312.44,315.40
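The value `skiprows=54` used above is tied to the number of comment lines at the top of the current Scripps file and would break if that preamble changed. As a minimal sketch, assuming the first header line is the one whose first field is `Yr` (true for the file read above), the number of lines to skip can be found by scanning the lines first. `find_header_row` is a hypothetical helper, and the in-memory `sample` only imitates the layout of the real file.

```python
import io

import pandas as pd


def find_header_row(lines):
    """Return the index of the first line whose first comma-separated field is 'Yr'.

    Hypothetical helper: assumes the header block starts with a 'Yr' column,
    as in the Scripps monthly file used in this notebook.
    """
    for i, line in enumerate(lines):
        if line.split(",")[0].strip() == "Yr":
            return i
    raise ValueError("no header line starting with 'Yr' found")


# In-memory imitation of the file layout: comment lines, 3 header lines, data
sample = (
    "Comment line one\n"
    "Comment line two\n"
    "  Yr, Mn,    Date\n"
    "    ,   ,        \n"
    "    ,   ,   Excel\n"
    "  1958, 01, 21200\n"
    "  1958, 02, 21231\n"
)

header_row = find_header_row(sample.splitlines())
df = pd.read_csv(io.StringIO(sample), skiprows=header_row, header=[0, 1, 2])
print(header_row, df.shape)
```

With the real file, `find_header_row(open(data_file).read().splitlines())` should return the same 54 if the assumption about the `Yr` column holds.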


We combine the three header rows to build the new column names.

In [6]:
# Each column name is a tuple of 3 header levels: join them with spaces,
# then collapse repeated whitespace (the blank levels contain only spaces)
new_names_cols = [" ".join(" ".join(levels).split()) for levels in raw_data.columns]

# Work on a copy so that raw_data keeps its original 3-level header
newh_data = raw_data.copy()
newh_data.columns = new_names_cols
newh_data

Unnamed: 0,Yr,Mn,Date Excel,Date,CO2 [ppm],seasonally adjusted [ppm],fit [ppm],seasonally adjusted fit [ppm],CO2 filled [ppm],seasonally adjusted filled [ppm]
0,1958,1,21200,1958.0411,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
1,1958,2,21231,1958.1260,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
2,1958,3,21259,1958.2027,315.70,314.44,316.19,314.91,315.70,314.44
3,1958,4,21290,1958.2877,317.45,315.16,317.30,314.99,317.45,315.16
4,1958,5,21320,1958.3699,317.51,314.71,317.86,315.06,317.51,314.71
5,1958,6,21351,1958.4548,-99.99,-99.99,317.24,315.14,317.24,315.14
6,1958,7,21381,1958.5370,315.86,315.19,315.86,315.22,315.86,315.19
7,1958,8,21412,1958.6219,314.93,316.19,314.00,315.29,314.93,316.19
8,1958,9,21443,1958.7068,313.21,316.08,312.46,315.35,313.21,316.08
9,1958,10,21473,1958.7890,-99.99,-99.99,312.44,315.40,312.44,315.40
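The renaming rule above can be checked on a small, made-up frame. This sketch builds a 3-level column `MultiIndex` whose blank levels contain only spaces (mimicking what `read_csv` produced from the file, an assumption based on the output above) and applies the same join-then-normalize rule; `flatten_columns` is a hypothetical name.

```python
import pandas as pd

# Made-up 3-level header; whitespace-only levels stand for the blank header cells
columns = pd.MultiIndex.from_tuples([
    ("  Yr", "    ", "    "),
    ("    Date", "        ", "   Excel"),
    ("     CO2", "  filled", "   [ppm]"),
])
df = pd.DataFrame([[1958, 21259, 315.70]], columns=columns)


def flatten_columns(cols):
    """Join the levels of each column tuple and collapse runs of whitespace."""
    return [" ".join(" ".join(parts).split()) for parts in cols]


df.columns = flatten_columns(df.columns)
print(list(df.columns))  # ['Yr', 'Date Excel', 'CO2 filled [ppm]']
```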


There are also missing values, encoded as -99.99. We look at these rows below. Some of them also have no value (-99.99) in the last column, `seasonally adjusted filled [ppm]`; we start by removing those rows.

In [7]:
newh_data[newh_data['CO2 [ppm]']<0]

Unnamed: 0,Yr,Mn,Date Excel,Date,CO2 [ppm],seasonally adjusted [ppm],fit [ppm],seasonally adjusted fit [ppm],CO2 filled [ppm],seasonally adjusted filled [ppm]
0,1958,1,21200,1958.0411,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
1,1958,2,21231,1958.126,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
5,1958,6,21351,1958.4548,-99.99,-99.99,317.24,315.14,317.24,315.14
9,1958,10,21473,1958.789,-99.99,-99.99,312.44,315.4,312.44,315.4
73,1964,2,23422,1964.1257,-99.99,-99.99,320.01,319.36,320.01,319.36
74,1964,3,23451,1964.2049,-99.99,-99.99,320.74,319.41,320.74,319.41
75,1964,4,23482,1964.2896,-99.99,-99.99,321.83,319.45,321.83,319.45
745,2020,2,43876,2020.1257,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
746,2020,3,43905,2020.2049,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
747,2020,4,43936,2020.2896,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
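Rather than filtering on `< 0`, the -99.99 sentinel can also be turned into a proper missing value (`NaN`), which pandas then skips in computations and plots. A minimal sketch on a few rows copied from the output above; the choice of value columns is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# A few rows copied from the output above
df = pd.DataFrame({
    "Yr": [1958, 1958, 1958, 1958],
    "Mn": [2, 3, 6, 7],
    "CO2 [ppm]": [-99.99, 315.70, -99.99, 315.86],
    "CO2 filled [ppm]": [-99.99, 315.70, 317.24, 315.86],
})

# Replace the -99.99 sentinel by NaN in the measurement columns only
value_cols = ["CO2 [ppm]", "CO2 filled [ppm]"]
clean = df.copy()
clean[value_cols] = clean[value_cols].replace(-99.99, np.nan)

# Count missing values per column, then drop rows without a filled value
n_missing = clean[value_cols].isna().sum()
complete = clean.dropna(subset=["CO2 filled [ppm]"])
print(n_missing.to_dict(), len(complete))
```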


In [8]:
# Drop the rows where even the filled, seasonally adjusted value is missing (-99.99)
data = newh_data.drop(newh_data[newh_data['seasonally adjusted filled [ppm]'] < 0].index)
print(len(data))

743


In [11]:
# Monthly period (e.g. 1958-03) built from the year and month columns
data['period'] = [pd.Period(freq='M', year=year, month=month) for (year, month) in zip(data['Yr'], data['Mn'])]
data

Unnamed: 0,Yr,Mn,Date Excel,Date,CO2 [ppm],seasonally adjusted [ppm],fit [ppm],seasonally adjusted fit [ppm],CO2 filled [ppm],seasonally adjusted filled [ppm],period
2,1958,3,21259,1958.2027,315.70,314.44,316.19,314.91,315.70,314.44,1958-03
3,1958,4,21290,1958.2877,317.45,315.16,317.30,314.99,317.45,315.16,1958-04
4,1958,5,21320,1958.3699,317.51,314.71,317.86,315.06,317.51,314.71,1958-05
5,1958,6,21351,1958.4548,-99.99,-99.99,317.24,315.14,317.24,315.14,1958-06
6,1958,7,21381,1958.5370,315.86,315.19,315.86,315.22,315.86,315.19,1958-07
7,1958,8,21412,1958.6219,314.93,316.19,314.00,315.29,314.93,316.19,1958-08
8,1958,9,21443,1958.7068,313.21,316.08,312.46,315.35,313.21,316.08,1958-09
9,1958,10,21473,1958.7890,-99.99,-99.99,312.44,315.40,312.44,315.40,1958-10
10,1958,11,21504,1958.8740,313.33,315.20,313.62,315.46,313.33,315.20,1958-11
11,1958,12,21534,1958.9562,314.67,315.43,314.77,315.51,314.67,315.43,1958-12
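The per-row list comprehension above works; a vectorized alternative, sketched here on three made-up rows, builds the first day of each month with `pd.to_datetime` (which accepts a frame with `year`, `month` and `day` columns) and converts it with `.dt.to_period("M")`.

```python
import pandas as pd

df = pd.DataFrame({"Yr": [1958, 1958, 1958], "Mn": [3, 4, 12]})

# First day of each month as a datetime, then converted to a monthly period
first_days = pd.to_datetime(pd.DataFrame({"year": df["Yr"], "month": df["Mn"], "day": 1}))
df["period"] = first_days.dt.to_period("M")

# Same values as the per-row list comprehension used in the notebook
expected = [pd.Period(freq="M", year=y, month=m) for y, m in zip(df["Yr"], df["Mn"])]
print(df)
```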


In [13]:
# set_index returns a new DataFrame: assign it, otherwise the integer index is kept
data = data.set_index('period')
periods = data.index
print(periods)

# Check for gaps: each period should start right after the end of the previous one
for p1, p2 in zip(periods[:-1], periods[1:]):
    delta = p2.to_timestamp() - p1.end_time
    if delta > pd.Timedelta('1s'):
        print(p1, p2)

data
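Another way to look for gaps in a monthly `PeriodIndex` is to compare it with the complete range of months between its first and last elements. A minimal sketch on a hypothetical index with one month missing; with the notebook's data, `periods` would be the monthly index built from the `period` column.

```python
import pandas as pd

# Hypothetical monthly index with one missing month (1958-05)
periods = pd.PeriodIndex(["1958-03", "1958-04", "1958-06", "1958-07"], freq="M")

# Every month between the first and the last observation
full_range = pd.period_range(start=periods.min(), end=periods.max(), freq="M")

# Months present in the full range but absent from the data
missing = full_range.difference(periods)
print(list(missing))
```

An empty `missing` means the series has one row per month with no gaps.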
