# Analysis of atmospheric CO2 concentration since 1958

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import isoweek
import requests
import os

## Data preparation

The CO2 concentration data are available from the [Scripps CO2 Program](https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html) website. We downloaded the data on 7 May 2022 at 09:32.

In [2]:
csv_name = "monthly_in_situ_co2_mlo.csv"
data_url = "https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/monthly/" + csv_name
# Download the file only if no local copy exists, so later runs reuse the same data.
if not os.path.exists(csv_name):
    print("file doesn't exist: downloading")
    r = requests.get(data_url)
    r.raise_for_status()  # stop on an HTTP error instead of saving an error page
    with open(csv_name, "w", encoding='UTF-8') as text_file:
        text_file.write(r.text)

# Print the raw bytes to inspect the header lines of the file.
with open(csv_name, "rb") as text_file:
    print(text_file.readlines())

[b'"-------------------------------------------------------------------------------------------"\n', b'" Atmospheric CO2 concentrations (ppm) derived from in situ air measurements                "\n', b'" at Mauna Loa, Observatory, Hawaii: Latitude 19.5\xc3\x82\xc2\xb0N Longitude 155.6\xc3\x82\xc2\xb0W Elevation 3397m      "\n', b'"                                                                                           "\n', b'" Source: R. F. Keeling, S. J. Walker, S. C. Piper and A. F. Bollenbacher                   "\n', b'" Scripps CO2 Program ( http://scrippsco2.ucsd.edu )                                        "\n', b'" Scripps Institution of Oceanography (SIO)                                                 "\n', b'" University of California                                                                  "\n', b'" La Jolla, California USA 92093-0244                                                       "\n', b'"                                                                  
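
The byte sequence `\xc3\x82\xc2\xb0` in the output above is where a degree sign would be expected. One plausible explanation, which is an assumption and not something the notebook diagnoses, is that the UTF-8 response was decoded as Latin-1 by `r.text` and then re-encoded as UTF-8 when written to disk. The sketch below reproduces that byte pattern from a made-up sample string and shows that the round trip can be reversed.

```python
# Hypothetical sample: the coordinate text as it presumably appears in the source file.
original = "19.5°N"

# Decode the UTF-8 bytes as Latin-1 by mistake, then encode as UTF-8 again.
mojibake_bytes = original.encode("utf-8").decode("latin-1").encode("utf-8")
print(mojibake_bytes)

# Undo the double encoding by reversing the two steps.
repaired = mojibake_bytes.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired)
```

If this is indeed the cause, writing `r.content` to a file opened in binary mode would likely avoid the problem, since no decoding step would be involved.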

We load the dataset into a DataFrame using pandas.

In [3]:
raw_data = pd.read_csv(csv_name, encoding = 'UTF-8', comment='"')

We drop the first two rows, which contain metadata rather than measurements.

In [4]:
data = raw_data[2:].reset_index()

In [5]:
data

Unnamed: 0,index,Yr,Mn,Date,Date.1,CO2,seasonally,fit,seasonally.1,CO2.1,seasonally.2
0,2,1958,01,21200,1958.0411,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
1,3,1958,02,21231,1958.1260,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
2,4,1958,03,21259,1958.2027,315.71,314.44,316.19,314.91,315.71,314.44
3,5,1958,04,21290,1958.2877,317.45,315.16,317.29,314.99,317.45,315.16
4,6,1958,05,21320,1958.3699,317.51,314.70,317.87,315.06,317.51,314.70
5,7,1958,06,21351,1958.4548,-99.99,-99.99,317.25,315.14,317.25,315.14
6,8,1958,07,21381,1958.5370,315.86,315.20,315.85,315.22,315.86,315.20
7,9,1958,08,21412,1958.6219,314.93,316.21,313.97,315.29,314.93,316.21
8,10,1958,09,21443,1958.7068,313.21,316.10,312.44,315.35,313.21,316.10
9,11,1958,10,21473,1958.7890,-99.99,-99.99,312.43,315.40,312.43,315.40
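
The table above has an extra `index` column because `reset_index()` keeps the old row labels as a regular column. The following minimal sketch, on a small made-up frame (the values are placeholders, not the real file), contrasts that behaviour with `reset_index(drop=True)`, which discards the old labels.

```python
import pandas as pd

# Hypothetical stand-in for raw_data: two metadata rows followed by two data rows.
raw = pd.DataFrame({
    "Yr": ["meta1", "meta2", "1958", "1958"],
    "CO2": ["unit1", "unit2", "315.71", "317.45"],
})

# As in the notebook: the old labels (2, 3) survive in a column named "index".
kept = raw[2:].reset_index()

# Alternative: drop the old labels entirely.
dropped = raw[2:].reset_index(drop=True)

print(list(kept.columns))
print(list(dropped.columns))
```

Either form works for the rest of the analysis; keeping the column only matters if the original row numbers are needed later.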


We can already see that missing values are encoded as "-99.99". We will replace them with np.nan, starting by inspecting a row that contains them.

In [13]:
data.iloc[779].values

array([781, '2022', '12', '44910', '2022.9562', '-99.99', '-99.99',
       '-99.99', '-99.99', '-99.99', '-99.99'], dtype=object)

The values contain whitespace that must be removed, so we apply a function that strips it from every string column.

In [14]:
data = data.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

We look at the same row again.

In [15]:
data.iloc[779].values

array([781, '2022', '12', '44910', '2022.9562', '-99.99', '-99.99',
       '-99.99', '-99.99', '-99.99', '-99.99'], dtype=object)
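
To see what the stripping cell does in isolation, here is a minimal sketch on a small made-up frame. The column names mirror the notebook, but the padded values are assumptions for illustration. The string columns are created with an explicit `object` dtype so that the `x.dtype == "object"` test behaves as it does in the notebook.

```python
import pandas as pd

# Hypothetical toy data: one integer column and two padded string columns.
toy = pd.DataFrame({
    "index": [2, 3],
    "Yr": pd.Series(["  1958", "  1958"], dtype=object),
    "CO2": pd.Series(["  -99.99", "  315.71"], dtype=object),
})

# Same expression as in the notebook: strip string columns, leave the others untouched.
stripped = toy.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

print(stripped.to_dict("list"))
```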

Now we replace '-99.99' with np.nan.

In [16]:
import numpy as np
data = data.replace('-99.99', np.nan)

In [17]:
data.iloc[779].values

array([781, '2022', '12', '44910', '2022.9562', nan, nan, nan, nan, nan,
       nan], dtype=object)
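
The order of the two steps matters because `replace` with a scalar compares whole cell values. The sketch below uses a made-up padded Series (an assumption for illustration) to show that the sentinel is only caught once the whitespace is gone.

```python
import numpy as np
import pandas as pd

# Hypothetical padded values, as they might look before stripping.
padded = pd.Series(["  -99.99", "  315.71"], dtype=object)

# Without stripping, no cell equals '-99.99' exactly, so nothing is replaced.
unchanged = padded.replace("-99.99", np.nan)

# After stripping, the sentinel matches and becomes NaN.
cleaned = padded.str.strip().replace("-99.99", np.nan)

print(unchanged.tolist())
print(cleaned.isna().tolist())
```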

In [18]:
data.iloc[760].values

array([762, '2021', '05', '44331', '2021.3699', '418.95', '415.55',
       '419.23', '415.82', '418.95', '415.55'], dtype=object)

We can now convert the columns to the required types.
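
One possible way to do the conversion is sketched below on a small made-up frame; it is not necessarily the approach the notebook goes on to use. It assumes that `Yr` and `Mn` should become integers and that `CO2` should become a float column in which missing values stay as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned data: strings everywhere, NaN where a value is missing.
toy = pd.DataFrame(
    {"Yr": ["1958", "1958"], "Mn": ["03", "04"], "CO2": ["315.71", np.nan]},
    dtype=object,
)

# Columns without missing values can be cast directly to integers.
converted = toy.astype({"Yr": int, "Mn": int})

# pd.to_numeric parses the strings to floats and keeps NaN as NaN.
converted["CO2"] = pd.to_numeric(converted["CO2"])

print(converted.dtypes)
```

Integer casting is kept separate from the measurement columns because a plain integer dtype cannot hold NaN, whereas a float column can.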