"En 1958, Charles David Keeling a initié une mesure de la concentration de CO2 dans l'atmosphère à l'observatoire de Mauna Loa, Hawaii, États-Unis qui continue jusqu'à aujourd'hui.\n",
"Les données sont disponibles sur le [site Web de l'institut Scripps](https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Le format des données est explicité dans le fichier CSV :\n",
"\n",
"\"The data file below contains 2 columns indicaing the date and CO2 \"\n",
"\" concentrations in micro-mol CO2 per mole (ppm), reported on the 2008A \"\n",
"\" SIO manometric mole fraction scale. These weekly values have been \"\n",
"\" adjusted to 12:00 hours at middle day of each weekly period as \"\n",
"\" indicated by the date in the first column. \"\n",
"\n",
"Les 43 premières lignes sont des commentaires que nous ignorons en précisant skiprows=43."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Vérifions qu'une copie locale des données existe dans le répertoire de travail sinon nous la téléchargeons.\n",
"\n",
"**NB:** La version de la base de données hebdomadaires utilisée dans la création de ce document computationnel a été téléchargé le 12/06/2020.\n",
"\n",
"Nous ajoutons un nom aux colonnes pour les identifier plus facilement par la suite"
"Y a-t-il des points manquants dans ce jeux de données ?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date</th>\n",
" <th>CO2_concentration</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [date, CO2_concentration]\n",
"Index: []"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_data[raw_data.isnull().any(axis=1)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"À la date de création de ce document il n'y a pas de données manquante. Cependant on introduisons une procédure pour supprimer les lignes avec de telles données manquantes susceptibles d'être introduite dans la base de données par la suite."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date</th>\n",
" <th>CO2_concentration</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1958-03-29</td>\n",
" <td>316.19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1958-04-05</td>\n",
" <td>317.31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1958-04-12</td>\n",
" <td>317.69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1958-04-19</td>\n",
" <td>317.58</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1958-04-26</td>\n",
" <td>316.48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1958-05-03</td>\n",
" <td>316.95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1958-05-17</td>\n",
" <td>317.56</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>1958-05-24</td>\n",
" <td>317.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1958-07-05</td>\n",
" <td>315.85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1958-07-12</td>\n",
" <td>315.85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>1958-07-19</td>\n",
" <td>315.46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>1958-07-26</td>\n",
" <td>315.59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>1958-08-02</td>\n",
" <td>315.64</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>1958-08-09</td>\n",
" <td>315.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>1958-08-16</td>\n",
" <td>315.09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>1958-08-30</td>\n",
" <td>314.14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>1958-09-06</td>\n",
" <td>313.54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>1958-11-08</td>\n",
" <td>313.05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>1958-11-15</td>\n",
" <td>313.26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>1958-11-22</td>\n",
" <td>313.57</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>1958-11-29</td>\n",
" <td>314.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>1958-12-06</td>\n",
" <td>314.56</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>1958-12-13</td>\n",
" <td>314.41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>1958-12-20</td>\n",
" <td>314.77</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>1958-12-27</td>\n",
" <td>315.21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>1959-01-03</td>\n",
" <td>315.24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>1959-01-10</td>\n",
" <td>315.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>1959-01-17</td>\n",
" <td>315.69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>1959-01-24</td>\n",
" <td>315.86</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>1959-01-31</td>\n",
" <td>315.42</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3143</th>\n",
" <td>2019-11-02</td>\n",
" <td>409.86</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3144</th>\n",
" <td>2019-11-09</td>\n",
" <td>410.15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3145</th>\n",
" <td>2019-11-16</td>\n",
" <td>410.22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3146</th>\n",
" <td>2019-11-23</td>\n",
" <td>410.48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3147</th>\n",
" <td>2019-11-30</td>\n",
" <td>410.92</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3148</th>\n",
" <td>2019-12-07</td>\n",
" <td>411.27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3149</th>\n",
" <td>2019-12-14</td>\n",
" <td>411.67</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3150</th>\n",
" <td>2019-12-21</td>\n",
" <td>412.30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3151</th>\n",
" <td>2019-12-28</td>\n",
" <td>412.59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3152</th>\n",
" <td>2020-01-04</td>\n",
" <td>413.19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3153</th>\n",
" <td>2020-01-11</td>\n",
" <td>413.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3154</th>\n",
" <td>2020-01-25</td>\n",
" <td>413.36</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3155</th>\n",
" <td>2020-02-01</td>\n",
" <td>413.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3156</th>\n",
" <td>2020-02-08</td>\n",
" <td>414.83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3157</th>\n",
" <td>2020-02-15</td>\n",
" <td>413.81</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3158</th>\n",
" <td>2020-02-22</td>\n",
" <td>414.17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3159</th>\n",
" <td>2020-02-29</td>\n",
" <td>413.89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3160</th>\n",
" <td>2020-03-07</td>\n",
" <td>414.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3161</th>\n",
" <td>2020-03-14</td>\n",
" <td>414.30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3162</th>\n",
" <td>2020-03-21</td>\n",
" <td>414.62</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3163</th>\n",
" <td>2020-03-28</td>\n",
" <td>415.57</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3164</th>\n",
" <td>2020-04-04</td>\n",
" <td>415.61</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3165</th>\n",
" <td>2020-04-11</td>\n",
" <td>416.47</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3166</th>\n",
" <td>2020-04-18</td>\n",
" <td>416.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3167</th>\n",
" <td>2020-04-25</td>\n",
" <td>415.86</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3168</th>\n",
" <td>2020-05-02</td>\n",
" <td>417.20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3169</th>\n",
" <td>2020-05-09</td>\n",
" <td>416.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3170</th>\n",
" <td>2020-05-16</td>\n",
" <td>416.54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3171</th>\n",
" <td>2020-05-23</td>\n",
" <td>417.49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3172</th>\n",
" <td>2020-05-30</td>\n",
" <td>417.19</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3173 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" date CO2_concentration\n",
"0 1958-03-29 316.19\n",
"1 1958-04-05 317.31\n",
"2 1958-04-12 317.69\n",
"3 1958-04-19 317.58\n",
"4 1958-04-26 316.48\n",
"5 1958-05-03 316.95\n",
"6 1958-05-17 317.56\n",
"7 1958-05-24 317.99\n",
"8 1958-07-05 315.85\n",
"9 1958-07-12 315.85\n",
"10 1958-07-19 315.46\n",
"11 1958-07-26 315.59\n",
"12 1958-08-02 315.64\n",
"13 1958-08-09 315.10\n",
"14 1958-08-16 315.09\n",
"15 1958-08-30 314.14\n",
"16 1958-09-06 313.54\n",
"17 1958-11-08 313.05\n",
"18 1958-11-15 313.26\n",
"19 1958-11-22 313.57\n",
"20 1958-11-29 314.01\n",
"21 1958-12-06 314.56\n",
"22 1958-12-13 314.41\n",
"23 1958-12-20 314.77\n",
"24 1958-12-27 315.21\n",
"25 1959-01-03 315.24\n",
"26 1959-01-10 315.50\n",
"27 1959-01-17 315.69\n",
"28 1959-01-24 315.86\n",
"29 1959-01-31 315.42\n",
"... ... ...\n",
"3143 2019-11-02 409.86\n",
"3144 2019-11-09 410.15\n",
"3145 2019-11-16 410.22\n",
"3146 2019-11-23 410.48\n",
"3147 2019-11-30 410.92\n",
"3148 2019-12-07 411.27\n",
"3149 2019-12-14 411.67\n",
"3150 2019-12-21 412.30\n",
"3151 2019-12-28 412.59\n",
"3152 2020-01-04 413.19\n",
"3153 2020-01-11 413.39\n",
"3154 2020-01-25 413.36\n",
"3155 2020-02-01 413.99\n",
"3156 2020-02-08 414.83\n",
"3157 2020-02-15 413.81\n",
"3158 2020-02-22 414.17\n",
"3159 2020-02-29 413.89\n",
"3160 2020-03-07 414.00\n",
"3161 2020-03-14 414.30\n",
"3162 2020-03-21 414.62\n",
"3163 2020-03-28 415.57\n",
"3164 2020-04-04 415.61\n",
"3165 2020-04-11 416.47\n",
"3166 2020-04-18 416.60\n",
"3167 2020-04-25 415.86\n",
"3168 2020-05-02 417.20\n",
"3169 2020-05-09 416.99\n",
"3170 2020-05-16 416.54\n",
"3171 2020-05-23 417.49\n",
"3172 2020-05-30 417.19\n",
"\n",
"[3173 rows x 2 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = raw_data.dropna().copy()\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Le format de la date est conventionnelle et compris par la bibliothèque pandas. Nous pouvons utiliser les données telles quelles pour tracer un premier aperçu des données brutes."