Subject

In 1958, Charles David Keeling began measuring the concentration of CO2 in the atmosphere at the Mauna Loa Observatory, Hawaii, United States, and these measurements continue to this day. The initial goal was to study the seasonal variation, but interest later shifted to the increasing trend in the context of climate change. In honor of Keeling, this dataset is often called the “Keeling Curve” (see https://en.wikipedia.org/wiki/Keeling_Curve for the history and significance of these data).

The data are available on the Scripps Institution of Oceanography website. Use the file with the weekly observations. Note that this file is updated regularly with new observations: record the download date and keep a local copy of the exact version you analyze. Also pay attention to missing data.
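A minimal sketch of keeping a dated local copy, written in R like the rest of the document. The helper names `dated_copy_name` and `download_dated_copy` are hypothetical, and the `url` argument is a placeholder because the exact file address is not given here; the file name used for illustration is the one read later in this document.

```r
# Hypothetical helpers: build a file name that records the download date,
# then download the file once and reuse the local copy afterwards.
dated_copy_name <- function(base_name, date = Sys.Date()) {
  stem <- tools::file_path_sans_ext(base_name)
  ext  <- tools::file_ext(base_name)
  paste0(stem, "_", format(date, "%Y-%m-%d"), ".", ext)
}

download_dated_copy <- function(url, base_name, date = Sys.Date()) {
  local_file <- dated_copy_name(base_name, date)
  # Only download if this dated copy does not already exist locally
  if (!file.exists(local_file)) {
    download.file(url, destfile = local_file, mode = "wb")
  }
  local_file
}

dated_copy_name("monthly_in_situ_co2_mlo.csv", as.Date("2021-01-10"))
# "monthly_in_situ_co2_mlo_2021-01-10.csv"
```

Putting the date in the file name keeps several versions side by side, so an analysis can always point at the exact copy it used.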

Prerequisites

Time series processing


Create the data frame and load the R libraries required for the various statistical analyses.

The data file below contains 10 columns.

Missing values are denoted by -99.99

CO2 concentrations are measured on the ‘08A’ calibration scale
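Because missing values are coded as -99.99, one option is to let `read.csv()` turn them into `NA` while reading, through its `na.strings` argument. The sketch below runs on a tiny inline extract with illustrative values and a hypothetical `CO2` column, not on the real file; `strip.white = TRUE` is included as a precaution in case fields are padded with spaces.

```r
# Tiny inline extract using the same -99.99 sentinel (illustrative values)
csv_text <- "Year,Month,CO2
1958,1,-99.99
1958,2,-99.99
1958,3,315.70
1958,4,317.45"

# na.strings converts the sentinel to NA; strip.white trims padding around fields
co2 <- read.csv(text = csv_text, na.strings = "-99.99", strip.white = TRUE)

n_missing <- sum(is.na(co2$CO2))   # 2 missing observations
co2 <- co2[!is.na(co2$CO2), ]      # keep only observed months
nrow(co2)                          # 2
```

Working with `NA` rather than a numeric sentinel also keeps -99.99 out of summaries such as means and minima.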

library(tidyverse)
library(forecast)
library(lubridate)
library(car)
library(scales)
library(patchwork)
library(kableExtra)
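# skip = 57 drops the descriptive header of the downloaded file; this count
# may change between versions of the file, so check it against the local copy
dataCO2 <- read.csv("monthly_in_situ_co2_mlo.csv", sep = ",", skip = 57)
# Short names for the 10 columns of the file
colnames(dataCO2) <- c("Year", "Month", "Date1", "Date2", "ObsCO2", "SeasAdjCO2",
                       "SplineAdjCO2", "SplineAdjCO2Trend", "ObsCO2Comp", "SeasAdjCO2Comp")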
summary(dataCO2)
##       Year          Month            Date1           Date2     
##  Min.   :1958   Min.   : 1.000   Min.   :21231   Min.   :1958  
##  1st Qu.:1973   1st Qu.: 4.000   1st Qu.:26968   1st Qu.:1974  
##  Median :1989   Median : 7.000   Median :32704   Median :1990  
##  Mean   :1989   Mean   : 6.507   Mean   :32705   Mean   :1990  
##  3rd Qu.:2005   3rd Qu.: 9.500   3rd Qu.:38442   3rd Qu.:2005  
##  Max.   :2020   Max.   :12.000   Max.   :44180   Max.   :2021  
##      ObsCO2         SeasAdjCO2      SplineAdjCO2    SplineAdjCO2Trend
##  Min.   :-99.99   Min.   :-99.99   Min.   :-99.99   Min.   :-99.99   
##  1st Qu.:328.40   1st Qu.:328.70   1st Qu.:328.46   1st Qu.:328.82   
##  Median :351.34   Median :352.13   Median :351.33   Median :352.03   
##  Mean   :346.18   Mean   :346.18   Mean   :348.95   Mean   :348.95   
##  3rd Qu.:377.55   3rd Qu.:377.35   3rd Qu.:377.69   3rd Qu.:377.37   
##  Max.   :414.83   Max.   :413.33   Max.   :414.94   Max.   :413.35   
##    ObsCO2Comp     SeasAdjCO2Comp  
##  Min.   :-99.99   Min.   :-99.99  
##  1st Qu.:328.40   1st Qu.:328.70  
##  Median :351.34   Median :352.13  
##  Mean   :348.96   Mean   :348.95  
##  3rd Qu.:377.55   3rd Qu.:377.35  
##  Max.   :414.83   Max.   :413.33
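Since the file is updated regularly, a hard-coded `skip = 57` may stop matching the header of a newer download. A hedged alternative, assuming that data rows are the only lines starting with a four-digit year followed by a comma, is to locate the first such line and read from there with `header = FALSE`, which also guarantees that no data row is consumed as column names. The helper `first_data_line` is hypothetical and the toy header below is made up.

```r
# Hypothetical helper: index of the first line that looks like a data row,
# i.e. optional spaces, a four-digit year, optional spaces, then a comma.
first_data_line <- function(lines) {
  idx <- grep("^[[:space:]]*[[:digit:]]{4}[[:space:]]*,", lines)
  if (length(idx) == 0) stop("no data row found")
  idx[1]
}

# Toy file: comment and header lines followed by data rows (illustrative)
lines <- c('"Comment line"',
           'Yr, Mn, Date, CO2',
           '    ,   , Excel, [ppm]',
           '1958, 01, 21200, -99.99',
           '1958, 02, 21231, -99.99',
           '1958, 03, 21259, 315.70')

start <- first_data_line(lines)                     # 4
co2 <- read.csv(text = lines[start:length(lines)], header = FALSE,
                strip.white = TRUE)
nrow(co2)                                           # 3
```

On the real file, `lines` would come from `readLines()` on the local copy.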

Missing observed values (-99.99) are replaced by interpolated ones by working with the ObsCO2Comp column; the observations that are still missing are then removed.

# Keep only the rows where the completed series is available (-99.99 = missing)
dataCO2 <- dataCO2[dataCO2$ObsCO2Comp != -99.99, ]
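Dropping rows can leave holes in the monthly sequence, which matters for time-series tools that assume equally spaced observations. A minimal check, assuming rows are sorted by date and use `Year` and `Month` columns as named above, converts each row to a running month index and verifies that consecutive rows differ by exactly one month. The helper `has_monthly_gaps` and the toy data are hypothetical.

```r
# Hypothetical helper: TRUE if a sorted Year/Month sequence skips at least one month
has_monthly_gaps <- function(year, month) {
  month_index <- year * 12 + (month - 1)   # running count of months
  any(diff(month_index) != 1)
}

# Toy data: March 1958 is missing from the second sequence
complete <- data.frame(Year = c(1958, 1958, 1958, 1958), Month = 1:4)
with_gap <- data.frame(Year = c(1958, 1958, 1958), Month = c(1, 2, 4))

has_monthly_gaps(complete$Year, complete$Month)   # FALSE
has_monthly_gaps(with_gap$Year, with_gap$Month)   # TRUE
```

On the filtered data this would be called as `has_monthly_gaps(dataCO2$Year, dataCO2$Month)`.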

Create a Date column (format YYYY-MM-DD), placing each monthly value on the 15th of its month.

dataCO2$Date <- ymd(paste0(dataCO2$Year, "-", dataCO2$Month, "-", "15"))
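The same column can be built without pasting free-form strings, for example with `lubridate::make_date(Year, Month, 15)`. A base-R sketch on illustrative values, so that it runs without extra packages, formats the year and month explicitly and pins every observation to the 15th:

```r
# Place each monthly observation on the 15th of its month (base R only)
year  <- c(1958, 1958, 2020)
month <- c(3, 12, 7)

date <- as.Date(sprintf("%04d-%02d-15", as.integer(year), as.integer(month)))
format(date)   # "1958-03-15" "1958-12-15" "2020-07-15"
```

Zero-padding the month avoids relying on the date parser to guess the layout of strings such as "1958 3 15".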

Plotting the results

# Monthly CO2 concentration over time (completed observed series)
ggplot(dataCO2, aes(x = Date, y = ObsCO2Comp)) +
    geom_line(color = "orange") +
    scale_x_date(date_labels = "%Y-%m", date_breaks = "5 years") +
    xlab("Year, Month") +
    ylab("CO2 Concentration (ppm)") +
    theme(axis.text.x = element_text(face = "bold", color = "#993333",
                                     size = 12, angle = 45, hjust = 1),
          axis.text.y = element_text(face = "bold", color = "#993333",
                                     size = 10, hjust = 1),
          axis.title.y = element_text(size = 10)) +
    ggtitle("Figure 1")

 library(viridis) 
# One line per year, coloured by year, to compare the annual cycles
# (the group aesthetic does the grouping, so no group_by() is needed)
ggplot(dataCO2, aes(x = Month, y = ObsCO2Comp, group = Year, colour = Year)) +
    geom_line() +
    scale_colour_viridis() +
    xlab("Month") +
    ylab("CO2 Concentration (ppm)") +
    ggtitle("Seasonal plot")

Modelling

The series is not stationary, as the plot shows: its level increases steadily over time.
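One common way to deal with a trend and an annual cycle before fitting a model is differencing: a lag-12 difference removes a stable 12-month pattern, and an additional first difference removes a linear trend. The sketch uses a synthetic monthly series (a linear trend plus a 12-month sine wave, not the Mauna Loa data) so the effect can be checked exactly; on real data, functions such as `forecast::ndiffs()` and `forecast::nsdiffs()` can help choose the number of differences.

```r
# Synthetic monthly series: linear trend + annual cycle (illustrative only)
time_index <- 1:120
x <- ts(315 + 0.1 * time_index + 3 * sin(2 * pi * time_index / 12),
        start = c(1958, 1), frequency = 12)

# Lag-12 difference: the cycle cancels, leaving the yearly increase 12 * 0.1 = 1.2
seasonal_diff <- diff(x, lag = 12)
# First difference of that result: the constant 1.2 cancels as well
both_diff <- diff(seasonal_diff)

range(seasonal_diff)   # both ends numerically equal to 1.2
max(abs(both_diff))    # numerically 0
```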

The series also shows seasonality: a cycle that repeats every year, as the seasonal plot shows.
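The annual cycle can be separated from the trend with a seasonal decomposition. A minimal sketch with `stats::stl()` (a LOESS-based decomposition) on a synthetic series with a known cycle of amplitude 3; the setting `s.window = "periodic"` is an assumption that the shape of the cycle does not change from year to year.

```r
# Synthetic monthly series with a known annual cycle of amplitude 3 (illustrative)
time_index <- 1:240
x <- ts(315 + 0.1 * time_index + 3 * sin(2 * pi * time_index / 12),
        start = c(1958, 1), frequency = 12)

fit <- stl(x, s.window = "periodic")
seasonal <- fit$time.series[, "seasonal"]

# The estimated seasonal component should swing by roughly 2 * 3 = 6 units
diff(range(seasonal))
```

Applied to the document's data, the input could be `ts(dataCO2$ObsCO2Comp, start = c(dataCO2$Year[1], dataCO2$Month[1]), frequency = 12)`, provided the filtered series has no gaps.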