diff --git a/module3/exercice3/ConcentrationCO2.Rmd b/module3/exercice3/ConcentrationCO2.Rmd
new file mode 100644
index 0000000000000000000000000000000000000000..2ea30baee367634869bf2296639f7a40ba466d50
--- /dev/null
+++ b/module3/exercice3/ConcentrationCO2.Rmd
@@ -0,0 +1,133 @@
+---
+title: "Atmospheric CO2 concentration since 1958"
+author: "Hélène Raynal"
+date: "21 April 2020"
+output: html_document
+---
+
+
+### Subject
+
+In 1958, Charles David Keeling began measuring the atmospheric CO2 concentration at the Mauna Loa Observatory, Hawaii, United States, and the measurements continue to this day. The original goal was to study the seasonal variation, but interest later shifted to the rising trend in the context of climate change. In Keeling's honour, this dataset is often called the "Keeling Curve" (see the [Wikipedia article](https://en.wikipedia.org/wiki/Keeling_Curve) for the history and significance of these data).
+
+The data are available on the Scripps Institution website. Use the file with weekly observations. Note that this file is updated regularly with new observations, so record the download date and keep a local copy of the exact version you analyse. Also pay attention to missing data.
+
+
+### Prerequisites
+Time-series analysis
+
+Some references:
+
+- [caschrono: Séries Temporelles Avec R](https://cran.r-project.org/web/packages/caschrono/) (Time Series with R) - Yves Aragon - Université Toulouse Capitole - 28 January 2019
+
+
+### Load the R libraries and create the data frame
+
+The data file below contains 10 columns.
+
+- Columns 1-4 give the dates in several redundant formats.
+- Column 5 gives monthly Mauna Loa CO2 concentrations in micro-mol CO2 per mole (ppm), reported on the 2008A SIO manometric mole fraction scale. This is the standard version of the data most often sought. The monthly values have been adjusted to 24:00 hours on the 15th of each month.
+- Column 6 gives the same data after a seasonal adjustment to remove the quasi-regular seasonal cycle. The adjustment involves subtracting from the data a 4-harmonic fit with a linear gain factor.
+- Column 7 is a smoothed version of the data generated from a stiff cubic spline function plus 4-harmonic functions with linear gain.
+- Column 8 is the same smoothed version with the seasonal cycle removed.
+- Column 9 is identical to Column 5 except that the missing values from Column 5 have been filled with values from Column 7.
+- Column 10 is identical to Column 6 except that missing values have been filled with values from Column 8.
+
+Missing values are denoted by -99.99.
+
+CO2 concentrations are measured on the '08A' calibration scale.
+
+```{r }
+library(tidyverse)
+library(forecast)
+library(lubridate)
+library(car)
+library(scales)
+library(patchwork)
+library(kableExtra)
+
+dataCO2 <- read.csv("monthly_in_situ_co2_mlo.csv", sep = ",", skip = 57)
+colnames(dataCO2) <- c("Year", "Month", "Date1", "Date2", "ObsCO2", "SeasAdjCO2",
+                       "SplineAdjCO2", "SplineAdjCO2Trend", "ObsCO2Comp", "SeasAdjCO2Comp")
+summary(dataCO2)
+```
+
+Use the filled series `ObsCO2Comp` (column 9), in which missing observations (-99.99) have been replaced by the smoothed spline values of column 7, then remove the rows that are still missing:
+
+```{r }
+# Compare against the number -99.99, not the string "-99.99"
+dataCO2 <- dataCO2[dataCO2$ObsCO2Comp != -99.99, ]
+```
+
+Create a `Date` column in YYYY-MM-DD format, set to the 15th of each month:
+
+```{r }
+dataCO2$Date <- ymd(paste0(dataCO2$Year, "-", dataCO2$Month, "-", "15"))
+```
+
+
+### Plotting the results
+
+```{r }
+# Refer to columns by name inside aes(); dataCO2$... bypasses the data argument
+ggplot(dataCO2, aes(Date, ObsCO2Comp)) +
+  geom_line(color = "orange") +
+  xlab("Year, Month") +
+  scale_x_date(date_labels = "%Y-%m", date_breaks = "5 year") +
+  theme(axis.text.x = element_text(face = "bold", color = "#993333",
+                                   size = 12, angle = 45, hjust = 1)) +
+  ylab("CO2 Concentration (ppm)") +
+  scale_y_continuous() +
+  theme(axis.text.y = element_text(face = "bold", color = "#993333",
+                                   size = 10, hjust = 1),
+        axis.title.y = element_text(size = 10)) +
+  ggtitle("Figure 1")
+```
+
+```{r }
+library(viridis)
+
+# group_by(Year) groups on the column; group_by("Year") would group on a constant string
+dataCO2_by_year <- dataCO2 %>% group_by(Year)
+ggplot(dataCO2_by_year, aes(Month, ObsCO2Comp)) +
+  geom_line(aes(group = Year, colour = Year)) +
+  xlab("Month") +
+  ylab("CO2 Concentration (ppm)") +
+  ggtitle("Seasonal plot")
+```
+
+
+
+### Modelling
+
+**Documentation**
+
+<https://github.com/Peymankor/Data-Science_Portfolio/blob/master/Time%20Series%20Analysis-Historical%20Co2/medium_post.Rmd>
+
+The series is not stationary, as Figure 1 shows.
+
+The series shows seasonality (see the seasonal plot).
+
+Since the data are both non-stationary and seasonal, seasonal differencing is a natural way to model them. To check this, we examine the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
+
+Here are the ACF and PACF plots produced by `ggtsdisplay()` from the *forecast* package:
+
+```{r}
+# Start the monthly series at the first retained observation instead of hard-coding it
+Co2_train <- ts(dataCO2_by_year$ObsCO2Comp,
+                start = c(dataCO2$Year[1], dataCO2$Month[1]), frequency = 12)
+Co2_train %>% ggtsdisplay()
+```
+
+The same plots after a seasonal difference (lag 12) followed by a first difference:
+
+```{r}
+Co2_train %>% diff(lag = 12) %>% diff() %>% ggtsdisplay()
+```
\ No newline at end of file
diff --git a/module3/exercice3/ConcentrationCO2.html b/module3/exercice3/ConcentrationCO2.html
new file mode 100644
index 0000000000000000000000000000000000000000..d221aef89b73f10b3783a005c16761cdf6f83db6
--- /dev/null
+++ b/module3/exercice3/ConcentrationCO2.html
@@ -0,0 +1,584 @@
+In 1958, Charles David Keeling began measuring the atmospheric CO2 concentration at the Mauna Loa Observatory, Hawaii, United States, and the measurements continue to this day. The original goal was to study the seasonal variation, but interest later shifted to the rising trend in the context of climate change. In Keeling's honour, this dataset is often called the “Keeling Curve” (see https://en.wikipedia.org/wiki/Keeling_Curve for the history and significance of these data).
+The data are available on the Scripps Institution website. Use the file with weekly observations. Note that this file is updated regularly with new observations, so record the download date and keep a local copy of the exact version you analyse. Also pay attention to missing data.
+Time-series analysis
+Some references:
+The data file below contains 10 columns.
+Missing values are denoted by -99.99
+CO2 concentrations are measured on the ‘08A’ calibration scale
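Because -99.99 is a sentinel rather than a measurement, leaving it in the data biases statistics downward (the `summary()` output further down reports -99.99 as the minimum of every CO2 column). A minimal base-R sketch of turning the sentinel into `NA`, assuming a small made-up data frame: `co2` and its values are hypothetical, not taken from the Scripps file.

```r
# Hypothetical toy data with the same layout as two Scripps columns;
# the numbers are illustrative, not real Mauna Loa measurements.
co2 <- data.frame(
  Year   = c(1958, 1958, 1958, 1958),
  Month  = c(1, 2, 3, 4),
  ObsCO2 = c(-99.99, -99.99, 315.70, 317.45)
)

# Mean with the sentinel left in: the two -99.99 values drag it far down
mean_with_sentinel <- mean(co2$ObsCO2)

# Replace the -99.99 sentinel by NA so it is treated as missing
co2$ObsCO2[co2$ObsCO2 == -99.99] <- NA

# Mean over the real observations only
mean_clean <- mean(co2$ObsCO2, na.rm = TRUE)
```

Whether to drop such rows or fill them is a separate decision; this analysis relies on column 9, where most gaps are already filled from the spline fit.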
+library(tidyverse)
+library(forecast)
+library(lubridate)
+library(car)
+library(scales)
+library(patchwork)
+library(kableExtra)
+dataCO2 <- read.csv("monthly_in_situ_co2_mlo.csv", sep = ",", skip = 57)
+colnames(dataCO2) <- c("Year", "Month","Date1", "Date2", "ObsCO2", "SeasAdjCO2","SplineAdjCO2", "SplineAdjCO2Trend", "ObsCO2Comp", "SeasAdjCO2Comp")
+summary(dataCO2)
+## Year Month Date1 Date2
+## Min. :1958 Min. : 1.000 Min. :21231 Min. :1958
+## 1st Qu.:1973 1st Qu.: 4.000 1st Qu.:26968 1st Qu.:1974
+## Median :1989 Median : 7.000 Median :32704 Median :1990
+## Mean :1989 Mean : 6.507 Mean :32705 Mean :1990
+## 3rd Qu.:2005 3rd Qu.: 9.500 3rd Qu.:38442 3rd Qu.:2005
+## Max. :2020 Max. :12.000 Max. :44180 Max. :2021
+## ObsCO2 SeasAdjCO2 SplineAdjCO2 SplineAdjCO2Trend
+## Min. :-99.99 Min. :-99.99 Min. :-99.99 Min. :-99.99
+## 1st Qu.:328.40 1st Qu.:328.70 1st Qu.:328.46 1st Qu.:328.82
+## Median :351.34 Median :352.13 Median :351.33 Median :352.03
+## Mean :346.18 Mean :346.18 Mean :348.95 Mean :348.95
+## 3rd Qu.:377.55 3rd Qu.:377.35 3rd Qu.:377.69 3rd Qu.:377.37
+## Max. :414.83 Max. :413.33 Max. :414.94 Max. :413.35
+## ObsCO2Comp SeasAdjCO2Comp
+## Min. :-99.99 Min. :-99.99
+## 1st Qu.:328.40 1st Qu.:328.70
+## Median :351.34 Median :352.13
+## Mean :348.96 Mean :348.95
+## 3rd Qu.:377.55 3rd Qu.:377.35
+## Max. :414.83 Max. :413.33
+Use the filled series ObsCO2Comp (column 9), in which missing observations (-99.99) have been replaced by the smoothed spline values of column 7, then remove the rows that are still missing.
+dataCO2 <- dataCO2[dataCO2$ObsCO2Comp != -99.99, ]
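The file header describes column 9 as column 5 with its gaps filled from column 7. A small sketch of that fill-then-drop logic, using hypothetical vectors `obs` and `spline` with made-up values (not the real file), shows why a few -99.99 rows can survive the fill and still have to be removed:

```r
# Hypothetical observed and spline values; -99.99 marks a missing value.
obs    <- c(-99.99, 315.7, -99.99, 317.5)
spline <- c(-99.99, 315.6, 316.4, 317.4)

# Column-9 style fill: keep the observation, fall back to the spline when missing
filled <- ifelse(obs == -99.99, spline, obs)

# Where the spline is missing too, the value is still -99.99: drop those entries.
# The comparison is numeric, so no string-to-number coercion is involved.
filled <- filled[filled != -99.99]
```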
+Create a Date column in YYYY-MM-DD format, set to the 15th of each month
+dataCO2$Date <- ymd(paste0(dataCO2$Year, "-", dataCO2$Month, "-", "15"))
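`ymd()` from lubridate parses the pasted string. For readers without lubridate, a base-R sketch of the same step (the `Year` and `Month` vectors here are hypothetical examples) zero-pads the month and lets `as.Date()` parse the ISO string:

```r
# Hypothetical year/month pairs
Year  <- c(1958, 1958, 2020)
Month <- c(3, 12, 4)

# Build "YYYY-MM-15" (month zero-padded to two digits) and parse it as a Date
Date <- as.Date(sprintf("%d-%02d-15", Year, Month))
format(Date)
```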
+ggplot(dataCO2, aes(Date, ObsCO2Comp)) +
+ geom_line(color='orange') +
+ xlab("Year, Month") +
+ scale_x_date(date_labels = "%Y-%m", date_breaks = "5 year") +
+ theme(axis.text.x = element_text(face = "bold", color = "#993333",
+ size = 12, angle = 45, hjust = 1)) +
+ ylab("CO2 Concentration (ppm)") +
+ scale_y_continuous() +
+ theme(axis.text.y = element_text(face = "bold", color = "#993333",
+ size = 10, hjust = 1),axis.title.y = element_text(size = 10)) +
+ ggtitle("Figure 1")
+ library(viridis)
+dataCO2_by_year <- dataCO2 %>% group_by(Year)
+ggplot(dataCO2_by_year, aes(Month, ObsCO2Comp)) +
+ geom_line(aes(group = Year, colour = Year)) +
+ xlab("Month") +
+ ylab("CO2 Concentration (ppm)") +
+ ggtitle("Seasonal plot")
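The seasonal plot shows the cycle by eye. As a hedged numeric complement, `stats::decompose()` estimates an additive seasonal component after removing the trend with a centred 12-month moving average. The sketch below runs it on a synthetic series (a linear trend plus a known 12-month cycle, not the Mauna Loa data) to show that the estimated monthly effects recover the cycle that was put in:

```r
# Synthetic monthly series (hypothetical, not the Mauna Loa data):
# a linear trend plus a fixed 12-month cycle whose values sum to zero.
months   <- 0:119                              # 10 years of monthly data
seasonal <- 3 * sin(2 * pi * (1:12) / 12)      # one value per calendar month
x <- ts(300 + 0.1 * months + rep(seasonal, 10),
        start = c(1958, 1), frequency = 12)

# Classical decomposition: a centred 2x12 moving average estimates the trend,
# and the seasonal figure is the average detrended value per calendar month.
dec <- decompose(x, type = "additive")

round(dec$figure, 3)   # estimated seasonal effect for months 1..12
```

On real data the residual component is not zero, so the estimated figure is an average cycle rather than an exact one.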
+The series is not stationary, as the first plot shows.
+The series shows seasonality.
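The modelling step applies `diff(lag = 12)` followed by `diff()`. On a synthetic series with an exact linear trend and 12-month cycle (hypothetical values, chosen so the result is easy to check), the seasonal difference cancels the cycle and turns the trend into a constant, and the first difference then removes that constant:

```r
# Hypothetical noise-free series: slope 0.1 per month plus a 12-month sine cycle
tt <- 0:95
x  <- ts(300 + 0.1 * tt + 3 * sin(2 * pi * tt / 12), frequency = 12)

# Seasonal difference x[t] - x[t-12]: the cycle cancels, the trend becomes 12 * 0.1
d12 <- diff(x, lag = 12)

# First difference of that series: the constant 1.2 becomes 0
d12_1 <- diff(d12)

range(d12)
range(d12_1)
```

Real data are noisy, so after differencing one looks for an ACF/PACF without slow decay or strong spikes at lag 12, not for an exact zero.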
+