--- title: "CO2 Keeling Curve and Historical Ice Core Data" author: "David Elser" date: "05-03-2021" output: pdf_document --- #### Please note that I am not a Computer Scientist and my programming skills are very limited. ### Data preprocessing fetch the Data file and check if file already exists. The data is available from the Web site of the Scripps Institute Soure script for the Keeling Curve available at : https://github.com/DemirhanOzelTrojan/Keeling_Predict/blob/master/main.R ` install.packages('forecast', dependencies = TRUE) install.packages("imputeTS") install.packages("tidyverse") ` ### Load librarys ```{r} library(ggplot2) library(forecast) library(imputeTS) library(lubridate) ``` ### Fetch CO2 measurements from Mauna Loa observatory, Hawaii, USA ```{r} data_url = "https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/monthly/monthly_in_situ_co2_mlo.csv" data_file = "CO2.csv" if (!file.exists(data_file)) { download.file(data_url, data_file, method="auto") } ``` ### Fetch Ice Core CO2 Data ```{r} data_url = "https://scrippsco2.ucsd.edu/assets/data/atmospheric/merged_ice_core_mlo_spo/merged_ice_core_yearly.csv" data_file1 = "Ice-CO2.csv" if (!file.exists(data_file1)) { download.file(data_url, data_file1, method="auto") } ``` ### Skip the frist 36 lines ```{r} data_ice = read.csv(data_file1, header=FALSE, skip=36) ``` ### Skip the frist 5 lines of the file ```{r} data = read.csv(data_file, header=TRUE, skip=5) ``` ### Let's have a look at what we got and check NA ```{r} head(data_ice) tail(data_ice) na_records = apply(data_ice, 1, function (x) any(is.na(x))) data[na_records,] ``` ### Conversion of decimal-date into Years, rename V2 into ppm ```{r} data_ice$date<- format(date_decimal(data_ice$V1), "%Y") head(data_ice) tail(data_ice) names(data_ice)[names(data_ice) == "V2"] <- "ppm" data_ice$Year=as.numeric(data_ice$date) head(data_ice) tail(data_ice) ``` ### Let's have a look at what we got from Mauna Loa observatory and check NA ```{r} head(data) tail(data) na_records = apply(data, 1, function (x) any(is.na(x))) data[na_records,] ``` ### impute data of column X.ppm..4(CO2, seasonally filled,adjusted filled) ```{r} data$X.ppm..4[data$X.ppm..4 == -99.99] <- NA data$X.ppm..4 <- na.interpolation(data$X.ppm..4) head(data) tail(data) ``` ### data preparation ```{r} data.ts <- ts(data$X.ppm..4, start = c(1958, 1), end = c(2021, 1), freq = 12) ``` ### split training and testing data, use 90% of available data in the co2.csv file ```{r} train.test.split <- 0.9 split.index <- round(length(data.ts) * train.test.split) ``` ### train time series ```{r} data.train.ts <- window(data.ts, start = c(1958, 1), end = c(1958, split.index)) ``` ### test time series ```{r} data.test.ts <- window(data.ts, start = c(1958, split.index + 1), end = c(1958, length(data.ts))) ``` ### create a fit using the time series ```{r} fit <- ets(data.ts, model = 'AAA') ``` ### forecast using the fit ```{r} predictions <- forecast(fit, 12 * 20, level = 0.95) ``` ## Plot Ice Core Data ```{r} theme_set(theme_minimal()) ggplot(data = data_ice, aes(x = Year, y = ppm))+ geom_line(color = "#00AFBB", size = 1) ``` ## Plot predictions till 2025 ```{r} plot(predictions, main = '', xlab = 'Time', ylab = 'CO2 (ppm)', xlim = c(1958, 2025), shaded = TRUE, shadecols=c('light blue')) legend('topleft', legend=c('Historical Data', 'Predicted Data', '95% CI'), fill=c(NA, NA, 'light blue'), col=c('black', 'blue', NA), lty=c(1, 1, NA), lwd=c(1, 2, NA), border=c(NA, NA, 'black'), inset=0.01) ```