--- title: "Peer_Review_Cholera" author: "Marc" date: "15/04/2020" output: html_document: default pdf_document: default --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` # Instructions 1. From numerical data, draw map in John Snow's spirit. Show death places with markers whose size indicates number of deaths. Show water pumps on the same map with a different symbol. 2. Try to find different ways to show that the Broad street pump is at the center of the outbreak. (Density of deaths in the neighbourhood ? Other approaches ?). 3. Submit work on FUN. Use `ggmap` with OpenStreetMaps as map template with `source="osm"`. # Importing data There is already an version of John Snow's data sets and map in the package `HistData` in `Snow.deaths` and `Snow.pumps`. However, the coordinates fit John Snow's map and no other maps. First download and extract the zip file from data sets available at this adress : [http://rtwilson.com/downloads/SnowGIS_SHP.zip](http://rtwilson.com/downloads/SnowGIS_SHP.zip). It is the first link on [Robin's blog](http://blog.rtwilson.com/john-snows-cholera-data-in-more-formats/). Two files of interest are the `Cholera_Deaths.shp` and the `Pumps.shp` files containing death people and water pumps coordinates. The next code lines test if the file `Cholera_Deaths.shp`exists. If not, you have to download it manually as I do not know how to unzip documents in R ... ```{r} if(file.exists("SnowGIS_SHP/Cholera_Deaths.shp") == FALSE){ print("You must download the zip file as explained above") } else{ print("Files already downloaded") } ``` # Reading the data In order to read `.shp` files we need the `maptools` package. The next code lines test if maptools is installed. If not, it does and load it : ```{r message=FALSE, warning=FALSE} if(!require("maptools")){ install.packages("maptools") library("maptools") } ``` Therefore, we can read the data with the `readShapePoints` function : ```{r} Deaths <- readShapePoints("SnowGIS_SHP/Cholera_Deaths") head(Deaths) ``` We have coordinates, Id (I do not know what it is but I think it is not important) and Count that are the number of deaths at this location. To extract coordinates, we can use the `coords`function. ```{r} head(Deaths@coords) ``` The number of recensed deaths is : ```{r} sum(Deaths$Count) ``` It seems that this is less than recorded deaths (616) displayed on the FUN website but let's suppose that it Robin's fault... We can also read the file with the Pumps coordinates : ```{r} Pumps <- readShapePoints("SnowGIS_SHP/Pumps") head(Pumps) ``` There are 6 pumps. # Extracting london map For this purpose we will need the `ggmap` package. The next code lines test if ggmap is installed. If not, it does and load it : ```{r message=FALSE, warning=FALSE} if(!require("ggmap")){ install.packages("ggmap") library("ggmap") } ``` I got the london coordinates centrered on Broad Street on [Stamen Maps website](maps.stamen.com/toner/#17/51.51413/-0.13650 ) : ```{r} london <- c(left = -0.14454, bottom = 51.51139, right = -0.13119, top = 51.51630) ``` We will ge the map based on the above coordinates with streets names with the `maptype = "toner"`. ```{r} london_map = get_stamenmap(london, zoom = 17, maptype = "toner") ``` We can display the map with ggmap. ```{r} map <- ggmap(london_map) ``` # Displaying the Cholera deaths and pumps on the map We will first extract the points and their coordinates from the `.shp` file. Remember that we put the Deaths points in the `Deaths` variable and the Pumps points in the `Pumps` variable. ```{r} Deaths_coord <- data.frame(Deaths@coords) Pumps_coord <- data.frame(Pumps@coords) ``` Unfortunetely, both coordinates are in _OSGB36 National Grid_ reference while our map is in classic _decimal degrees_ reference. Fortunetely, I found on the internet this following code that works but I do not know exactly how it works ... It seems that it specifies the _Coordinate Reference System (CRS)_ first to the Deaths and Pumps coordinates with the `CRS` function. And then, it transforms the system to a more classic longitude and lattitude system with the `sptransform` function. ```{r} coordinates(Deaths_coord)=~coords.x1+coords.x2 coordinates(Pumps_coord)=~coords.x1+coords.x2 proj4string(Deaths_coord)=CRS("+init=epsg:27700") proj4string(Pumps_coord)=CRS("+init=epsg:27700") Deaths_coord = spTransform(Deaths_coord,CRS("+proj=longlat +datum=WGS84")) Pumps_coord = spTransform(Pumps_coord,CRS("+proj=longlat +datum=WGS84")) df_Deaths=data.frame(Deaths_coord@coords) df_Pumps=data.frame(Pumps_coord@coords) ``` Let's see the results : ```{r} head(df_Deaths) ``` ```{r} head(df_Pumps) ``` Looks fine for both points :)! Let's project the Deaths points on our map with the `geom_point`function in red and with a size depending on the number of deaths recorded at the particular coordinates : ```{r} map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) ``` Let's add the pumps in another color and symbol : ```{r} map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) + geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") ``` Highlighting the Broad street pump (number 1 in the coordinates list : ```{r} map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) + geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") + geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green") ``` # Analysis to show that the Broad street pump is at the center of the outbreak We can try to plot a density of Deaths with the `stat_density_2d` function : ```{r} map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) + stat_density_2d(data = df_Deaths, aes(coords.x1, coords.x2, fill = after_stat(level)), geom = "polygon", alpha = 0.3) + geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") + geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green") ``` We can clearly see here that the green pump on broadwick street seems to be at the center of the outbreak. We can try to get the coordinates of the maximum density points with function in `MASS` and `raster` libraries : ```{r} if(!require("MASS")){ install.packages("MASS") library("MASS") } if(!require("raster")){ install.packages("raster") library("raster") } ``` As I am no mathematician I copy pasted a code from the internet. I think that it computes and retrieve the coordiantes of the local maxima from the Deaths data frame. ```{r} w = matrix(1,3,3) x = kde2d(x = df_Deaths$coords.x1, y = df_Deaths$coords.x2) r = raster(x) f <- function(X) max(X, na.rm=TRUE) localmax <- focal(r, w, fun = f, pad=TRUE, padValue=NA) r2 <- r==localmax maxXY <- xyFromCell(r2, Which(r2==1, cells=TRUE)) maxXY <- as.data.frame(maxXY) maxXY ``` In maxXY there are the coordinates of the 5 local maxima. We can plot them on the map in yellow. ```{r} map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) + stat_density_2d(data = df_Deaths, aes(coords.x1, coords.x2, fill = after_stat(level)), geom = "polygon", alpha = 0.3) + geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") + geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green") + geom_point(maxXY, mapping = aes(x, y), color = "yellow", size = 3, fill = "yellow") ``` It really seems that the closest pump to the maxima is the one on Broad street. Let's see if we can compute the distance between these maxima and the pumps to show what pump is the closest to what maximum. If I remember well, the distance between 2 points on a graph can be calculated with the following equation : $$\frac{y2 - y1}{x2 - x1}$$. ```{r} dist_pump1 <- c() dist_pump2 <- c() dist_pump3 <- c() dist_pump4 <- c() dist_pump5 <- c() for (i in c(1:length(maxXY$x))){ dist_pump1[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[1])/(maxXY$x[i]-df_Pumps$coords.x1[1]) } for (i in c(1:length(maxXY$x))){ dist_pump2[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[2])/(maxXY$x[i]-df_Pumps$coords.x1[2]) } for (i in c(1:length(maxXY$x))){ dist_pump3[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[3])/(maxXY$x[i]-df_Pumps$coords.x1[3]) } for (i in c(1:length(maxXY$x))){ dist_pump4[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[4])/(maxXY$x[i]-df_Pumps$coords.x1[4]) } for (i in c(1:length(maxXY$x))){ dist_pump5[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[5])/(maxXY$x[i]-df_Pumps$coords.x1[5]) } dist_pump1 dist_pump2 dist_pump3 dist_pump4 dist_pump5 ```