From numerical data, draw map in John Snow’s spirit. Show death places with markers whose size indicates number of deaths. Show water pumps on the same map with a different symbol.
Try to find different ways to show that the Broad street pump is at the center of the outbreak. (Density of deaths in the neighbourhood ? Other approaches ?).
Submit work on FUN.
Use ggmap
with OpenStreetMaps as map template with source="osm"
.
There is already an version of John Snow’s data sets and map in the package HistData
in Snow.deaths
and Snow.pumps
. However, the coordinates fit John Snow’s map and no other maps.
First download and extract the zip file from data sets available at this adress : http://rtwilson.com/downloads/SnowGIS_SHP.zip. It is the first link on Robin’s blog.
Two files of interest are the Cholera_Deaths.shp
and the Pumps.shp
files containing death people and water pumps coordinates.
The next code lines test if the file Cholera_Deaths.shp
exists. If not, you have to download it manually as I do not know how to unzip documents in R …
if(file.exists("SnowGIS_SHP/Cholera_Deaths.shp") == FALSE){
print("You must download the zip file as explained above")
} else{
print("Files already downloaded")
}
## [1] "Files already downloaded"
In order to read .shp
files we need the maptools
package.
The next code lines test if maptools is installed. If not, it does and load it :
if(!require("maptools")){
install.packages("maptools")
library("maptools")
}
Therefore, we can read the data with the readShapePoints
function :
Deaths <- readShapePoints("SnowGIS_SHP/Cholera_Deaths")
## Warning: readShapePoints is deprecated; use rgdal::readOGR or sf::st_read
head(Deaths)
## coordinates Id Count
## 0 (529308.7, 181031.4) 0 3
## 1 (529312.2, 181025.2) 0 2
## 2 (529314.4, 181020.3) 0 1
## 3 (529317.4, 181014.3) 0 1
## 4 (529320.7, 181007.9) 0 4
## 5 (529336.7, 181006) 0 2
We have coordinates, Id (I do not know what it is but I think it is not important) and Count that are the number of deaths at this location.
To extract coordinates, we can use the coords
function.
head(Deaths@coords)
## coords.x1 coords.x2
## 0 529308.7 181031.4
## 1 529312.2 181025.2
## 2 529314.4 181020.3
## 3 529317.4 181014.3
## 4 529320.7 181007.9
## 5 529336.7 181006.0
The number of recensed deaths is :
sum(Deaths$Count)
## [1] 489
It seems that this is less than recorded deaths (616) displayed on the FUN website but let’s suppose that it Robin’s fault…
We can also read the file with the Pumps coordinates :
Pumps <- readShapePoints("SnowGIS_SHP/Pumps")
## Warning: readShapePoints is deprecated; use rgdal::readOGR or sf::st_read
head(Pumps)
## coordinates Id
## 0 (529396.5, 181025.1) 0
## 1 (529192.5, 181079.4) 0
## 2 (529183.7, 181193.7) 0
## 3 (529748.9, 180924.2) 0
## 4 (529613.2, 180896.8) 0
## 5 (529453.6, 180826.4) 0
There are 6 pumps.
For this purpose we will need the ggmap
package.
The next code lines test if ggmap is installed. If not, it does and load it :
if(!require("ggmap")){
install.packages("ggmap")
library("ggmap")
}
I got the london coordinates centrered on Broad Street on Stamen Maps website :
london <- c(left = -0.14454, bottom = 51.51139, right = -0.13119, top = 51.51630)
We will ge the map based on the above coordinates with streets names with the maptype = "toner"
.
london_map = get_stamenmap(london, zoom = 17, maptype = "toner")
## Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.
We can display the map with ggmap.
map <- ggmap(london_map)
We will first extract the points and their coordinates from the .shp
file.
Remember that we put the Deaths points in the Deaths
variable and the Pumps points in the Pumps
variable.
Deaths_coord <- data.frame(Deaths@coords)
Pumps_coord <- data.frame(Pumps@coords)
Unfortunetely, both coordinates are in OSGB36 National Grid reference while our map is in classic decimal degrees reference.
Fortunetely, I found on the internet this following code that works but I do not know exactly how it works …
It seems that it specifies the Coordinate Reference System (CRS) first to the Deaths and Pumps coordinates with the CRS
function.
And then, it transforms the system to a more classic longitude and lattitude system with the sptransform
function.
coordinates(Deaths_coord)=~coords.x1+coords.x2
coordinates(Pumps_coord)=~coords.x1+coords.x2
proj4string(Deaths_coord)=CRS("+init=epsg:27700")
proj4string(Pumps_coord)=CRS("+init=epsg:27700")
Deaths_coord = spTransform(Deaths_coord,CRS("+proj=longlat +datum=WGS84"))
Pumps_coord = spTransform(Pumps_coord,CRS("+proj=longlat +datum=WGS84"))
df_Deaths=data.frame(Deaths_coord@coords)
df_Pumps=data.frame(Pumps_coord@coords)
Let’s see the results :
head(df_Deaths)
## coords.x1 coords.x2
## 0 -0.1379301 51.51342
## 1 -0.1378831 51.51336
## 2 -0.1378529 51.51332
## 3 -0.1378120 51.51326
## 4 -0.1377668 51.51320
## 5 -0.1375369 51.51318
head(df_Pumps)
## coords.x1 coords.x2
## 0 -0.1366679 51.51334
## 1 -0.1395862 51.51388
## 2 -0.1396710 51.51491
## 3 -0.1316299 51.51235
## 4 -0.1335944 51.51214
## 5 -0.1359191 51.51154
Looks fine for both points :)!
Let’s project the Deaths points on our map with the geom_point
function in red and with a size depending on the number of deaths recorded at the particular coordinates :
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5)
Let’s add the pumps in another color and symbol :
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue")
## Warning: Removed 2 rows containing missing values (geom_point).
Highlighting the Broad street pump (number 1 in the coordinates list :
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") +
geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green")
## Warning: Removed 2 rows containing missing values (geom_point).
We can try to plot a density of Deaths with the stat_density_2d
function :
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
stat_density_2d(data = df_Deaths, aes(coords.x1, coords.x2, fill = after_stat(level)), geom = "polygon", alpha = 0.3) +
geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") +
geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green")
## Warning: Removed 2 rows containing missing values (geom_point).
We can clearly see here that the green pump on broadwick street seems to be at the center of the outbreak.
We can try to get the coordinates of the maximum density points with function in MASS
and raster
libraries :
if(!require("MASS")){
install.packages("MASS")
library("MASS")
}
## Loading required package: MASS
if(!require("raster")){
install.packages("raster")
library("raster")
}
## Loading required package: raster
##
## Attaching package: 'raster'
## The following objects are masked from 'package:MASS':
##
## area, select
As I am no mathematician I copy pasted a code from the internet.
I think that it computes and retrieve the coordiantes of the local maxima from the Deaths data frame.
w = matrix(1,3,3)
x = kde2d(x = df_Deaths$coords.x1, y = df_Deaths$coords.x2)
r = raster(x)
f <- function(X) max(X, na.rm=TRUE)
localmax <- focal(r, w, fun = f, pad=TRUE, padValue=NA)
r2 <- r==localmax
maxXY <- xyFromCell(r2, Which(r2==1, cells=TRUE))
maxXY <- as.data.frame(maxXY)
maxXY
## x y
## 1 -0.1344211 51.51583
## 2 -0.1341236 51.51434
## 3 -0.1359086 51.51401
## 4 -0.1379912 51.51335
## 5 -0.1365037 51.51285
In maxXY there are the coordinates of the 5 local maxima.
We can plot them on the map in yellow.
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
stat_density_2d(data = df_Deaths, aes(coords.x1, coords.x2, fill = after_stat(level)), geom = "polygon", alpha = 0.3) +
geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") +
geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green") +
geom_point(maxXY, mapping = aes(x, y), color = "yellow", size = 3, fill = "yellow")
## Warning: Removed 2 rows containing missing values (geom_point).
It really seems that the closest pump to the maxima is the one on Broad street.
Let’s see if we can compute the distance between these maxima and the pumps to show what pump is the closest to what maximum.
If I remember well, the distance between 2 points on a graph can be calculated with the following equation : \[\frac{y2 - y1}{x2 - x1}\].