Instructions

  1. From numerical data, draw map in John Snow’s spirit. Show death places with markers whose size indicates number of deaths. Show water pumps on the same map with a different symbol.

  2. Try to find different ways to show that the Broad street pump is at the center of the outbreak. (Density of deaths in the neighbourhood ? Other approaches ?).

  3. Submit work on FUN.

Use ggmap with OpenStreetMaps as map template with source="osm".

Importing data

There is already an version of John Snow’s data sets and map in the package HistData in Snow.deaths and Snow.pumps. However, the coordinates fit John Snow’s map and no other maps.

First download and extract the zip file from data sets available at this adress : http://rtwilson.com/downloads/SnowGIS_SHP.zip. It is the first link on Robin’s blog.
Two files of interest are the Cholera_Deaths.shp and the Pumps.shp files containing death people and water pumps coordinates.
The next code lines test if the file Cholera_Deaths.shpexists. If not, you have to download it manually as I do not know how to unzip documents in R …

if(file.exists("SnowGIS_SHP/Cholera_Deaths.shp") == FALSE){
  print("You must download the zip file as explained above")
} else{
  print("Files already downloaded")
}
## [1] "Files already downloaded"

Reading the data

In order to read .shp files we need the maptools package.
The next code lines test if maptools is installed. If not, it does and load it :

if(!require("maptools")){
  install.packages("maptools")
  library("maptools")
}

Therefore, we can read the data with the readShapePoints function :

Deaths <- readShapePoints("SnowGIS_SHP/Cholera_Deaths")
## Warning: readShapePoints is deprecated; use rgdal::readOGR or sf::st_read
head(Deaths)
##            coordinates Id Count
## 0 (529308.7, 181031.4)  0     3
## 1 (529312.2, 181025.2)  0     2
## 2 (529314.4, 181020.3)  0     1
## 3 (529317.4, 181014.3)  0     1
## 4 (529320.7, 181007.9)  0     4
## 5   (529336.7, 181006)  0     2

We have coordinates, Id (I do not know what it is but I think it is not important) and Count that are the number of deaths at this location.
To extract coordinates, we can use the coordsfunction.

head(Deaths@coords)
##   coords.x1 coords.x2
## 0  529308.7  181031.4
## 1  529312.2  181025.2
## 2  529314.4  181020.3
## 3  529317.4  181014.3
## 4  529320.7  181007.9
## 5  529336.7  181006.0

The number of recensed deaths is :

sum(Deaths$Count)
## [1] 489

It seems that this is less than recorded deaths (616) displayed on the FUN website but let’s suppose that it Robin’s fault…

We can also read the file with the Pumps coordinates :

Pumps <- readShapePoints("SnowGIS_SHP/Pumps")
## Warning: readShapePoints is deprecated; use rgdal::readOGR or sf::st_read
head(Pumps)
##            coordinates Id
## 0 (529396.5, 181025.1)  0
## 1 (529192.5, 181079.4)  0
## 2 (529183.7, 181193.7)  0
## 3 (529748.9, 180924.2)  0
## 4 (529613.2, 180896.8)  0
## 5 (529453.6, 180826.4)  0

There are 6 pumps.

Extracting london map

For this purpose we will need the ggmap package.
The next code lines test if ggmap is installed. If not, it does and load it :

if(!require("ggmap")){
  install.packages("ggmap")
  library("ggmap")
}

I got the london coordinates centrered on Broad Street on Stamen Maps website :

london <- c(left = -0.14454, bottom = 51.51139, right = -0.13119, top = 51.51630)

We will ge the map based on the above coordinates with streets names with the maptype = "toner".

london_map = get_stamenmap(london, zoom = 17, maptype = "toner")
## Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.

We can display the map with ggmap.

map <- ggmap(london_map)

Displaying the Cholera deaths and pumps on the map

We will first extract the points and their coordinates from the .shp file.
Remember that we put the Deaths points in the Deaths variable and the Pumps points in the Pumps variable.

Deaths_coord <- data.frame(Deaths@coords)
Pumps_coord <- data.frame(Pumps@coords)

Unfortunetely, both coordinates are in OSGB36 National Grid reference while our map is in classic decimal degrees reference.
Fortunetely, I found on the internet this following code that works but I do not know exactly how it works …
It seems that it specifies the Coordinate Reference System (CRS) first to the Deaths and Pumps coordinates with the CRS function.
And then, it transforms the system to a more classic longitude and lattitude system with the sptransform function.

coordinates(Deaths_coord)=~coords.x1+coords.x2
coordinates(Pumps_coord)=~coords.x1+coords.x2
proj4string(Deaths_coord)=CRS("+init=epsg:27700") 
proj4string(Pumps_coord)=CRS("+init=epsg:27700") 
Deaths_coord = spTransform(Deaths_coord,CRS("+proj=longlat +datum=WGS84"))
Pumps_coord = spTransform(Pumps_coord,CRS("+proj=longlat +datum=WGS84"))
df_Deaths=data.frame(Deaths_coord@coords)
df_Pumps=data.frame(Pumps_coord@coords)

Let’s see the results :

head(df_Deaths)
##    coords.x1 coords.x2
## 0 -0.1379301  51.51342
## 1 -0.1378831  51.51336
## 2 -0.1378529  51.51332
## 3 -0.1378120  51.51326
## 4 -0.1377668  51.51320
## 5 -0.1375369  51.51318
head(df_Pumps)
##    coords.x1 coords.x2
## 0 -0.1366679  51.51334
## 1 -0.1395862  51.51388
## 2 -0.1396710  51.51491
## 3 -0.1316299  51.51235
## 4 -0.1335944  51.51214
## 5 -0.1359191  51.51154

Looks fine for both points :)!

Let’s project the Deaths points on our map with the geom_pointfunction in red and with a size depending on the number of deaths recorded at the particular coordinates :

map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5)

Let’s add the pumps in another color and symbol :

map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
  geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue")
## Warning: Removed 2 rows containing missing values (geom_point).

Highlighting the Broad street pump (number 1 in the coordinates list :

map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
  geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") +
  geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green")
## Warning: Removed 2 rows containing missing values (geom_point).

Analysis to show that the Broad street pump is at the center of the outbreak

We can try to plot a density of Deaths with the stat_density_2d function :

map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) + 
  stat_density_2d(data = df_Deaths, aes(coords.x1, coords.x2, fill = after_stat(level)), geom = "polygon", alpha = 0.3) +
  geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") +
  geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green")
## Warning: Removed 2 rows containing missing values (geom_point).

We can clearly see here that the green pump on broadwick street seems to be at the center of the outbreak.

We can try to get the coordinates of the maximum density points with function in MASS and raster libraries :

if(!require("MASS")){
  install.packages("MASS")
  library("MASS")
}
## Loading required package: MASS
if(!require("raster")){
  install.packages("raster")
  library("raster")
}
## Loading required package: raster
## 
## Attaching package: 'raster'
## The following objects are masked from 'package:MASS':
## 
##     area, select

As I am no mathematician I copy pasted a code from the internet.
I think that it computes and retrieve the coordiantes of the local maxima from the Deaths data frame.

w = matrix(1,3,3)
x = kde2d(x = df_Deaths$coords.x1, y = df_Deaths$coords.x2)
r = raster(x)
f <- function(X) max(X, na.rm=TRUE)
localmax <- focal(r, w, fun = f, pad=TRUE, padValue=NA)
r2 <- r==localmax
maxXY <- xyFromCell(r2, Which(r2==1, cells=TRUE))
maxXY <- as.data.frame(maxXY)
maxXY
##            x        y
## 1 -0.1344211 51.51583
## 2 -0.1341236 51.51434
## 3 -0.1359086 51.51401
## 4 -0.1379912 51.51335
## 5 -0.1365037 51.51285

In maxXY there are the coordinates of the 5 local maxima.
We can plot them on the map in yellow.

map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) + 
  stat_density_2d(data = df_Deaths, aes(coords.x1, coords.x2, fill = after_stat(level)), geom = "polygon", alpha = 0.3) +
  geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") +
  geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green") +
  geom_point(maxXY, mapping = aes(x, y), color = "yellow", size = 3, fill = "yellow")
## Warning: Removed 2 rows containing missing values (geom_point).

It really seems that the closest pump to the maxima is the one on Broad street.

Let’s see if we can compute the distance between these maxima and the pumps to show what pump is the closest to what maximum.
If I remember well, the distance between 2 points on a graph can be calculated with the following equation : \[\frac{y2 - y1}{x2 - x1}\].