1. From numerical data, draw a map in John Snow's spirit. Show death places with markers whose size indicates number of deaths. Show water pumps on the same map with a different symbol.
2. Try to find different ways to show that the Broad street pump is at the center of the outbreak. (Density of deaths in the neighborhood ? Other approaches ?).
3. Submit work on FUN.
Use `ggmap` with OpenStreetMaps as map template with `source="osm"`.
# Importing data
There is already a version of John Snow's data sets and map in the package `HistData` in `Snow.deaths` and `Snow.pumps`. However, the coordinates fit John Snow's map and no other maps.
First download and extract the zip file from data sets available at this address : [http://rtwilson.com/downloads/SnowGIS_SHP.zip](http://rtwilson.com/downloads/SnowGIS_SHP.zip). It is the first link on [Robin's blog](http://blog.rtwilson.com/john-snows-cholera-data-in-more-formats/).
Two files of interest are the `Cholera_Deaths.shp` and the `Pumps.shp` files containing death people and water pumps coordinates.
The next code lines test if the file `Cholera_Deaths.shp` exists. If not, you have to download it manually as I do not know how to unzip documents in R ...
print("You must download the zip file as explained above")
} else{
print("Files already downloaded")
}
```
# Reading the data
Ceci est un document R markdown que vous pouvez aisément exporter au format HTML, PDF, et MS Word. Pour plus de détails sur R Markdown consultez <http://rmarkdown.rstudio.com>.
In order to read `.shp` files we need the `maptools` package.
The next code lines test if maptools is installed. If not, it does and load it :
```{r message=FALSE, warning=FALSE}
if(!require("maptools")){
install.packages("maptools")
library("maptools")
}
```
Therefore, we can read the data with the `readShapePoints` function :
We have coordinates, Id (I do not know what it is but I think it is not important) and Count that are the number of deaths at this location.
To extract coordinates, we can use the `coords` function.
```{r}
head(Deaths@coords)
```
The number of registered deaths is :
```{r}
sum(Deaths$Count)
```
It seems that this is less than recorded deaths (616) displayed on the FUN website but let's suppose that it is Robin's fault...
We can also read the file with the Pumps coordinates :
```{r message=FALSE, warning=FALSE}
Pumps <- readShapePoints("SnowGIS_SHP/Pumps")
head(Pumps)
```
There are 6 pumps.
Lorsque vous cliquerez sur le bouton **Knit** ce document sera compilé afin de ré-exécuter le code R et d'inclure les résultats dans un document final. Comme nous vous l'avons montré dans la vidéo, on inclue du code R de la façon suivante:
# Extracting London map
```{r cars}
For this purpose we will need the `ggmap` package.
summary(cars)
The next code lines test if ggmap is installed. If not, it does and load it :
```{r message=FALSE, warning=FALSE}
if(!require("ggmap")){
install.packages("ggmap")
library("ggmap")
}
```
```
I got the London coordinates centered on Broad Street on [Stamen Maps website](maps.stamen.com/toner/#17/51.51413/-0.13650
) :
```{r}
london <- c(left = -0.143, bottom = 51.51, right = -0.131, top = 51.517)
```
We will get the map based on the above coordinates with streets names with the `maptype = "toner"`.
Let's project the Deaths points on our map with the `geom_point` function in red and with a size depending on the number of deaths recorded at the particular coordinates :
From these vectors, we see that the 2nd vector seems to have the shortest distances. It corresponds to the pump number 1 which is the one on Broad street.
We can now plot the mean distance of each pump on a barplot to see if it is the case.
labs( x = "Pump number", y = "Mean distance between local maxima and pump",
title ="Distances of pumps from outbreak") +
scale_x_discrete(limits = c(0:7)) +
scale_x_continuous(breaks = c(-1:50))
gg
```
Pump number 1 is on average closer to the local maxima of Deaths and the number 5 is the farthest.
Perhaps, we can try to test if all the pump distance means are statistically greater than the mean distance of pump 1 by testing 2 by 2 with a one-sided Mann-Whitney test with the function `wilcox.test` and the option `alternative = "g"` with "g" standing for "greater".
```{r pressure, echo=FALSE}
```{r}
plot(pressure)
wilcox.test(dist_pump0, dist_pump1, alternative = "g")
wilcox.test(dist_pump2, dist_pump1, alternative = "g")
wilcox.test(dist_pump3, dist_pump1, alternative = "g")
wilcox.test(dist_pump4, dist_pump1, alternative = "g")
wilcox.test(dist_pump5, dist_pump1, alternative = "g")
wilcox.test(dist_pump6, dist_pump1, alternative = "g")
wilcox.test(dist_pump7, dist_pump1, alternative = "g")
```
```
Except for pumps number 5, 6 and 7, none of the means are greater than the mean of pump number 1.
Vous remarquerez le paramètre `echo = FALSE` qui indique que le code ne doit pas apparaître dans la version finale du document. Nous vous recommandons dans le cadre de ce MOOC de ne pas utiliser ce paramètre car l'objectif est que vos analyses de données soient parfaitement transparentes pour être reproductibles.
# Conclusion
Comme les résultats ne sont pas stockés dans les fichiers Rmd, pour faciliter la relecture de vos analyses par d'autres personnes, vous aurez donc intérêt à générer un HTML ou un PDF et à le commiter.
I think that I kind of showed that the pump on Broad street was in the middle of the outbreak. However, my statistical tests are not really conclusive ... This is because I do not know how to deal with density and distances. The fact that I reduced the density in local maxima lower the tests power and that is why it is not significantly different.
For the troubles part, I had difficulties in getting the data as I do not have experience with maps in R. At first, I almost believed that there was something wrong with Robin's blog website because some links did not work. But, by going on forums on the internet, I understood that I could read `.shp` files. Then I had real difficulties getting a map. The OpenStreetMaps source in the `ggmap` function did not work as it asked a Google API key ... I got around by getting Stanemmap maps. However, as I finally get the map to work, I realized that the files with the deaths and pumps coordinates had a different coordinate system than the map. I am lucky that the code I copied pasted worked. I think, I would have used the map that was in the folder `SnowGIS_SHP` if it did not work. Next steps was quite easy as I do know how to plot with `ggplot`.
Maintenant, à vous de jouer! Vous pouvez effacer toutes ces informations et les remplacer par votre document computationnel.