Commit 9c2cb5f8 authored by Marc Oudart's avatar Marc Oudart

Exercice 3 module 3 presque fini

parent 930065ff
......@@ -2,7 +2,9 @@
title: "Analyse de l'incidence du syndrome grippal"
author: "Marc"
date: "10/04/2020"
output: html_document
output:
pdf_document: default
html_document: default
---
```{r setup, include=FALSE}
......
---
title: "Peer_Review_Cholera"
author: "Marc"
date: "15/04/2020"
output:
html_document: default
pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Instructions
1. From numerical data, draw map in John Snow's spirit. Show death places with markers whose size indicates number of deaths. Show water pumps on the same map with a different symbol.
2. Try to find different ways to show that the Broad street pump is at the center of the outbreak. (Density of deaths in the neighbourhood ? Other approaches ?).
3. Submit work on FUN.
Use `ggmap` with OpenStreetMaps as map template with `source="osm"`.
# Importing data
There is already an version of John Snow's data sets and map in the package `HistData` in `Snow.deaths` and `Snow.pumps`. However, the coordinates fit John Snow's map and no other maps.
First download and extract the zip file from data sets available at this adress : [http://rtwilson.com/downloads/SnowGIS_SHP.zip](http://rtwilson.com/downloads/SnowGIS_SHP.zip). It is the first link on [Robin's blog](http://blog.rtwilson.com/john-snows-cholera-data-in-more-formats/).
Two files of interest are the `Cholera_Deaths.shp` and the `Pumps.shp` files containing death people and water pumps coordinates.
The next code lines test if the file `Cholera_Deaths.shp`exists. If not, you have to download it manually as I do not know how to unzip documents in R ...
```{r}
if(file.exists("SnowGIS_SHP/Cholera_Deaths.shp") == FALSE){
print("You must download the zip file as explained above")
} else{
print("Files already downloaded")
}
```
# Reading the data
In order to read `.shp` files we need the `maptools` package.
The next code lines test if maptools is installed. If not, it does and load it :
```{r message=FALSE, warning=FALSE}
if(!require("maptools")){
install.packages("maptools")
library("maptools")
}
```
Therefore, we can read the data with the `readShapePoints` function :
```{r}
Deaths <- readShapePoints("SnowGIS_SHP/Cholera_Deaths")
head(Deaths)
```
We have coordinates, Id (I do not know what it is but I think it is not important) and Count that are the number of deaths at this location.
To extract coordinates, we can use the `coords`function.
```{r}
head(Deaths@coords)
```
The number of recensed deaths is :
```{r}
sum(Deaths$Count)
```
It seems that this is less than recorded deaths (616) displayed on the FUN website but let's suppose that it Robin's fault...
We can also read the file with the Pumps coordinates :
```{r}
Pumps <- readShapePoints("SnowGIS_SHP/Pumps")
head(Pumps)
```
There are 6 pumps.
# Extracting london map
For this purpose we will need the `ggmap` package.
The next code lines test if ggmap is installed. If not, it does and load it :
```{r message=FALSE, warning=FALSE}
if(!require("ggmap")){
install.packages("ggmap")
library("ggmap")
}
```
I got the london coordinates centrered on Broad Street on [Stamen Maps website](maps.stamen.com/toner/#17/51.51413/-0.13650
) :
```{r}
london <- c(left = -0.14454, bottom = 51.51139, right = -0.13119, top = 51.51630)
```
We will ge the map based on the above coordinates with streets names with the `maptype = "toner"`.
```{r}
london_map = get_stamenmap(london, zoom = 17, maptype = "toner")
```
We can display the map with ggmap.
```{r}
map <- ggmap(london_map)
```
# Displaying the Cholera deaths and pumps on the map
We will first extract the points and their coordinates from the `.shp` file.
Remember that we put the Deaths points in the `Deaths` variable and the Pumps points in the `Pumps` variable.
```{r}
Deaths_coord <- data.frame(Deaths@coords)
Pumps_coord <- data.frame(Pumps@coords)
```
Unfortunetely, both coordinates are in _OSGB36 National Grid_ reference while our map is in classic _decimal degrees_ reference.
Fortunetely, I found on the internet this following code that works but I do not know exactly how it works ...
It seems that it specifies the _Coordinate Reference System (CRS)_ first to the Deaths and Pumps coordinates with the `CRS` function.
And then, it transforms the system to a more classic longitude and lattitude system with the `sptransform` function.
```{r}
coordinates(Deaths_coord)=~coords.x1+coords.x2
coordinates(Pumps_coord)=~coords.x1+coords.x2
proj4string(Deaths_coord)=CRS("+init=epsg:27700")
proj4string(Pumps_coord)=CRS("+init=epsg:27700")
Deaths_coord = spTransform(Deaths_coord,CRS("+proj=longlat +datum=WGS84"))
Pumps_coord = spTransform(Pumps_coord,CRS("+proj=longlat +datum=WGS84"))
df_Deaths=data.frame(Deaths_coord@coords)
df_Pumps=data.frame(Pumps_coord@coords)
```
Let's see the results :
```{r}
head(df_Deaths)
```
```{r}
head(df_Pumps)
```
Looks fine for both points :)!
Let's project the Deaths points on our map with the `geom_point`function in red and with a size depending on the number of deaths recorded at the particular coordinates :
```{r}
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5)
```
Let's add the pumps in another color and symbol :
```{r}
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue")
```
Highlighting the Broad street pump (number 1 in the coordinates list :
```{r}
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") +
geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green")
```
# Analysis to show that the Broad street pump is at the center of the outbreak
We can try to plot a density of Deaths with the `stat_density_2d` function :
```{r}
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
stat_density_2d(data = df_Deaths, aes(coords.x1, coords.x2, fill = after_stat(level)), geom = "polygon", alpha = 0.3) +
geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") +
geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green")
```
We can clearly see here that the green pump on broadwick street seems to be at the center of the outbreak.
We can try to get the coordinates of the maximum density points with function in `MASS` and `raster` libraries :
```{r}
if(!require("MASS")){
install.packages("MASS")
library("MASS")
}
if(!require("raster")){
install.packages("raster")
library("raster")
}
```
As I am no mathematician I copy pasted a code from the internet.
I think that it computes and retrieve the coordiantes of the local maxima from the Deaths data frame.
```{r}
w = matrix(1,3,3)
x = kde2d(x = df_Deaths$coords.x1, y = df_Deaths$coords.x2)
r = raster(x)
f <- function(X) max(X, na.rm=TRUE)
localmax <- focal(r, w, fun = f, pad=TRUE, padValue=NA)
r2 <- r==localmax
maxXY <- xyFromCell(r2, Which(r2==1, cells=TRUE))
maxXY <- as.data.frame(maxXY)
maxXY
```
In maxXY there are the coordinates of the 5 local maxima.
We can plot them on the map in yellow.
```{r}
map + geom_point(df_Deaths, mapping = aes(coords.x1, coords.x2), color = "red", size = Deaths$Count, alpha = 0.5) +
stat_density_2d(data = df_Deaths, aes(coords.x1, coords.x2, fill = after_stat(level)), geom = "polygon", alpha = 0.3) +
geom_point(df_Pumps, mapping = aes(coords.x1, coords.x2), color = "blue", shape = 24, size = 3, fill = "blue") +
geom_point(df_Pumps, mapping = aes(coords.x1[1], coords.x2[1]), color = "green", shape = 24, size = 3, fill = "green") +
geom_point(maxXY, mapping = aes(x, y), color = "yellow", size = 3, fill = "yellow")
```
It really seems that the closest pump to the maxima is the one on Broad street.
Let's see if we can compute the distance between these maxima and the pumps to show what pump is the closest to what maximum.
If I remember well, the distance between 2 points on a graph can be calculated with the following equation : $$\frac{y2 - y1}{x2 - x1}$$.
```{r}
dist_pump1 <- c()
dist_pump2 <- c()
dist_pump3 <- c()
dist_pump4 <- c()
dist_pump5 <- c()
for (i in c(1:length(maxXY$x))){
dist_pump1[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[1])/(maxXY$x[i]-df_Pumps$coords.x1[1])
}
for (i in c(1:length(maxXY$x))){
dist_pump2[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[2])/(maxXY$x[i]-df_Pumps$coords.x1[2])
}
for (i in c(1:length(maxXY$x))){
dist_pump3[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[3])/(maxXY$x[i]-df_Pumps$coords.x1[3])
}
for (i in c(1:length(maxXY$x))){
dist_pump4[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[4])/(maxXY$x[i]-df_Pumps$coords.x1[4])
}
for (i in c(1:length(maxXY$x))){
dist_pump5[i] <- abs(maxXY$y[i]-df_Pumps$coords.x2[5])/(maxXY$x[i]-df_Pumps$coords.x1[5])
}
dist_pump1
dist_pump2
dist_pump3
dist_pump4
dist_pump5
```
This source diff could not be displayed because it is too large. You can view the blob instead.
PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB 1936",DATUM["OSGB 1936",SPHEROID["Airy 1830",6377563.396,299.3249646]],PRIMEM["Greenwich",0.0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["false_easting",400000.0],PARAMETER["false_northing",-100000.0],PARAMETER["central_meridian",-2.0],PARAMETER["scale_factor",0.9996012717],PARAMETER["latitude_of_origin",49.0],UNIT["m",1.0]]
\ No newline at end of file
1.0000000000
0.0000000000
0.0000000000
-1.0000000000
528765.5000000000
181518.5000000000
1.0000000000
0.0000000000
0.0000000000
-0.9841121495
528765.5000000000
181518.5079439252
<PAMDataset>
<Metadata />
<Metadata domain="Esri">
<MDI key="PyramidResamplingType">NEAREST</MDI>
</Metadata>
<PAMRasterBand band="1">
<Metadata>
<MDI key="RepresentationType">THEMATIC</MDI>
</Metadata>
</PAMRasterBand>
</PAMDataset>
PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB 1936",DATUM["OSGB 1936",SPHEROID["Airy 1830",6377563.396,299.3249646]],PRIMEM["Greenwich",0.0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["false_easting",400000.0],PARAMETER["false_northing",-100000.0],PARAMETER["central_meridian",-2.0],PARAMETER["scale_factor",0.9996012717],PARAMETER["latitude_of_origin",49.0],UNIT["m",1.0]]
\ No newline at end of file
README file for Snow GIS data
-----------------------------
This zip file contains a number of GIS layers relating to John Snow's 1854 investigation of a
Cholera outbreak in London - considered by many to be the first use of geographical analysis
in an epidemiological study. More details on the history are available at
http://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak
This file contains a number of GIS layers created from Snow's original map which allow analyses to be
conducted on the data in modern GIS systems. For example, clustering of cases can be analysed and the
effect of spatial aggregation in modern anonymised health data releases. Of course, it's also just
interesting to look at the area, and how little it has changed since 1854.
Files included:
(Many of the items in the list consist of many actual files (for example .shp, .dbf etc)
* OSMap Raster Modern OS map of the area of the outbreak (from OS Open Data - contains Ordnance Survey data © Crown copyright and database right 2013)
* OSMap_Greyscale Raster Same as above, but in greyscale for easier visualisation (altered by conversion to greyscale, from OS Open Data - contains Ordnance Survey data © Crown copyright and database right 2013)
* SnowMap Raster Snow's original map, georeferenced and warped so that it accurately overlays the OS map
* CholeraDeaths Vector Points for each location of one or more deaths. Attribute value gives number of deaths at that location
* Pumps Vector Points for each location of a pump
Created and compiled by Robin Wilson (robin@rtwilson.com, www.rtwilson.com/academic) - Jan 2011.
\ No newline at end of file
0.3159420000
0.0000000000
0.0000000000
-0.3159420000
528865.7464993792
181433.8952167343
<PAMDataset>
<Metadata/>
<Metadata domain="ESRI">
<MDI key="PyramidResamplingType">NEAREST</MDI>
</Metadata>
<PAMRasterBand band="1">
<Histograms>
<HistItem>
<HistMin>-0.5</HistMin>
<HistMax>255.5</HistMax>
<BucketCount>256</BucketCount>
<IncludeOutOfRange>1</IncludeOutOfRange>
<Approximate>0</Approximate>
<HistCounts>1056558|38085|35030|30635|26721|23196|19973|17614|15202|13621|12110|10633|9602|8622|7615|6827|6160|5785|5263|4850|4471|4102|3821|3520|3396|3346|3106|3054|2953|2870|2747|2735|2682|2717|2536|2555|2465|2514|2461|2489|2566|2390|2491|2486|2418|2385|2321|2323|2264|2291|2389|2392|2227|2301|2275|2187|2254|2250|2202|2205|2187|2217|2155|2220|2105|2123|2165|2206|2149|2031|2108|2141|2066|2100|2080|2025|2107|2075|2068|2019|2023|2010|2026|2005|1995|2090|1978|2045|1994|1989|1999|2039|1942|1972|1946|1990|2006|1964|1962|1893|1939|1906|1888|1908|1968|1846|1918|1979|2044|1925|1884|1859|1862|1947|1930|1825|1934|1891|1975|1830|1898|1895|1926|1973|1924|1986|1920|1797|1912|1986|1890|1924|1944|1915|1996|1900|1925|1945|2011|1882|1953|1952|1974|1929|1997|2053|1981|1993|1959|1880|1925|1946|2015|1947|1917|2052|2062|2071|2017|2089|1949|2030|2007|2083|2080|1949|2085|2111|2091|2151|2135|2137|2019|2129|2076|2149|2084|2188|2108|2188|2105|2178|2159|2197|2146|2174|2360|2281|2260|2259|2254|2297|2279|2338|2337|2365|2450|2368|2407|2407|2435|2379|2488|2492|2497|2525|2490|2566|2577|2565|2676|2712|2726|2674|2688|2799|2707|2738|2831|2956|2800|2961|2904|3059|2994|3157|3254|3316|3508|3710|3905|4319|4664|5116|5937|6536|7435|8678|9923|11935|13906|16928|19919|23808|28421|34066|40774|48059|57324|67501|80304|94264|109903|125204|137869|5850359</HistCounts>
</HistItem>
</Histograms>
<Metadata>
<MDI key="STATISTICS_MINIMUM">0</MDI>
<MDI key="STATISTICS_MAXIMUM">255</MDI>
<MDI key="STATISTICS_MEAN">207.40977040919</MDI>
<MDI key="STATISTICS_STDDEV">94.77357404896</MDI>
</Metadata>
</PAMRasterBand>
<PAMRasterBand band="2">
<Histograms>
<HistItem>
<HistMin>-0.5</HistMin>
<HistMax>255.5</HistMax>
<BucketCount>256</BucketCount>
<IncludeOutOfRange>1</IncludeOutOfRange>
<Approximate>0</Approximate>
<HistCounts>1056558|38085|35030|30635|26721|23196|19973|17614|15202|13621|12110|10633|9602|8622|7615|6827|6160|5785|5263|4850|4471|4102|3821|3520|3396|3346|3106|3054|2953|2870|2747|2735|2682|2717|2536|2555|2465|2514|2461|2489|2566|2390|2491|2486|2418|2385|2321|2323|2264|2291|2389|2392|2227|2301|2275|2187|2254|2250|2202|2205|2187|2217|2155|2220|2105|2123|2165|2206|2149|2031|2108|2141|2066|2100|2080|2025|2107|2075|2068|2019|2023|2010|2026|2005|1995|2090|1978|2045|1994|1989|1999|2039|1942|1972|1946|1990|2006|1964|1962|1893|1939|1906|1888|1908|1968|1846|1918|1979|2044|1925|1884|1859|1862|1947|1930|1825|1934|1891|1975|1830|1898|1895|1926|1973|1924|1986|1920|1797|1912|1986|1890|1924|1944|1915|1996|1900|1925|1945|2011|1882|1953|1952|1974|1929|1997|2053|1981|1993|1959|1880|1925|1946|2015|1947|1917|2052|2062|2071|2017|2089|1949|2030|2007|2083|2080|1949|2085|2111|2091|2151|2135|2137|2019|2129|2076|2149|2084|2188|2108|2188|2105|2178|2159|2197|2146|2174|2360|2281|2260|2259|2254|2297|2279|2338|2337|2365|2450|2368|2407|2407|2435|2379|2488|2492|2497|2525|2490|2566|2577|2565|2676|2712|2726|2674|2688|2799|2707|2738|2831|2956|2800|2961|2904|3059|2994|3157|3254|3316|3508|3710|3905|4319|4664|5116|5937|6536|7435|8678|9923|11935|13906|16928|19919|23808|28421|34066|40774|48059|57324|67501|80304|94264|109903|125204|137869|5850359</HistCounts>
</HistItem>
</Histograms>
<Metadata>
<MDI key="STATISTICS_MINIMUM">0</MDI>
<MDI key="STATISTICS_MAXIMUM">255</MDI>
<MDI key="STATISTICS_MEAN">207.40977040919</MDI>
<MDI key="STATISTICS_STDDEV">94.77357404896</MDI>
</Metadata>
</PAMRasterBand>
<PAMRasterBand band="3">
<Histograms>
<HistItem>
<HistMin>-0.5</HistMin>
<HistMax>255.5</HistMax>
<BucketCount>256</BucketCount>
<IncludeOutOfRange>1</IncludeOutOfRange>
<Approximate>0</Approximate>
<HistCounts>1056558|38085|35030|30635|26721|23196|19973|17614|15202|13621|12110|10633|9602|8622|7615|6827|6160|5785|5263|4850|4471|4102|3821|3520|3396|3346|3106|3054|2953|2870|2747|2735|2682|2717|2536|2555|2465|2514|2461|2489|2566|2390|2491|2486|2418|2385|2321|2323|2264|2291|2389|2392|2227|2301|2275|2187|2254|2250|2202|2205|2187|2217|2155|2220|2105|2123|2165|2206|2149|2031|2108|2141|2066|2100|2080|2025|2107|2075|2068|2019|2023|2010|2026|2005|1995|2090|1978|2045|1994|1989|1999|2039|1942|1972|1946|1990|2006|1964|1962|1893|1939|1906|1888|1908|1968|1846|1918|1979|2044|1925|1884|1859|1862|1947|1930|1825|1934|1891|1975|1830|1898|1895|1926|1973|1924|1986|1920|1797|1912|1986|1890|1924|1944|1915|1996|1900|1925|1945|2011|1882|1953|1952|1974|1929|1997|2053|1981|1993|1959|1880|1925|1946|2015|1947|1917|2052|2062|2071|2017|2089|1949|2030|2007|2083|2080|1949|2085|2111|2091|2151|2135|2137|2019|2129|2076|2149|2084|2188|2108|2188|2105|2178|2159|2197|2146|2174|2360|2281|2260|2259|2254|2297|2279|2338|2337|2365|2450|2368|2407|2407|2435|2379|2488|2492|2497|2525|2490|2566|2577|2565|2676|2712|2726|2674|2688|2799|2707|2738|2831|2956|2800|2961|2904|3059|2994|3157|3254|3316|3508|3710|3905|4319|4664|5116|5937|6536|7435|8678|9923|11935|13906|16928|19919|23808|28421|34066|40774|48059|57324|67501|80304|94264|109903|125204|137869|5850359</HistCounts>
</HistItem>
</Histograms>
<Metadata>
<MDI key="STATISTICS_MINIMUM">0</MDI>
<MDI key="STATISTICS_MAXIMUM">255</MDI>
<MDI key="STATISTICS_MEAN">207.40977040919</MDI>
<MDI key="STATISTICS_STDDEV">94.77357404896</MDI>
</Metadata>
</PAMRasterBand>
</PAMDataset>
......@@ -2,7 +2,9 @@
title: "Analyse varicelle"
author: "Marc Oudart"
date: "11/04/2020"
output: html_document
output:
pdf_document: default
html_document: default
---
......@@ -55,7 +57,7 @@ Tous les deux sont des entiers donc pas de transformation à faire.
Il faut que R comprenne que la 1ère colonne sont des dates.
Il nous faut donc la librairie `parsedate`.
```{r message=FALSE, warning=FALSE}
install.packages("parsedate")
library("parsedate")
```
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment