# Computational Document: Subject 7 - Around SARS-CoV-2 (Covid-19)
- Last modified: *29/05/2020*
- Language used: *Python*

## Table of contents

1. [Abstract](#abstract)
2. [Data import](#data-import)
3. Data formatting
4. Data processing
5. Visualization
6. Conclusion

---

# Abstract
---

# Data import

## Sources:

* Example chart from the [South China Morning Post](https://www.scmp.com/coronavirus?src=homepage_covid_widget), dated 20 May 2020.
* Raw data used in this document: [time_series_covid19_confirmed_global.csv](https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv)


We first check whether the data are available locally; if they are not, we fall back to the URL.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
#import isoweek not needed here

data_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"

In [2]:
# Local data 
localData = "time_series_covid19_confirmed_global.csv"

In [3]:
import os
import urllib.request

if os.path.exists(localData):
    raw_data = pd.read_csv(localData)
    print("Local File Selected")
else:
    # Download the file once to localData, then read the local copy
    urllib.request.urlretrieve(data_url, localData)
    raw_data = pd.read_csv(localData)
    print("Online File Selected")
        
raw_data

Local File Selected


Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/19/20,5/20/20,5/21/20,5/22/20,5/23/20,5/24/20,5/25/20,5/26/20,5/27/20,5/28/20
0,,Afghanistan,33.000000,65.000000,0,0,0,0,0,0,...,7653,8145,8676,9216,9998,10582,11173,11831,12456,13036
1,,Albania,41.153300,20.168300,0,0,0,0,0,0,...,949,964,969,981,989,998,1004,1029,1050,1076
2,,Algeria,28.033900,1.659600,0,0,0,0,0,0,...,7377,7542,7728,7918,8113,8306,8503,8697,8857,8997
3,,Andorra,42.506300,1.521800,0,0,0,0,0,0,...,761,762,762,762,762,762,763,763,763,763
4,,Angola,-11.202700,17.873900,0,0,0,0,0,0,...,52,52,58,60,61,69,70,70,71,74
5,,Antigua and Barbuda,17.060800,-61.796400,0,0,0,0,0,0,...,25,25,25,25,25,25,25,25,25,25
6,,Argentina,-38.416100,-63.616700,0,0,0,0,0,0,...,8809,9283,9931,10649,11353,12076,12628,13228,13933,14702
7,,Armenia,40.069100,45.038200,0,0,0,0,0,0,...,5041,5271,5606,5928,6302,6661,7113,7402,7774,8216
8,Australian Capital Territory,Australia,-35.473500,149.012400,0,0,0,0,0,0,...,107,107,107,107,107,107,107,107,107,107
9,New South Wales,Australia,-33.868800,151.209300,0,0,0,0,3,4,...,3081,3082,3084,3086,3087,3090,3092,3089,3090,3092


The data above are the raw data from the CSV file. From left to right, the columns are:

| Column name    | Meaning                                                                        |
| ---------------|:------------------------------------------------------------------------------:|
| ID             | unique identifier of the row (shown as `Unnamed: 0` in the output above)       |
| Province/State | province or state the row refers to, when the country is broken down           |
| Country/Region | country or region the row belongs to                                           |
| Lat            | latitude                                                                       |
| Long           | longitude                                                                      |
| 1/22/20        | from this column onward, one column per date (M/D/YY) giving the cumulative number of confirmed Covid-19 cases |
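
As a rough illustration of the layout described in the table, the sketch below separates the descriptive columns from the date columns and parses the date labels. The frame `sample` and the names `id_columns` and `date_columns` are hypothetical; the two rows reuse values from the output shown earlier.

```python
import pandas as pd

# Hypothetical miniature frame with the same layout as the raw CSV
sample = pd.DataFrame({
    "Province/State": [None, "Anhui"],
    "Country/Region": ["Belgium", "China"],
    "Lat": [50.8333, 31.8257],
    "Long": [4.0, 117.2264],
    "1/22/20": [0, 1],
    "1/23/20": [0, 9],
})

# Descriptive columns; every other column is a date
id_columns = ["Province/State", "Country/Region", "Lat", "Long"]
date_columns = [c for c in sample.columns if c not in id_columns]

# Column labels follow the month/day/two-digit-year pattern
dates = pd.to_datetime(date_columns, format="%m/%d/%y")
print(list(dates.strftime("%Y-%m-%d")))
```

Parsing the labels with an explicit format avoids any ambiguity between day-first and month-first conventions.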

Missing values in `Province/State` correspond to countries that are not broken down into their constituent provinces and states.
We do not depend on these values, however: only the data for the following countries are of interest.

* Belgium
* China - all provinces except Hong Kong
* Hong Kong
* Metropolitan France
* Germany
* Iran
* Italy
* Japan
* South Korea
* Netherlands
* Portugal
* Spain
* United Kingdom
* United States
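
To make the role of the missing `Province/State` values more concrete, here is a minimal sketch on a hypothetical extract (`sample`, built from values shown in the output above). It assumes that a country-level row is simply one whose `Province/State` is empty, and it lists the countries that appear only through their provinces.

```python
import pandas as pd

# Hypothetical extract; values copied from rows shown earlier
sample = pd.DataFrame({
    "Province/State": [None, "Anhui", "Beijing", None],
    "Country/Region": ["Belgium", "China", "China", "France"],
    "1/22/20": [0, 1, 14, 0],
})

# Rows without a Province/State are the country-level rows
country_level = sample[sample["Province/State"].isnull()]

# Countries with no country-level row exist only as provinces
province_only = sorted(
    set(sample["Country/Region"]) - set(country_level["Country/Region"])
)
print(province_only)
```

In this extract, China is the only country without a country-level row, which is why it needs separate treatment.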

## Grouping the data to include in the study

Here we use the pandas [*loc*](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) indexer to extract from the raw data the rows corresponding to the countries listed above.

To keep the *code* readable, the process is split into several steps (all of these steps could be combined into a single logical expression).
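
For comparison, the single logical expression mentioned above might look like the following sketch. It is not the approach used in the cells below; the frame `sample` and the shortened list `wanted` are hypothetical, with values taken from the `1/27/20` column of the output shown earlier.

```python
import pandas as pd

# Hypothetical extract with the same columns as the raw data
sample = pd.DataFrame({
    "Province/State": [None, None, "Anhui", None, None],
    "Country/Region": ["Belgium", "France", "China", "Andorra", "Germany"],
    "1/27/20": [0, 3, 70, 0, 1],
})

wanted = ["Belgium", "France", "Germany"]  # shortened list for the example

# One boolean mask: country is wanted AND the row is a country-level row
selected = sample.loc[
    sample["Country/Region"].isin(wanted) & sample["Province/State"].isnull()
]
print(selected["Country/Region"].tolist())
```

`isin` replaces the chain of `==` comparisons, at the cost of hiding the intermediate results that the step-by-step version displays.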

In [4]:
# let's create a new variable to store our new data frame
# starting with Belgium
dataCountries = raw_data.loc[(raw_data['Country/Region'] == 'Belgium')]

dataCountries

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/19/20,5/20/20,5/21/20,5/22/20,5/23/20,5/24/20,5/25/20,5/26/20,5/27/20,5/28/20
23,,Belgium,50.8333,4.0,0,0,0,0,0,0,...,55791,55983,56235,56511,56810,57092,57342,57455,57592,57849


In [5]:
# Now add the remaining countries to dataCountries.
# The extra condition Province/State.isnull() keeps only metropolitan France's row and not the individual French regions detailed in the data.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
dataCountries = pd.concat([dataCountries, raw_data.loc[(raw_data['Country/Region'] == 'France') & (raw_data['Province/State'].isnull())]])

dataCountries

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/19/20,5/20/20,5/21/20,5/22/20,5/23/20,5/24/20,5/25/20,5/26/20,5/27/20,5/28/20
23,,Belgium,50.8333,4.0,0,0,0,0,0,0,...,55791,55983,56235,56511,56810,57092,57342,57455,57592,57849
116,,France,46.2276,2.2137,0,0,2,3,3,3,...,178428,179069,179306,179645,179964,179859,180166,179887,180044,183309


The same steps are used for the remaining countries, except for China, which requires a special operation (see below).

In [6]:
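# China and Hong Kong are not listed here: both are stored as Province/State rows
# under Country/Region == 'China', so the filter below would not match them.
countries_list = ['Germany', 'Iran', 'Italy', 'Japan', 'Korea, South', 'Netherlands', 'Portugal', 'Spain', 'United Kingdom', 'US']
#print(countries_list)

for country in countries_list:
    # Keep only the country-level row (empty Province/State)
    dataCountries = pd.concat([dataCountries, raw_data.loc[(raw_data['Country/Region'] == country) & (raw_data['Province/State'].isnull())]])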

dataCountries

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/19/20,5/20/20,5/21/20,5/22/20,5/23/20,5/24/20,5/25/20,5/26/20,5/27/20,5/28/20
23,,Belgium,50.8333,4.0,0,0,0,0,0,0,...,55791,55983,56235,56511,56810,57092,57342,57455,57592,57849
116,,France,46.2276,2.2137,0,0,2,3,3,3,...,178428,179069,179306,179645,179964,179859,180166,179887,180044,183309
120,,Germany,51.0,9.0,0,0,0,0,0,1,...,177778,178473,179021,179710,179986,180328,180600,181200,181524,182196
133,,Iran,32.0,53.0,0,0,0,0,0,0,...,124603,126949,129341,131652,133521,135701,137724,139511,141591,143849
137,,Italy,43.0,12.0,0,0,0,0,0,0,...,226699,227364,228006,228658,229327,229858,230158,230555,231139,231732
139,,Japan,36.0,138.0,2,2,2,2,4,4,...,16367,16367,16424,16513,16536,16550,16581,16623,16651,16598
143,,"Korea, South",36.0,128.0,1,1,2,2,3,4,...,11110,11122,11142,11165,11190,11206,11225,11265,11344,11402
169,,Netherlands,52.1326,5.2913,0,0,0,0,0,0,...,44249,44447,44700,44888,45064,45236,45445,45578,45768,45950
184,,Portugal,39.3999,-8.2245,0,0,0,0,0,0,...,29432,29660,29912,30200,30471,30623,30788,31007,31292,31596
201,,Spain,40.0,-4.0,0,0,0,0,0,0,...,232037,232555,233037,234824,235290,235772,235400,236259,236259,237906


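For China, the data are reported per province, so the rows must be summed to obtain the figures for the whole country.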

In [14]:
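# For China, the data must be summed across provinces to get the totals for the whole country.
# Hong Kong is left out of the selection because it is listed as a separate entry in this study.
# .copy() avoids pandas' SettingWithCopyWarning when a row is added below.
dataChina = raw_data.loc[(raw_data['Country/Region'] == 'China') & (raw_data['Province/State'] != 'Hong Kong')].copy()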

#print(dataChina)

#let's use df.sum() to sum rows 
col_list= list(dataChina)
col_list.remove("Province/State")
col_list.remove("Country/Region")
col_list.remove("Lat")
col_list.remove("Long")



for col in col_list:      
    dataChina.at['1', col] = dataChina[col].sum()


dataChina.at['1', "Country/Region"] = "China"

dataChina

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/19/20,5/20/20,5/21/20,5/22/20,5/23/20,5/24/20,5/25/20,5/26/20,5/27/20,5/28/20
49,Anhui,China,31.8257,117.2264,1.0,9.0,15.0,39.0,60.0,70.0,...,991.0,991.0,991.0,991.0,991.0,991.0,991.0,991.0,991.0,991.0
50,Beijing,China,40.1824,116.4142,14.0,22.0,36.0,41.0,68.0,80.0,...,593.0,593.0,593.0,593.0,593.0,593.0,593.0,593.0,593.0,593.0
51,Chongqing,China,30.0572,107.874,6.0,9.0,27.0,57.0,75.0,110.0,...,579.0,579.0,579.0,579.0,579.0,579.0,579.0,579.0,579.0,579.0
52,Fujian,China,26.0789,117.9874,1.0,5.0,10.0,18.0,35.0,59.0,...,356.0,356.0,356.0,356.0,356.0,356.0,357.0,357.0,358.0,358.0
53,Gansu,China,37.8099,101.0583,0.0,2.0,2.0,4.0,7.0,14.0,...,139.0,139.0,139.0,139.0,139.0,139.0,139.0,139.0,139.0,139.0
54,Guangdong,China,23.3417,113.4244,26.0,32.0,53.0,78.0,111.0,151.0,...,1590.0,1590.0,1590.0,1591.0,1592.0,1592.0,1592.0,1592.0,1592.0,1592.0
55,Guangxi,China,23.8298,108.7881,2.0,5.0,23.0,23.0,36.0,46.0,...,254.0,254.0,254.0,254.0,254.0,254.0,254.0,254.0,254.0,254.0
56,Guizhou,China,26.8154,106.8748,1.0,3.0,3.0,4.0,5.0,7.0,...,147.0,147.0,147.0,147.0,147.0,147.0,147.0,147.0,147.0,147.0
57,Hainan,China,19.1959,109.7453,4.0,5.0,8.0,19.0,22.0,33.0,...,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0
58,Hebei,China,39.549,116.1306,1.0,1.0,2.0,8.0,13.0,18.0,...,328.0,328.0,328.0,328.0,328.0,328.0,328.0,328.0,328.0,328.0
