{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Computational Document: Subject 7 - Around SARS-CoV-2 (Covid-19)\n", "- Last modified: *01/06/2020*\n", "- Language used: *Python*\n", "\n", "## Table of contents\n", "\n", "1. [Abstract](#abstract)\n", "2. [Data import](#data-import)\n", "3. [Data formatting](#formatage-des-données)\n", "4. [Data processing](#traitement-des-données)\n", "5. [Complementary study](#etude-complémentaire)\n", "6. [Conclusion](#conclusion)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Abstract\n", "\n", "This notebook walks through the steps needed to produce a figure similar to the one shown on the [South China Morning Post](https://www.scmp.com/coronavirus?src=homepage_covid_widget) website. Every step is described and commented.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data import\n", "\n", "## Sources\n", "\n", "* Example chart from the [South China Morning Post](https://www.scmp.com/coronavirus?src=homepage_covid_widget), dated 20 May 2020.\n", "* Raw data used in this document: [time_series_covid19_confirmed_global.csv](https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv)\n", "\n", "\n", "We first check whether the data are available locally or whether they must be fetched from the original URL." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "#import isoweek not needed here\n", "\n", "data_url = \"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Local data \n", "localData = \"time_series_covid19_confirmed_global.csv\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Local File Selected\n" ] }, { "data": { "text/html": [ "
266 rows × 132 columns
" ], "text/plain": [ " Province/State Country/Region Lat \\\n", "0 NaN Afghanistan 33.000000 \n", "1 NaN Albania 41.153300 \n", "2 NaN Algeria 28.033900 \n", "3 NaN Andorra 42.506300 \n", "4 NaN Angola -11.202700 \n", "5 NaN Antigua and Barbuda 17.060800 \n", "6 NaN Argentina -38.416100 \n", "7 NaN Armenia 40.069100 \n", "8 Australian Capital Territory Australia -35.473500 \n", "9 New South Wales Australia -33.868800 \n", "10 Northern Territory Australia -12.463400 \n", "11 Queensland Australia -28.016700 \n", "12 South Australia Australia -34.928500 \n", "13 Tasmania Australia -41.454500 \n", "14 Victoria Australia -37.813600 \n", "15 Western Australia Australia -31.950500 \n", "16 NaN Austria 47.516200 \n", "17 NaN Azerbaijan 40.143100 \n", "18 NaN Bahamas 25.034300 \n", "19 NaN Bahrain 26.027500 \n", "20 NaN Bangladesh 23.685000 \n", "21 NaN Barbados 13.193900 \n", "22 NaN Belarus 53.709800 \n", "23 NaN Belgium 50.833300 \n", "24 NaN Benin 9.307700 \n", "25 NaN Bhutan 27.514200 \n", "26 NaN Bolivia -16.290200 \n", "27 NaN Bosnia and Herzegovina 43.915900 \n", "28 NaN Brazil -14.235000 \n", "29 NaN Brunei 4.535300 \n", ".. ... ... ... 
\n", "236 NaN Timor-Leste -8.874217 \n", "237 NaN Belize 13.193900 \n", "238 NaN Laos 19.856270 \n", "239 NaN Libya 26.335100 \n", "240 NaN West Bank and Gaza 31.952200 \n", "241 NaN Guinea-Bissau 11.803700 \n", "242 NaN Mali 17.570692 \n", "243 NaN Saint Kitts and Nevis 17.357822 \n", "244 Northwest Territories Canada 64.825500 \n", "245 Yukon Canada 64.282300 \n", "246 NaN Kosovo 42.602636 \n", "247 NaN Burma 21.916200 \n", "248 Anguilla United Kingdom 18.220600 \n", "249 British Virgin Islands United Kingdom 18.420700 \n", "250 Turks and Caicos Islands United Kingdom 21.694000 \n", "251 NaN MS Zaandam 0.000000 \n", "252 NaN Botswana -22.328500 \n", "253 NaN Burundi -3.373100 \n", "254 NaN Sierra Leone 8.460555 \n", "255 Bonaire, Sint Eustatius and Saba Netherlands 12.178400 \n", "256 NaN Malawi -13.254308 \n", "257 Falkland Islands (Malvinas) United Kingdom -51.796300 \n", "258 Saint Pierre and Miquelon France 46.885200 \n", "259 NaN South Sudan 6.877000 \n", "260 NaN Western Sahara 24.215500 \n", "261 NaN Sao Tome and Principe 0.186360 \n", "262 NaN Yemen 15.552727 \n", "263 NaN Comoros -11.645500 \n", "264 NaN Tajikistan 38.861034 \n", "265 NaN Lesotho -29.609988 \n", "\n", " Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 \\\n", "0 65.000000 0 0 0 0 0 0 \n", "1 20.168300 0 0 0 0 0 0 \n", "2 1.659600 0 0 0 0 0 0 \n", "3 1.521800 0 0 0 0 0 0 \n", "4 17.873900 0 0 0 0 0 0 \n", "5 -61.796400 0 0 0 0 0 0 \n", "6 -63.616700 0 0 0 0 0 0 \n", "7 45.038200 0 0 0 0 0 0 \n", "8 149.012400 0 0 0 0 0 0 \n", "9 151.209300 0 0 0 0 3 4 \n", "10 130.845600 0 0 0 0 0 0 \n", "11 153.400000 0 0 0 0 0 0 \n", "12 138.600700 0 0 0 0 0 0 \n", "13 145.970700 0 0 0 0 0 0 \n", "14 144.963100 0 0 0 0 1 1 \n", "15 115.860500 0 0 0 0 0 0 \n", "16 14.550100 0 0 0 0 0 0 \n", "17 47.576900 0 0 0 0 0 0 \n", "18 -77.396300 0 0 0 0 0 0 \n", "19 50.550000 0 0 0 0 0 0 \n", "20 90.356300 0 0 0 0 0 0 \n", "21 -59.543200 0 0 0 0 0 0 \n", "22 27.953400 0 0 0 0 0 0 \n", "23 4.000000 0 0 0 0 0 0 
\n", "24 2.315800 0 0 0 0 0 0 \n", "25 90.433600 0 0 0 0 0 0 \n", "26 -63.588700 0 0 0 0 0 0 \n", "27 17.679100 0 0 0 0 0 0 \n", "28 -51.925300 0 0 0 0 0 0 \n", "29 114.727700 0 0 0 0 0 0 \n", ".. ... ... ... ... ... ... ... \n", "236 125.727539 0 0 0 0 0 0 \n", "237 -59.543200 0 0 0 0 0 0 \n", "238 102.495496 0 0 0 0 0 0 \n", "239 17.228331 0 0 0 0 0 0 \n", "240 35.233200 0 0 0 0 0 0 \n", "241 -15.180400 0 0 0 0 0 0 \n", "242 -3.996166 0 0 0 0 0 0 \n", "243 -62.782998 0 0 0 0 0 0 \n", "244 -124.845700 0 0 0 0 0 0 \n", "245 -135.000000 0 0 0 0 0 0 \n", "246 20.902977 0 0 0 0 0 0 \n", "247 95.956000 0 0 0 0 0 0 \n", "248 -63.068600 0 0 0 0 0 0 \n", "249 -64.640000 0 0 0 0 0 0 \n", "250 -71.797900 0 0 0 0 0 0 \n", "251 0.000000 0 0 0 0 0 0 \n", "252 24.684900 0 0 0 0 0 0 \n", "253 29.918900 0 0 0 0 0 0 \n", "254 -11.779889 0 0 0 0 0 0 \n", "255 -68.238500 0 0 0 0 0 0 \n", "256 34.301525 0 0 0 0 0 0 \n", "257 -59.523600 0 0 0 0 0 0 \n", "258 -56.315900 0 0 0 0 0 0 \n", "259 31.307000 0 0 0 0 0 0 \n", "260 -12.885800 0 0 0 0 0 0 \n", "261 6.613081 0 0 0 0 0 0 \n", "262 48.516388 0 0 0 0 0 0 \n", "263 43.333300 0 0 0 0 0 0 \n", "264 71.276093 0 0 0 0 0 0 \n", "265 28.233608 0 0 0 0 0 0 \n", "\n", " ... 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 \\\n", "0 ... 7653 8145 8676 9216 9998 10582 11173 \n", "1 ... 949 964 969 981 989 998 1004 \n", "2 ... 7377 7542 7728 7918 8113 8306 8503 \n", "3 ... 761 762 762 762 762 762 763 \n", "4 ... 52 52 58 60 61 69 70 \n", "5 ... 25 25 25 25 25 25 25 \n", "6 ... 8809 9283 9931 10649 11353 12076 12628 \n", "7 ... 5041 5271 5606 5928 6302 6661 7113 \n", "8 ... 107 107 107 107 107 107 107 \n", "9 ... 3081 3082 3084 3086 3087 3090 3092 \n", "10 ... 29 29 29 29 29 29 29 \n", "11 ... 1058 1058 1058 1060 1061 1056 1057 \n", "12 ... 439 439 439 439 439 439 439 \n", "13 ... 228 228 228 228 228 228 228 \n", "14 ... 1573 1581 1593 1593 1603 1605 1610 \n", "15 ... 557 557 557 557 560 560 564 \n", "16 ... 
16321 16353 16404 16436 16486 16503 16539 \n", "17 ... 3518 3631 3749 3855 3982 4122 4271 \n", "18 ... 96 97 97 97 100 100 100 \n", "19 ... 7532 7888 8174 8414 8802 9138 9171 \n", "20 ... 25121 26738 28511 30205 32078 33610 35585 \n", "21 ... 90 90 90 90 92 92 92 \n", "22 ... 31508 32426 33371 34303 35244 36198 37144 \n", "23 ... 55791 55983 56235 56511 56810 57092 57342 \n", "24 ... 130 130 135 135 135 191 191 \n", "25 ... 21 21 21 21 24 24 27 \n", "26 ... 4481 4919 5187 5579 5915 6263 6660 \n", "27 ... 2321 2338 2350 2372 2391 2401 2406 \n", "28 ... 271885 291579 310087 330890 347398 363211 374898 \n", "29 ... 141 141 141 141 141 141 141 \n", ".. ... ... ... ... ... ... ... ... \n", "236 ... 24 24 24 24 24 24 24 \n", "237 ... 18 18 18 18 18 18 18 \n", "238 ... 19 19 19 19 19 19 19 \n", "239 ... 68 69 71 72 75 75 75 \n", "240 ... 391 398 423 423 423 423 423 \n", "241 ... 1038 1089 1109 1114 1114 1114 1178 \n", "242 ... 901 931 947 969 1015 1030 1059 \n", "243 ... 15 15 15 15 15 15 15 \n", "244 ... 5 5 5 5 5 5 5 \n", "245 ... 11 11 11 11 11 11 11 \n", "246 ... 989 989 1003 1004 1025 1032 1038 \n", "247 ... 193 199 199 199 201 201 203 \n", "248 ... 3 3 3 3 3 3 3 \n", "249 ... 8 8 8 8 8 8 8 \n", "250 ... 12 12 12 12 12 12 12 \n", "251 ... 9 9 9 9 9 9 9 \n", "252 ... 25 25 29 30 30 35 35 \n", "253 ... 42 42 42 42 42 42 42 \n", "254 ... 534 570 585 606 621 707 735 \n", "255 ... 6 6 6 6 6 6 6 \n", "256 ... 70 71 72 82 82 83 101 \n", "257 ... 13 13 13 13 13 13 13 \n", "258 ... 1 1 1 1 1 1 1 \n", "259 ... 290 290 481 563 655 655 806 \n", "260 ... 6 6 6 6 6 9 9 \n", "261 ... 251 251 251 251 251 251 299 \n", "262 ... 167 184 197 209 212 222 233 \n", "263 ... 11 34 34 78 78 87 87 \n", "264 ... 1936 2140 2350 2551 2738 2929 3100 \n", "265 ... 
1 1 1 2 2 2 2 \n", "\n", " 5/26/20 5/27/20 5/28/20 \n", "0 11831 12456 13036 \n", "1 1029 1050 1076 \n", "2 8697 8857 8997 \n", "3 763 763 763 \n", "4 70 71 74 \n", "5 25 25 25 \n", "6 13228 13933 14702 \n", "7 7402 7774 8216 \n", "8 107 107 107 \n", "9 3089 3090 3092 \n", "10 29 29 29 \n", "11 1058 1058 1058 \n", "12 440 440 440 \n", "13 228 228 228 \n", "14 1618 1628 1634 \n", "15 570 570 577 \n", "16 16557 16591 16628 \n", "17 4403 4568 4759 \n", "18 100 100 101 \n", "19 9366 9692 10052 \n", "20 36751 38292 40321 \n", "21 92 92 92 \n", "22 38059 38956 39858 \n", "23 57455 57592 57849 \n", "24 208 210 210 \n", "25 27 28 31 \n", "26 7136 7768 8387 \n", "27 2416 2435 2462 \n", "28 391222 411821 438238 \n", "29 141 141 141 \n", ".. ... ... ... \n", "236 24 24 24 \n", "237 18 18 18 \n", "238 19 19 19 \n", "239 77 99 105 \n", "240 429 434 446 \n", "241 1178 1195 1195 \n", "242 1077 1116 1194 \n", "243 15 15 15 \n", "244 5 5 5 \n", "245 11 11 11 \n", "246 1038 1047 1048 \n", "247 206 206 206 \n", "248 3 3 3 \n", "249 8 8 8 \n", "250 12 12 12 \n", "251 9 9 9 \n", "252 35 35 35 \n", "253 42 42 42 \n", "254 754 782 812 \n", "255 6 6 6 \n", "256 101 101 203 \n", "257 13 13 13 \n", "258 1 1 1 \n", "259 806 994 994 \n", "260 9 9 9 \n", "261 441 443 458 \n", "262 249 256 278 \n", "263 87 87 87 \n", "264 3266 3424 3563 \n", "265 2 2 2 \n", "\n", "[266 rows x 132 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "import urllib.request\n", "\n", "# Use the local copy when it exists; otherwise download it once and read that copy\n", "if os.path.exists(localData):\n", "    raw_data = pd.read_csv(localData)\n", "    print(\"Local File Selected\")\n", "else:\n", "    urllib.request.urlretrieve(data_url, localData)\n", "    raw_data = pd.read_csv(localData)\n", "    print(\"Online File Selected\")\n", "\n", "raw_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data above are the raw data from the CSV file. From left to right, the columns are:\n", "\n", "| Column's Name 
| Meaning |\n", "| ---------------|:------------------------------------------------------------------------------:|\n", "| ID | row index, unique for each row |\n", "| Province/State | province or state, when the data are given for a specific region |\n", "| Country/Region | the country or region to which the data correspond |\n", "| Lat | latitude |\n", "| Long | longitude |\n", "| 1/22/20 | from this column onward, one column per date giving the cumulative number of confirmed Covid-19 cases |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The missing values (`NaN` in `Province/State`) correspond to countries that are not broken down into the provinces and states that compose them.\n", "However, we do not depend on these values: only the data for the following countries are of interest here.\n", "\n", "* Belgium\n", "* China: all provinces except Hong Kong (listed as *China*)\n", "* Hong Kong\n", "* Metropolitan France\n", "* Germany\n", "* Iran\n", "* Italy\n", "* Japan\n", "* South Korea\n", "* Netherlands\n", "* Portugal\n", "* Spain\n", "* United Kingdom\n", "* United States\n", "\n", "---\n", "\n", "# Data formatting\n", "\n", "## Gathering the data to include in the study\n", "\n", "Here we use the pandas [*loc*](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) indexer to extract from the raw data the rows corresponding to the countries listed above.\n", "\n", "To keep the *code* readable, the process is split into several steps (all of which could be merged into one):\n", "\n", "1. An example of adding the data for a single country;\n", "2. Adding all the other countries except China;\n", "3. Adding China by summing all of its provinces, without Hong Kong." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
1 rows × 132 columns
" ], "text/plain": [ " Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 \\\n", "23 NaN Belgium 50.8333 4.0 0 0 0 \n", "\n", " 1/25/20 1/26/20 1/27/20 ... 5/19/20 5/20/20 5/21/20 5/22/20 \\\n", "23 0 0 0 ... 55791 55983 56235 56511 \n", "\n", " 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 \n", "23 56810 57092 57342 57455 57592 57849 \n", "\n", "[1 rows x 132 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's create a new variable to store our new data frame\n", "# starting with Belgium\n", "dataCountries = raw_data.loc[(raw_data['Country/Region'] == 'Belgium')]\n", "\n", "dataCountries" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
2 rows × 132 columns
" ], "text/plain": [ " Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 \\\n", "23 NaN Belgium 50.8333 4.0000 0 0 0 \n", "116 NaN France 46.2276 2.2137 0 0 2 \n", "\n", " 1/25/20 1/26/20 1/27/20 ... 5/19/20 5/20/20 5/21/20 5/22/20 \\\n", "23 0 0 0 ... 55791 55983 56235 56511 \n", "116 3 3 3 ... 178428 179069 179306 179645 \n", "\n", " 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 \n", "23 56810 57092 57342 57455 57592 57849 \n", "116 179964 179859 180166 179887 180044 183309 \n", "\n", "[2 rows x 132 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# now let's add to dataCountries the rest of the countries needed \n", "# Here with & Province/State.isnull we are only including metropolitan France's row and not the specific regions from France detailed in the data.\n", "\n", "dataCountries = dataCountries.append(raw_data.loc[(raw_data['Country/Region'] == 'France') & (raw_data['Province/State'].isnull())])\n", "\n", "dataCountries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Les mêmes étapes sont utilisées pour le reste des pays manquants, sauf pour la Chine qui nécessite une opération spécial. 
(Voir ci-dessous)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [], "source": [ "countries_list= list(['Germany', 'Iran', 'Italy', 'Japan', 'Korea, South', 'Netherlands', 'Portugal', 'Spain', 'United Kingdom', 'US'])\n", "#print(countries_list)\n", "\n", "for country in countries_list : \n", " dataCountries = dataCountries.append(raw_data.loc[(raw_data['Country/Region'] == country) & (raw_data['Province/State'].isnull())])\n", "\n", "# Manualy adding Hong-Kong \n", "dataCountries = dataCountries.append(raw_data.loc[(raw_data['Country/Region'] == 'China') & (raw_data['Province/State'] == 'Hong Kong')]) \n", "\n", "#Uncomment to see the dataframe\n", "#dataCountries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pour éviter que deux lignes correspondent au même pays, la Chine. On renome *Hong Kong, China* en *Hong Kong, Hong Kong*. Ainsi, nous pourrons ajouter toutes les provinces de Chine dans une même ligne nommée *China*. Nous utilisons donc la méthode [replace](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html) pour remplacer le nom du pays." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Province/State Country/Region Lat Long 1/22/20 1/23/20 \\\n", "23 NaN Belgium 50.8333 4.0000 0 0 \n", "116 NaN France 46.2276 2.2137 0 0 \n", "120 NaN Germany 51.0000 9.0000 0 0 \n", "133 NaN Iran 32.0000 53.0000 0 0 \n", "137 NaN Italy 43.0000 12.0000 0 0 \n", "139 NaN Japan 36.0000 138.0000 2 2 \n", "143 NaN Korea, South 36.0000 128.0000 1 1 \n", "169 NaN Netherlands 52.1326 5.2913 0 0 \n", "184 NaN Portugal 39.3999 -8.2245 0 0 \n", "201 NaN Spain 40.0000 -4.0000 0 0 \n", "223 NaN United Kingdom 55.3781 -3.4360 0 0 \n", "225 NaN US 37.0902 -95.7129 1 1 \n", "61 Hong Kong Hong Kong 22.3000 114.2000 0 2 \n", "\n", " 1/24/20 1/25/20 1/26/20 1/27/20 ... 5/19/20 5/20/20 5/21/20 \\\n", "23 0 0 0 0 ... 55791 55983 56235 \n", "116 2 3 3 3 ... 178428 179069 179306 \n", "120 0 0 0 1 ... 177778 178473 179021 \n", "133 0 0 0 0 ... 124603 126949 129341 \n", "137 0 0 0 0 ... 226699 227364 228006 \n", "139 2 2 4 4 ... 16367 16367 16424 \n", "143 2 2 3 4 ... 11110 11122 11142 \n", "169 0 0 0 0 ... 44249 44447 44700 \n", "184 0 0 0 0 ... 29432 29660 29912 \n", "201 0 0 0 0 ... 232037 232555 233037 \n", "223 0 0 0 0 ... 248818 248293 250908 \n", "225 2 2 5 5 ... 1528568 1551853 1577147 \n", "61 2 5 8 8 ... 
1055 1055 1055 \n", "\n", " 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 \n", "23 56511 56810 57092 57342 57455 57592 57849 \n", "116 179645 179964 179859 180166 179887 180044 183309 \n", "120 179710 179986 180328 180600 181200 181524 182196 \n", "133 131652 133521 135701 137724 139511 141591 143849 \n", "137 228658 229327 229858 230158 230555 231139 231732 \n", "139 16513 16536 16550 16581 16623 16651 16598 \n", "143 11165 11190 11206 11225 11265 11344 11402 \n", "169 44888 45064 45236 45445 45578 45768 45950 \n", "184 30200 30471 30623 30788 31007 31292 31596 \n", "201 234824 235290 235772 235400 236259 236259 237906 \n", "223 254195 257154 259559 261184 265227 267240 269127 \n", "225 1600937 1622612 1643246 1662302 1680913 1699176 1721753 \n", "61 1065 1065 1065 1065 1065 1066 1066 \n", "\n", "[13 rows x 132 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Only the Hong Kong row carries the country name China at this point, so it is the only row renamed.\n", "# The column is reassigned rather than modified in place through a chained call.\n", "dataCountries[\"Country/Region\"] = dataCountries[\"Country/Region\"].replace({\"China\": \"Hong Kong\"})\n", "\n", "dataCountries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "China is made up of several provinces. To study China as a whole without Hong Kong (see the assignment instructions), we sum, for each date, the case counts of all its provinces into a new row named China, stored under the index label `'1'`, which none of the selected rows uses. 
To do so, we use the [at](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html) and [sum](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html) methods.\n", "\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py:2035: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " self.loc[index, col] = value\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Province/State Country/Region Lat Long 1/22/20 1/23/20 \\\n", "23 NaN Belgium 50.8333 4.0000 0.0 0.0 \n", "116 NaN France 46.2276 2.2137 0.0 0.0 \n", "120 NaN Germany 51.0000 9.0000 0.0 0.0 \n", "133 NaN Iran 32.0000 53.0000 0.0 0.0 \n", "137 NaN Italy 43.0000 12.0000 0.0 0.0 \n", "139 NaN Japan 36.0000 138.0000 2.0 2.0 \n", "143 NaN Korea, South 36.0000 128.0000 1.0 1.0 \n", "169 NaN Netherlands 52.1326 5.2913 0.0 0.0 \n", "184 NaN Portugal 39.3999 -8.2245 0.0 0.0 \n", "201 NaN Spain 40.0000 -4.0000 0.0 0.0 \n", "223 NaN United Kingdom 55.3781 -3.4360 0.0 0.0 \n", "225 NaN US 37.0902 -95.7129 1.0 1.0 \n", "61 Hong Kong Hong Kong 22.3000 114.2000 0.0 2.0 \n", "1 NaN China NaN NaN 548.0 641.0 \n", "\n", " 1/24/20 1/25/20 1/26/20 1/27/20 ... 5/19/20 5/20/20 \\\n", "23 0.0 0.0 0.0 0.0 ... 55791.0 55983.0 \n", "116 2.0 3.0 3.0 3.0 ... 178428.0 179069.0 \n", "120 0.0 0.0 0.0 1.0 ... 177778.0 178473.0 \n", "133 0.0 0.0 0.0 0.0 ... 124603.0 126949.0 \n", "137 0.0 0.0 0.0 0.0 ... 226699.0 227364.0 \n", "139 2.0 2.0 4.0 4.0 ... 16367.0 16367.0 \n", "143 2.0 2.0 3.0 4.0 ... 11110.0 11122.0 \n", "169 0.0 0.0 0.0 0.0 ... 44249.0 44447.0 \n", "184 0.0 0.0 0.0 0.0 ... 29432.0 29660.0 \n", "201 0.0 0.0 0.0 0.0 ... 232037.0 232555.0 \n", "223 0.0 0.0 0.0 0.0 ... 248818.0 248293.0 \n", "225 2.0 2.0 5.0 5.0 ... 1528568.0 1551853.0 \n", "61 2.0 5.0 8.0 8.0 ... 1055.0 1055.0 \n", "1 918.0 1401.0 2067.0 2869.0 ... 
83008.0 83008.0 \n", "\n", " 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 \\\n", "23 56235.0 56511.0 56810.0 57092.0 57342.0 57455.0 \n", "116 179306.0 179645.0 179964.0 179859.0 180166.0 179887.0 \n", "120 179021.0 179710.0 179986.0 180328.0 180600.0 181200.0 \n", "133 129341.0 131652.0 133521.0 135701.0 137724.0 139511.0 \n", "137 228006.0 228658.0 229327.0 229858.0 230158.0 230555.0 \n", "139 16424.0 16513.0 16536.0 16550.0 16581.0 16623.0 \n", "143 11142.0 11165.0 11190.0 11206.0 11225.0 11265.0 \n", "169 44700.0 44888.0 45064.0 45236.0 45445.0 45578.0 \n", "184 29912.0 30200.0 30471.0 30623.0 30788.0 31007.0 \n", "201 233037.0 234824.0 235290.0 235772.0 235400.0 236259.0 \n", "223 250908.0 254195.0 257154.0 259559.0 261184.0 265227.0 \n", "225 1577147.0 1600937.0 1622612.0 1643246.0 1662302.0 1680913.0 \n", "61 1055.0 1065.0 1065.0 1065.0 1065.0 1065.0 \n", "1 83008.0 83016.0 83019.0 83030.0 83037.0 83038.0 \n", "\n", " 5/27/20 5/28/20 \n", "23 57592.0 57849.0 \n", "116 180044.0 183309.0 \n", "120 181524.0 182196.0 \n", "133 141591.0 143849.0 \n", "137 231139.0 231732.0 \n", "139 16651.0 16598.0 \n", "143 11344.0 11402.0 \n", "169 45768.0 45950.0 \n", "184 31292.0 31596.0 \n", "201 236259.0 237906.0 \n", "223 267240.0 269127.0 \n", "225 1699176.0 1721753.0 \n", "61 1066.0 1066.0 \n", "1 83040.0 83040.0 \n", "\n", "[14 rows x 132 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For China, the provincial rows must be summed to obtain the figures for the whole country.\n", "# Note: this selection is a slice of raw_data, hence the SettingWithCopyWarning in the output;\n", "# appending .copy() to the selection would avoid it.\n", "dataChina = raw_data.loc[(raw_data['Country/Region'] == 'China') & (raw_data['Province/State'] != 'Hong Kong')]\n", "\n", "#print(dataChina)\n", "\n", "# We sum per date only, so the descriptive columns are removed from the temporary list of columns.\n", "col_list = list(dataChina)\n", "col_list.remove(\"Province/State\")\n", "col_list.remove(\"Country/Region\")\n", "col_list.remove(\"Lat\")\n", 
"col_list.remove(\"Long\")\n", "\n", "# Store the sum of each date column in a new row labelled '1' (a string label).\n", "for col in col_list:\n", "    dataChina.at['1', col] = dataChina[col].sum()\n", "\n", "# Name the country in the row we have just created.\n", "dataChina.at['1', \"Country/Region\"] = \"China\"\n", "\n", "# Add the aggregated row (the only China row without a province name) to dataCountries.\n", "# pd.concat is used instead of DataFrame.append, which was removed in pandas 2.0.\n", "dataCountries = pd.concat([dataCountries, dataChina.loc[(dataChina['Country/Region'] == 'China') & (dataChina['Province/State'].isnull())]])\n", "\n", "dataCountries\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have a DataFrame that gathers all the data we need. We can still drop the columns we will not use: the province or state, the latitude and the longitude. We use the [drop](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) method." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country/Region 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 \\\n", "23 Belgium 0.0 0.0 0.0 0.0 0.0 0.0 \n", "116 France 0.0 0.0 2.0 3.0 3.0 3.0 \n", "120 Germany 0.0 0.0 0.0 0.0 0.0 1.0 \n", "133 Iran 0.0 0.0 0.0 0.0 0.0 0.0 \n", "137 Italy 0.0 0.0 0.0 0.0 0.0 0.0 \n", "139 Japan 2.0 2.0 2.0 2.0 4.0 4.0 \n", "143 Korea, South 1.0 1.0 2.0 2.0 3.0 4.0 \n", "169 Netherlands 0.0 0.0 0.0 0.0 0.0 0.0 \n", "184 Portugal 0.0 0.0 0.0 0.0 0.0 0.0 \n", "201 Spain 0.0 0.0 0.0 0.0 0.0 0.0 \n", "223 United Kingdom 0.0 0.0 0.0 0.0 0.0 0.0 \n", "225 US 1.0 1.0 2.0 2.0 5.0 5.0 \n", "61 Hong Kong 0.0 2.0 2.0 5.0 8.0 8.0 \n", "1 China 548.0 641.0 918.0 1401.0 2067.0 2869.0 \n", "\n", " 1/28/20 1/29/20 1/30/20 ... 5/19/20 5/20/20 5/21/20 \\\n", "23 0.0 0.0 0.0 ... 55791.0 55983.0 56235.0 \n", "116 4.0 5.0 5.0 ... 178428.0 179069.0 179306.0 \n", "120 4.0 4.0 4.0 ... 177778.0 178473.0 179021.0 \n", "133 0.0 0.0 0.0 ... 124603.0 126949.0 129341.0 \n", "137 0.0 0.0 0.0 ... 226699.0 227364.0 228006.0 \n", "139 7.0 7.0 11.0 ... 16367.0 16367.0 16424.0 \n", "143 4.0 4.0 4.0 ... 11110.0 11122.0 11142.0 \n", "169 0.0 0.0 0.0 ... 44249.0 44447.0 44700.0 \n", "184 0.0 0.0 0.0 ... 29432.0 29660.0 29912.0 \n", "201 0.0 0.0 0.0 ... 232037.0 232555.0 233037.0 \n", "223 0.0 0.0 0.0 ... 248818.0 248293.0 250908.0 \n", "225 5.0 5.0 5.0 ... 1528568.0 1551853.0 1577147.0 \n", "61 8.0 10.0 10.0 ... 1055.0 1055.0 1055.0 \n", "1 5501.0 6077.0 8131.0 ... 
83008.0 83008.0 83008.0 \n", "\n", " 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 \\\n", "23 56511.0 56810.0 57092.0 57342.0 57455.0 57592.0 \n", "116 179645.0 179964.0 179859.0 180166.0 179887.0 180044.0 \n", "120 179710.0 179986.0 180328.0 180600.0 181200.0 181524.0 \n", "133 131652.0 133521.0 135701.0 137724.0 139511.0 141591.0 \n", "137 228658.0 229327.0 229858.0 230158.0 230555.0 231139.0 \n", "139 16513.0 16536.0 16550.0 16581.0 16623.0 16651.0 \n", "143 11165.0 11190.0 11206.0 11225.0 11265.0 11344.0 \n", "169 44888.0 45064.0 45236.0 45445.0 45578.0 45768.0 \n", "184 30200.0 30471.0 30623.0 30788.0 31007.0 31292.0 \n", "201 234824.0 235290.0 235772.0 235400.0 236259.0 236259.0 \n", "223 254195.0 257154.0 259559.0 261184.0 265227.0 267240.0 \n", "225 1600937.0 1622612.0 1643246.0 1662302.0 1680913.0 1699176.0 \n", "61 1065.0 1065.0 1065.0 1065.0 1065.0 1066.0 \n", "1 83016.0 83019.0 83030.0 83037.0 83038.0 83040.0 \n", "\n", " 5/28/20 \n", "23 57849.0 \n", "116 183309.0 \n", "120 182196.0 \n", "133 143849.0 \n", "137 231732.0 \n", "139 16598.0 \n", "143 11402.0 \n", "169 45950.0 \n", "184 31596.0 \n", "201 237906.0 \n", "223 269127.0 \n", "225 1721753.0 \n", "61 1066.0 \n", "1 83040.0 \n", "\n", "[14 rows x 129 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataCountries = dataCountries.drop(['Province/State','Lat', 'Long'], axis=1)\n", "dataCountries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data processing \n", "\n", "Here, processing the data only consists of plotting the time series in a chart. \n", "The data are first transposed from a horizontal layout (one row per country) to a vertical one (one column per country) so that they can be plotted correctly (see the [transpose](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html) method). 
Finally, the columns are renamed after their countries.\n", "\n", "The code below draws the chart. One detail is worth noting: the dates are replaced by the number of days elapsed since the first record, dated 22 January 2020 (0 to 127). To keep the labels from overlapping, only one day in four is shown on the x-axis. \n" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "dataTransposed = dataCountries.transpose()\n", "\n", "# Use the first row (the country names) as column names, then drop that row.\n", "header_row = 0\n", "dataTransposed.columns = dataTransposed.iloc[header_row]\n", "dataTransposed = dataTransposed.drop(['Country/Region'], axis=0)\n", "\n", "# Number of days elapsed since the first record (one entry per date row).\n", "days = list(range(dataTransposed.index.size))\n", "\n", "# ---------------------------------- Figure ----------------------------------\n", "fig, ax = plt.subplots()\n", "\n", "# One line per country. After the transpose the values have the object dtype,\n", "# so they are converted to floats before being plotted against the day numbers.\n", "for col in dataTransposed.columns:\n", "    ax.plot(days, dataTransposed[col].astype(float), label=col)\n", "\n", "\n", "ax.set_xlabel(\"Days\")\n", "ax.set_ylabel(\"Coronavirus Cases\")\n", "fig.set_size_inches(20, 20)\n", "\n", "# Set the maximum number of tick intervals on the y-axis\n", "plt.locator_params(axis='y', nbins=20)\n", "\n", "# Annotate the latest value (last row) for the US and for China\n", "plt.text(0.80, 0.95, ('US May 28 '+'\\n'+str(dataTransposed.iloc[-1]['US'])+' cases'),\n", "         horizontalalignment='center',\n", "         verticalalignment='center',\n", "         transform=ax.transAxes,\n", "         fontsize=20)\n", "\n", "\n", "plt.text(0.85, 0.20, ('China May 28 '+'\\n'+str(dataTransposed.iloc[-1]['China'])+' cases'),\n", "         horizontalalignment='center',\n", "         verticalalignment='center',\n", "         transform=ax.transAxes,\n", "         fontsize=20)\n", "\n", "\n", "# Show only one day label in four along the x-axis\n", "ax.set_xticks(days[::4])\n", "ax.set_xticklabels(days[::4], rotation=45)\n", "\n", "\n", "\n", "ax.set_title('Cumulative coronavirus cases')\n", "ax.legend(loc = 'upper left',fontsize='x-large')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# Complementary study\n", "\n", "With more data about the countries studied, we could make a large number of comparisons between countries.\n", "\n", "An example: we could compare the effectiveness of 
lockdowns in countries that enforced a strict one, for instance France and Italy. The lockdown start dates are easy to obtain, as are the number of deaths linked to coronavirus cases and many demographic factors. With such a dataset, we could run statistical tests based on the general linear model, most likely a multiple linear regression in our case, to check whether the number of days elapsed since the start of the lockdown accounts for the levelling-off and then the decline of the number of cases.\n", "\n", "Given the complexity of such a study and the risk of errors in calculation and interpretation, it would be presumptuous to claim we could carry out such a test here. However, similar analyses have probably been conducted by the WHO or by national health agencies.\n", "\n", "# Conclusion \n", "\n", "This notebook presents the method used to produce the figure above, **Cumulative coronavirus cases**. The main difficulties are to gather the data correctly into a single dataset and then to display them legibly in a chart. This notebook contains no complex algorithm, but it does require a good knowledge of Python libraries such as *pyplot* and *pandas*.\n", "\n", "# References \n", "\n", "* [Matplotlib](https://matplotlib.org/3.1.0/index.html)\n", "* [Pandas](https://pandas.pydata.org/docs/)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }