Commit 33931633 authored by Val

Update exo

parent 89fa46f7
Name,WHO Region,Cases - cumulative total,Cases - cumulative total per 100000 population,Cases - newly reported in last 7 days,Cases - newly reported in last 7 days per 100000 population,Cases - newly reported in last 24 hours,Deaths - cumulative total,Deaths - cumulative total per 100000 population,Deaths - newly reported in last 7 days,Deaths - newly reported in last 7 days per 100000 population,Deaths - newly reported in last 24 hours
Global,,766895075,9838.874900615549,265656,3.4082317592115516,0,6935889,88.98393850756634,1656,0.02124564020106576,0,
United States of America,Americas,103436829,31249.547,0,0,0,1127152,340.527,0,0,0
China,Western Pacific,99261812,6746.598,3859,0.262,0,121144,8.234,57,0.004,0
India,South-East Asia,44987339,3259.942,4187,0.303,0,531843,38.539,49,0.004,0
France,Europe,39010097,59979.215,15577,23.95,0,163437,251.289,76,0.117,0
Germany,Europe,38423300,46200.336,2641,3.176,0,174032,209.257,10,0.012,0
Brazil,Americas,37553337,17667.219,41416,19.484,0,702421,330.459,305,0.143,0
Japan,Western Pacific,33803572,26727.165,0,0,0,74694,59.058,0,0,0
Republic of Korea,Western Pacific,31548083,61534.2,82976,161.844,0,34687,67.657,53,0.103,0
Italy,Europe,25842595,43329.896,5030,8.434,0,190242,318.976,50,0.084,0
The United Kingdom,Europe,24611066,36253.52,1529,2.252,0,225852,332.693,0,0,0
Russian Federation,Europe,22917873,15704.223,17118,11.73,0,398919,273.355,183,0.125,0
Türkiye,Europe,17004677,20162.278,0,0,0,101419,120.252,0,0,0
Spain,Europe,13868227,29299.516,22424,47.375,0,121213,256.088,296,0.625,0
Viet Nam,Western Pacific,11602738,11919.979,8119,8.341,0,43203,44.384,2,0.002,0
Australia,Western Pacific,11339196,44467.638,0,0,0,20721,81.259,0,0,0
Argentina,Americas,10044957,22225.434,0,0,0,130472,288.682,0,0,0
Netherlands,Europe,8610372,49463.335,0,0,0,22992,132.08,0,0,0
Mexico,Americas,7611736,5903.648,7865,6.1,0,334079,259.111,66,0.051,0
Iran (Islamic Republic of),Eastern Mediterranean,7611138,9061.639,243,0.289,0,146230,174.098,13,0.015,0
Indonesia,South-East Asia,6803504,2487.355,3744,1.369,0,161701,59.118,55,0.02,0
Poland,Europe,6516424,17167.396,534,1.407,0,119600,315.084,6,0.016,0
Colombia,Americas,6366777,12512.609,783,1.539,0,142741,280.528,14,0.028,0
Austria,Europe,6074437,68243.943,1841,20.683,0,22488,252.644,1,0.011,0
Greece,Europe,6059649,56534.144,0,0,0,36917,344.421,0,0,0
Portugal,Europe,5585859,54253.189,813,7.896,0,26701,259.336,16,0.155,0
Ukraine,Europe,5549708,12689.758,0,0,0,112315,256.815,0,0,0
Chile,Americas,5286815,27656.201,717,3.751,0,61491,321.67,33,0.173,0
Malaysia,Western Pacific,5094448,15740.123,6439,19.894,0,37070,114.534,24,0.074,0
Israel,Europe,4825362,55748.82,399,4.61,0,12524,144.693,3,0.035,0
Belgium,Europe,4798041,41640.842,3,0.026,0,34310,297.767,0,0,0
Thailand,South-East Asia,4738988,6789.383,2632,3.771,0,34053,48.787,64,0.092,0
Canada,Americas,4669364,12371.748,0,0,0,52301,138.574,0,0,0
Czechia,Europe,4641759,43405.512,249,2.328,0,42796,400.189,2,0.019,0
Peru,Americas,4505860,13665.777,640,1.941,0,220561,668.937,45,0.136,0
Switzerland,Europe,4404916,50896.712,0,0,0,14012,161.902,0,0,0
Philippines,Western Pacific,4127628,3766.734,9615,8.774,0,66466,60.655,13,0.012,0
South Africa,Africa,4072533,6866.672,0,0,0,102595,172.985,0,0,0
Denmark,Europe,3413431,58622.187,342,5.874,0,8673,148.95,23,0.395,0
Romania,Europe,3402356,17602.486,2379,12.308,0,68172,352.696,29,0.15,0
Sweden,Europe,2709041,26231.108,0,0,0,24274,235.04,0,0,0
Serbia,Europe,2540323,36674.335,1172,16.92,0,18047,260.542,5,0.072,0
Iraq,Eastern Mediterranean,2465545,6129.767,0,0,0,25375,63.087,0,0,0
Singapore,Western Pacific,2438690,41684.572,0,0,0,1722,29.434,0,0,0
New Zealand,Western Pacific,2295559,47603.652,12698,263.322,0,2893,59.993,43,0.892,0
Hungary,Europe,2202491,22544.502,0,0,0,48781,499.318,0,0,0
Bangladesh,South-East Asia,2038708,1237.911,169,0.103,0,29446,17.88,0,0,0
Slovakia,Europe,1866814,34204.057,69,1.264,0,21167,387.825,0,0,0
Georgia,Europe,1842046,46176.114,0,0,0,17070,427.908,0,0,0
Jordan,Eastern Mediterranean,1746997,17122.161,0,0,0,14122,138.408,0,0,0
Ireland,Europe,1711691,34479.035,118,2.377,0,8935,179.98,30,0.604,0
Pakistan,Eastern Mediterranean,1580631,715.566,0,0,0,30656,13.878,0,0,0
Kazakhstan,Europe,1502857,8003.837,0,0,0,19072,101.573,0,0,0
Norway,Europe,1484448,27655.815,156,2.906,0,5495,102.374,0,0,0
Finland,Europe,1478305,26755.238,172,3.113,0,9589,173.547,0,0,0
Slovenia,Europe,1344104,64131.352,149,7.109,0,9364,446.785,0,0,0
Lithuania,Europe,1320046,47244.219,327,11.703,0,9679,346.41,7,0.251,0
Bulgaria,Europe,1306862,18799.761,604,8.689,0,38372,551.997,12,0.173,0
Morocco,Eastern Mediterranean,1274180,3452.074,0,0,0,16297,44.153,0,0,0
Croatia,Europe,1273671,31385.392,163,4.017,0,18246,449.612,16,0.394,0
Guatemala,Americas,1251086,6983.234,818,4.566,0,20199,112.746,0,0,0
Lebanon,Eastern Mediterranean,1237556,18131.506,0,0,0,10914,159.902,0,0,0
Costa Rica,Americas,1230552,24156.331,0,0,0,9366,183.859,0,0,0
Bolivia (Plurinational State of),Americas,1198404,10266.443,421,3.607,0,22383,191.75,1,0.009,0
Tunisia,Eastern Mediterranean,1153161,9757.155,0,0,0,29412,248.862,0,0,0
Puerto Rico,Americas,1122076,39221.729,0,0,0,5930,207.281,15,0.524,0
Cuba,Americas,1113830,9833.74,236,2.084,0,8530,75.309,0,0,0
United Arab Emirates,Eastern Mediterranean,1066641,10784.607,609,6.157,0,2349,23.75,0,0,0
Ecuador,Americas,1061766,6018.04,0,0,0,36019,204.154,0,0,0
Panama,Americas,1038642,24071.798,664,15.389,0,8623,199.849,1,0.023,0
Uruguay,Americas,1037893,29878.344,0,0,0,7625,219.505,0,0,0
Mongolia,Western Pacific,1008655,30767.717,0,0,0,2136,65.156,0,0,0
Nepal,South-East Asia,1003307,3443.435,47,0.161,0,12031,41.291,0,0,0
Belarus,Europe,994037,10519.666,0,0,0,7118,75.328,0,0,0
Latvia,Europe,977891,51260.88,27,1.415,0,6368,333.809,7,0.367,0
Saudi Arabia,Eastern Mediterranean,841469,2417.051,0,0,0,9646,27.707,0,0,0
Azerbaijan,Europe,831735,8203.182,57,0.562,0,10272,101.31,4,0.039,0
Paraguay,Americas,735759,10315.529,0,0,0,19880,278.723,0,0,0
"occupied Palestinian territory, including east Jerusalem",Eastern Mediterranean,703228,13784.962,0,0,0,5708,111.891,0,0,0
Bahrain,Eastern Mediterranean,696614,40939.365,0,0,0,1536,90.269,0,0,0
Sri Lanka,South-East Asia,672380,3140.019,63,0.294,0,16868,78.774,12,0.056,0
Kuwait,Eastern Mediterranean,665909,15592.973,24,0.562,0,2570,60.179,0,0,0
Dominican Republic,Americas,661176,6094.962,73,0.673,0,4384,40.413,0,0,0
Cyprus,Europe,660854,74420.076,0,0,0,1364,153.603,0,0,0
Myanmar,South-East Asia,638116,1172.796,494,0.908,0,19494,35.828,0,0,0
Republic of Moldova,Europe,620519,15382.367,0,0,0,12118,300.399,0,0,0
Estonia,Europe,618608,46547.718,0,0,0,3001,225.813,0,0,0
Venezuela (Bolivarian Republic of),Americas,552695,1943.649,0,0,0,5856,20.594,0,0,0
Egypt,Eastern Mediterranean,516023,504.252,0,0,0,24830,24.264,0,0,0
Qatar,Eastern Mediterranean,511932,17768.92,788,27.351,0,690,23.95,0,0,0
Libya,Eastern Mediterranean,507255,7382.236,0,0,0,6437,93.68,0,0,0
Ethiopia,Africa,500872,435.679,1,0.001,0,7574,6.588,0,0,0
Réunion,Africa,494595,55242.753,0,0,0,921,102.869,0,0,0
Honduras,Americas,472619,4771.709,25,0.252,0,11116,112.231,2,0.02,0
Armenia,Europe,449169,15158.067,0,0,0,8750,295.285,0,0,0
Bosnia and Herzegovina,Europe,402940,12281.704,22,0.671,0,16346,498.23,0,0,0
Oman,Eastern Mediterranean,399449,7822.171,0,0,0,4628,90.627,0,0,0
North Macedonia,Europe,348276,16716.874,0,0,0,9677,464.486,0,0,0
Zambia,Africa,343995,1871.17,0,0,0,4058,22.074,0,0,0
Kenya,Africa,343074,638.024,0,0,0,5688,10.578,0,0,0
Albania,Europe,334090,11609.215,0,0,0,3604,125.235,0,0,0
Botswana,Africa,329862,14026.969,0,0,0,2797,118.939,0,0,0
Luxembourg,Europe,319959,51102.845,0,0,0,1232,196.771,0,0,0
Mauritius,Africa,304233,23922.052,0,0,0,1050,82.562,0,0,0
Brunei Darussalam,Western Pacific,303719,69424.818,0,0,0,161,36.802,0,0,0
Montenegro,Europe,291830,46465.158,0,0,0,2827,450.115,0,0,0
Kosovo[1],Europe,273889,15252.781,7,0.39,0,3206,178.541,0,0,0
Algeria,Africa,271820,619.871,3,0.007,0,6881,15.692,0,0,0
Nigeria,Africa,266675,129.366,0,0,0,3155,1.531,0,0,0
Zimbabwe,Africa,264848,1781.937,0,0,0,5690,38.283,0,0,0
Uzbekistan,Europe,253662,757.897,95,0.284,0,1637,4.891,0,0,0
Mozambique,Africa,233417,746.805,0,0,0,2243,7.176,0,0,0
Martinique,Americas,229975,61283.36,75,19.986,0,1102,293.659,2,0.533,0
Afghanistan,Eastern Mediterranean,220059,565.292,677,1.739,0,7913,20.327,0,0,0
Lao People's Democratic Republic,Western Pacific,218196,2999.027,14,0.192,0,671,9.223,0,0,0
Iceland,Europe,209191,57448.906,0,0,0,260,71.402,0,0,0
Kyrgyzstan,Europe,206890,3171.121,0,0,0,2991,45.845,0,0,0
Guadeloupe,Americas,202836,50693.285,183,45.736,0,1017,254.171,0,0,0
El Salvador,Americas,201785,3110.987,0,0,0,4230,65.215,0,0,0
Trinidad and Tobago,Americas,191496,13683.29,0,0,0,4390,313.686,0,0,0
Maldives,South-East Asia,186625,34525.404,36,6.66,0,315,58.275,1,0.185,0
Ghana,Africa,171653,552.42,0,0,0,1462,4.705,0,0,0
Namibia,Africa,171310,6742.086,0,0,0,4091,161.006,0,0,0
Uganda,Africa,170775,373.352,0,0,0,3632,7.94,0,0,0
Jamaica,Americas,154938,5232.329,64,2.161,0,3545,119.716,9,0.304,0
Cambodia,Western Pacific,138740,829.836,4,0.024,0,3056,18.279,0,0,0
Rwanda,Africa,133194,1028.349,0,0,0,1468,11.334,0,0,0
Cameroon,Africa,125036,471.019,0,0,0,1972,7.429,0,0,0
Malta,Europe,118631,23054.664,0,0,0,835,162.273,0,0,0
Barbados,Americas,107794,37509.874,0,0,0,593,206.351,0,0,0
Angola,Africa,105384,320.645,0,0,0,1934,5.884,0,0,0
French Guiana,Americas,98041,32824.542,0,0,0,413,138.274,0,0,0
Democratic Republic of the Congo,Africa,96652,107.917,0,0,0,1467,1.638,0,0,0
Senegal,Africa,88997,531.518,0,0,0,1971,11.771,0,0,0
Malawi,Africa,88638,463.347,0,0,0,2686,14.041,0,0,0
Côte d’Ivoire,Africa,88330,334.859,0,0,0,834,3.162,0,0,0
Suriname,Americas,82513,14065.547,18,3.068,0,1405,239.503,1,0.17,0
New Caledonia,Western Pacific,80058,28041.527,0,0,0,314,109.983,0,0,0
French Polynesia,Western Pacific,78569,27969.656,0,0,0,649,231.036,0,0,0
Eswatini,Africa,74670,6436.159,0,0,0,1425,122.827,0,0,0
Guyana,Americas,73207,9307.331,0,0,0,1298,165.024,0,0,0
Belize,Americas,70782,17801.06,0,0,0,688,173.026,0,0,0
Fiji,Western Pacific,68921,7688.258,0,0,0,883,98.5,0,0,0
Madagascar,Africa,68266,246.528,0,0,0,1424,5.142,0,0,0
Jersey,Europe,66391,61589.484,0,0,0,161,149.356,0,0,0
Sudan,Eastern Mediterranean,63993,145.939,0,0,0,5046,11.508,0,0,0
Cabo Verde,Africa,63820,11478.686,16,2.878,0,414,74.462,0,0,0
Mauritania,Africa,63669,1369.327,0,0,0,997,21.442,0,0,0
Bhutan,South-East Asia,62670,8122,2,0.259,0,21,2.722,0,0,0
Syrian Arab Republic,Eastern Mediterranean,57423,328.119,0,0,0,3163,18.074,0,0,0
Burundi,Africa,53751,452.039,0,0,0,15,0.126,0,0,0
Guam,Western Pacific,51427,30470.745,82,48.585,0,413,244.704,0,0,0
Seychelles,Africa,50937,51793.141,0,0,0,172,174.891,0,0,0
Gabon,Africa,48992,2201.162,0,0,0,307,13.793,0,0,0
Andorra,Europe,48015,62143.273,0,0,0,159,205.785,0,0,0
Papua New Guinea,Western Pacific,46864,523.794,0,0,0,670,7.489,0,0,0
Curaçao,Americas,45812,27918.315,0,0,0,302,184.042,0,0,0
Aruba,Americas,44180,41380.215,0,0,0,288,269.749,0,0,0
United Republic of Tanzania,Africa,43078,72.116,0,0,0,846,1.416,0,0,0
Mayotte,Africa,42027,15404.945,0,0,0,187,68.545,0,0,0
Togo,Africa,39491,477.018,0,0,0,290,3.503,0,0,0
Guinea,Africa,38563,293.639,0,0,0,468,3.564,0,0,0
Bahamas,Americas,38084,9684.572,0,0,0,844,214.625,0,0,0
Isle of Man,Europe,38008,44698.466,0,0,0,116,136.419,0,0,0
Guernsey,Europe,35326,54796.178,0,0,0,67,103.928,0,0,0
Faroe Islands,Europe,34658,70926.021,0,0,0,28,57.301,0,0,0
Lesotho,Africa,34490,1609.99,0,0,0,706,32.956,0,0,0
Haiti,Americas,34237,300.258,9,0.079,0,860,7.542,0,0,0
Mali,Africa,33148,163.687,1,0.005,0,743,3.669,0,0,0
Cayman Islands,Americas,31472,47888.01,0,0,0,37,56.299,0,0,0
Saint Lucia,Americas,30052,16365.785,0,0,0,409,222.734,0,0,0
Benin,Africa,28014,231.078,0,0,0,163,1.345,0,0,0
Somalia,Eastern Mediterranean,27334,171.985,0,0,0,1361,8.563,0,0,0
Micronesia (Federated States of),Western Pacific,26453,22998.009,0,0,0,65,56.51,0,0,0
Congo,Africa,25195,456.589,0,0,0,389,7.05,0,0,0
United States Virgin Islands,Americas,25046,23984.678,52,49.797,0,131,125.449,0,0,0
San Marino,Europe,24263,71492.133,9,26.519,0,125,368.319,0,0,0
Timor-Leste,South-East Asia,23444,1778.155,1,0.076,0,138,10.467,0,0,0
Burkina Faso,Africa,22056,105.515,0,0,0,396,1.894,0,0,0
Solomon Islands,Western Pacific,21611,3146.237,0,0,0,153,22.275,0,0,0
Liechtenstein,Europe,21468,55405.58,0,0,0,87,224.534,0,0,0
Gibraltar,Europe,20550,60995.518,0,0,0,113,335.401,0,0,0
Grenada,Americas,19693,17501.311,0,0,0,238,211.512,0,0,0
Bermuda,Americas,18860,30285.999,0,0,0,165,264.962,0,0,0
South Sudan,Africa,18368,164.092,0,0,0,138,1.233,0,0,0
Tajikistan,Europe,17786,186.482,0,0,0,125,1.311,0,0,0
Equatorial Guinea,Africa,17130,1220.968,0,0,0,183,13.044,0,0,0
Tonga,Western Pacific,16817,15910.876,0,0,0,12,11.353,0,0,0
Monaco,Europe,16789,42781.062,8,20.385,0,67,170.727,0,0,0
Samoa,Western Pacific,16763,8448.497,0,0,0,31,15.624,0,0,0
Marshall Islands,Western Pacific,16081,27166.605,0,0,0,17,28.719,0,0,0
Dominica,Americas,15760,21891.625,0,0,0,74,102.791,0,0,0
Nicaragua,Americas,15720,237.299,7,0.106,0,245,3.698,0,0,0
Djibouti,Eastern Mediterranean,15690,1588.057,0,0,0,189,19.13,0,0,0
Central African Republic,Africa,15367,318.173,0,0,0,113,2.34,0,0,0
Northern Mariana Islands (Commonwealth of the),Western Pacific,13896,24143.023,0,0,0,41,71.234,0,0,0
Gambia,Africa,12626,522.455,0,0,0,372,15.393,0,0,0
Saint Martin,Americas,12303,31824.413,3,7.76,0,46,118.989,0,0,0
Vanuatu,Western Pacific,12016,3912.159,0,0,0,14,4.558,0,0,0
Greenland,Europe,11971,21086.099,0,0,0,21,36.99,0,0,0
Yemen,Eastern Mediterranean,11945,40.049,0,0,0,2159,7.239,0,0,0
Sint Maarten,Americas,11030,25721.748,0,0,0,92,214.542,0,0,0
Eritrea,Africa,10189,287.304,0,0,0,103,2.904,0,0,0
Bonaire,Americas,9855,47119.292,0,0,0,33,157.781,0,0,0
Saint Vincent and the Grenadines,Americas,9631,8681.269,0,0,0,124,111.772,0,0,0
Guinea-Bissau,Africa,9614,488.516,0,0,0,177,8.994,0,0,0
Niger,Africa,9513,39.299,0,0,0,315,1.301,0,0,0
Comoros,Africa,9109,1047.492,0,0,0,160,18.399,0,0,0
Antigua and Barbuda,Americas,9106,9298.573,0,0,0,146,149.088,0,0,0
American Samoa,Western Pacific,8331,15093.212,0,0,0,34,61.598,0,0,0
Liberia,Africa,8090,159.955,0,0,0,294,5.813,0,0,0
Sierra Leone,Africa,7762,97.305,0,0,0,125,1.567,0,0,0
Chad,Africa,7698,46.865,0,0,0,194,1.181,0,0,0
British Virgin Islands,Americas,7305,24159.143,0,0,0,64,211.661,0,0,0
Cook Islands,Western Pacific,7106,40457.754,0,0,0,2,11.387,0,0,0
Saint Kitts and Nevis,Americas,6600,12407.881,0,0,0,46,86.479,0,0,0
Turks and Caicos Islands,Americas,6588,17015.342,0,0,0,38,98.146,0,0,0
Sao Tome and Principe,Africa,6575,3000.105,0,0,0,80,36.503,0,0,0
Palau,Western Pacific,6000,33163.829,0,0,0,9,49.746,0,0,0
Saint Barthélemy,Americas,5494,55579.16,8,80.931,0,5,50.582,0,0,0
Nauru,Western Pacific,5393,49778.475,0,0,0,1,9.23,0,0,0
Kiribati,Western Pacific,5027,4208.491,2,1.674,0,24,20.092,0,0,0
Anguilla,Americas,3904,26023.197,0,0,0,12,79.989,0,0,0
Wallis and Futuna,Western Pacific,3508,31193.313,0,0,0,7,62.244,0,0,0
Saint Pierre and Miquelon,Americas,3426,59119.931,0,0,0,2,34.513,0,0,0
Tuvalu,Western Pacific,2779,23566.825,0,0,0,0,0,0,0,0
"Saint Helena, Ascension and Tristan da Cunha",Africa,2166,35677.813,0,0,0,0,0,0,0,0
Falkland Islands (Malvinas),Americas,1923,55211.025,0,0,0,0,0,0,0,0
Montserrat,Americas,1403,28065.613,0,0,0,8,160.032,0,0,0
Sint Eustatius,Americas,1217,38770.309,0,0,0,6,191.144,0,0,0
Saba,Americas,813,42058.976,0,0,0,2,103.466,0,0,0
Niue,Western Pacific,802,49567.367,0,0,0,0,0,0,0,0
Other,Other,764,,0,,0,13,,0,,0
Holy See,Europe,26,3213.844,0,0,0,0,0,0,0,0
Tokelau,Western Pacific,5,370.37,0,0,0,0,0,0,0,0
Pitcairn Islands,Western Pacific,4,8000,0,0,0,0,0,0,0,0
Democratic People's Republic of Korea,South-East Asia,0,0,0,0,0,0,0,0,0,0
Turkmenistan,Europe,0,0,0,0,0,0,0,0,0,0
---
title: "MOOC_COVID_Analysis"
author: "VB (feb2301522924f68234e7a552680f397)"
date: "2023-05-24"
output:
  html_document: default
  pdf_document: default
---
```{r setup, include=FALSE}
@@ -21,21 +23,22 @@ The name in parentheses is the name of the "country" as it appears in the f
Then you will make a graph with the date on the x-axis and the cumulative number of cases at that date on the y-axis. We suggest making two versions of this graph, one with a linear scale and one with a logarithmic scale.
**The rest of this RMarkdown file will be in English.**
## Installing and loading required packages
In this analysis, I will use four packages: **tidyverse/stringr** to format the data and **ggplot2/ggrepel** for the graphical representation. The next R lines detect whether each package is installed and, if it is not, install it.
```{r}
list.of.packages <- c("ggplot2", "tidyverse", "ggrepel", "stringr")
# keep only the packages that are not installed yet
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
```
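The chunk above only installs missing packages. They are attached later with `library()`. The following is a minimal sketch that combines the check, the installation and the loading in one function. The helper name `ensure_packages` is hypothetical and is not part of the original analysis. It is demonstrated on base packages, so nothing is downloaded:

```{r}
# Hypothetical helper: install whatever is missing, then attach every package
ensure_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (without attaching) when a package is missing
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) install.packages(missing)
  # character.only = TRUE lets library() take the package name from a string
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(missing)
}

# Base packages are always installed, so this only attaches them
ensure_packages(c("stats", "utils"))
```

In this document, it would be called as `ensure_packages(list.of.packages)`. `requireNamespace()` is used for the check because it tests availability without attaching the package.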
## Downloading and loading the data in R
The next chunk of code loads the data into R. If no local copy of the CSV file is present, it is downloaded first.
```{r}
@@ -48,24 +51,24 @@ if (!file.exists(data_file)) {
data=read.csv("time_series_covid19_confirmed_global.csv",sep=",") # sep="," declares comma-separated fields (also read.csv()'s default)
```
I then check that the data loaded correctly, since the CSV is comma-separated.
```{r}
head(colnames(data)) # shows that the columns were separated correctly
head(data$Country.Region)
```
The data are loaded correctly, but the data frame needs some manipulation before I can use it with ggplot2, the package I will use for the graphical representation.
## Building the data frame to show the number of cases over time
I want to exclude all "colonial territories" except China's regions. To do this, I will exclude the non-metropolitan territories by **creating a data frame from only the rows with an empty "Province.State"**. The only purpose of this data frame is to simplify the selection of the countries that I want for the analysis.
```{r}
data_noProvince<-data[(data$Province.State==""),] # keep only rows without a province/state, i.e. whole countries
```
I will **check that the selection is correct** by comparing the number of rows in the two data frames and verifying that France appears only once.
```{r}
nrow(data_noProvince)
@@ -90,35 +93,42 @@ hong_kong$Country.Region<-"Hong Kong" #I isolated the numbers associated with Ho
**For China, I will add up the case numbers of all provinces for each date**, which will give me a single row for the final data frame.
```{r}
china<-cbind(china_data[1,c(1,2)],t(data.frame(list(colSums(china_data[,-c(1,2)])))))
# cbind() joins the columns of several data frames that have the same number of rows.
# colSums() sums every column except Province.State and Country.Region over all Chinese provinces
# (the summed Lat/Long values are meaningless, but those columns are dropped later).
# t() turns the vector of sums into a single row, matching the layout of the other countries.
```
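The following sketch illustrates what the China chunk does, using a small made-up data frame with the same column layout as the JHU file (the provinces and numbers are invented). First, it sums the date columns of the Chinese rows with `colSums()`. It then shows `aggregate()`, a base R alternative that the original analysis does not use, which produces one summed row per country in a single call:

```{r}
# Made-up data in the JHU wide layout: two Chinese provinces and one country without provinces
toy <- data.frame(
  Province.State = c("Hubei", "Beijing", ""),
  Country.Region = c("China", "China", "France"),
  Lat = c(31.0, 40.2, 46.2),
  Long = c(112.3, 116.4, 2.2),
  X1.22.20 = c(10, 2, 1),
  X1.23.20 = c(12, 3, 4)
)

# Sum the date columns (5 onwards) over the Chinese rows: one total per date
china_rows <- toy[toy$Country.Region == "China", ]
china_total <- colSums(china_rows[, 5:ncol(china_rows)])
china_total   # 12 for X1.22.20, 15 for X1.23.20

# Alternative: sum every date column per country at once
# (Province.State, Lat and Long are dropped first so that "." only covers the dates)
by_country <- aggregate(. ~ Country.Region, data = toy[, -c(1, 3, 4)], FUN = sum)
by_country
```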
Finally, I will **assemble the three data frames** (China, Hong Kong, and the other countries). For formatting purposes, I will use the country names as row names.
```{r}
dt_countries<-rbind(china, hong_kong,dt_countries) # rbind() is like cbind(), but stacks rows
colnames(dt_countries)<-colnames(data) # keep the original column names
rownames(dt_countries)<-dt_countries$Country.Region
```
To prepare the graphical representation, **I will remove all the unnecessary variables** (Province.State, Country.Region, Lat and Long). The country names are kept as row names. **I then transpose the data frame so that countries become the variables and dates become the observations**. Again, this is mostly for formatting.
```{r}
dt_countries_onlydata<-as.data.frame(t(dt_countries[,c(5:ncol(dt_countries))])) # keep only the date columns, then transpose
dt_countries_onlydata$date<-rownames(dt_countries_onlydata) # store the dates (the former column names) as a variable
```
Using tidyverse, I will **prepare the data for processing by ggplot**.
More precisely, this step associates a key (here, the country) with each value (here, the number of cases), and the "date" variable identifies each observation. When ggplot plots the data, it draws one line per key. For example, the number of cases in France becomes a line labelled "France". Without this layout, ggplot would not understand that I want to plot multiple lines, and the output would be unreadable.
```{r}
library(tidyverse,quietly=TRUE)
df<-dt_countries_onlydata %>%
  select(colnames(dt_countries_onlydata)) %>% # selects every column, so this step changes nothing
  gather(key="Country",value="Cases",-date) # one row per (date, Country) pair
```
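`gather()` still works, but current versions of tidyr mark it as superseded by `pivot_longer()`. The sketch below shows the same wide-to-long reshaping on made-up numbers (two countries and two dates, in the layout of `dt_countries_onlydata`). The resulting columns should match those of `gather(key="Country", value="Cases", -date)`, although the rows may come out in a different order:

```{r}
library(tidyr)

# Made-up wide data: one column per country plus the date column
wide <- data.frame(
  France = c(1, 3),
  Italy = c(2, 5),
  date = c("X1.22.20", "X1.23.20")
)

# Every column except date is stacked into Country (former column name) and Cases (value)
long <- pivot_longer(wide, cols = -date, names_to = "Country", values_to = "Cases")
long
```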
We can compare the two data frames to see the different layouts.
```{r}
@@ -129,6 +139,7 @@ head(df)
Finally, I will convert the obscure date format (an "X" followed by month.day.year) into one that R can recognize.
```{r}
library(stringr,quietly=TRUE)
df$date<-as.Date(str_sub(df$date,2,-1),format="%m.%d.%y") # str_sub() removes the leading "X" in front of each date
```
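The leading "X" comes from `read.csv()`. By default (`check.names = TRUE`), it passes column names through `make.names()`, which prefixes a name that starts with a digit with "X" and turns characters such as "/" into ".". The sketch below assumes that the raw headers look like `1/22/20`, as the `%m.%d.%y` format above implies. It uses base R `sub()` instead of `str_sub()`:

```{r}
# Assumed raw date headers, before read.csv() cleans them
raw_names <- c("1/22/20", "1/23/20")

# What read.csv() does to them with check.names = TRUE
mangled <- make.names(raw_names)
mangled   # "X1.22.20" "X1.23.20"

# Strip the leading "X", then parse month.day.two-digit-year
dates <- as.Date(sub("^X", "", mangled), format = "%m.%d.%y")
dates     # "2020-01-22" "2020-01-23"
```

Alternatively, `read.csv(..., check.names = FALSE)` keeps the original headers, which could then be parsed with `format = "%m/%d/%y"`.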
@@ -137,10 +148,10 @@ Now, the data are ready to be drawn by ggplot
## Graphical representation
```{r}
library(ggplot2,quietly=TRUE)
library(ggrepel,quietly=TRUE) # package used to add the labels shown below
ggplot(df, aes(x = date, y = Cases)) +
  geom_line(aes(color = Country, group = Country))
```
@@ -151,7 +162,7 @@ I will add a label to identify the most infected country
label = if_else(df$date == max(df$date), as.character(df$Country), NA_character_)
ggplot(df, aes(x = date, y = Cases)) +
  geom_line(aes(color = Country, group = Country))+
  geom_label_repel(aes(label = label,color=Country),
    nudge_x = 0, nudge_y=0, max.overlaps=1, direction = "y",
@@ -166,7 +177,7 @@ It is not easy to see the other countries. ggrepel indicates that there is no ro
zoomed_df<-df[df$Country!="US",]
label = if_else(zoomed_df$date == max(zoomed_df$date), as.character(zoomed_df$Country), NA_character_)
ggplot(zoomed_df, aes(x = date, y = Cases)) +
  geom_line(aes(color = Country, group = Country))+
  geom_label_repel(aes(label = label,color=Country),
    nudge_x = 200, nudge_y=50, max.overlaps=20, direction = "y", force=10,
@@ -174,26 +185,27 @@ label = if_else(zoomed_df$date == max(zoomed_df$date), as.character(zoomed_df$Co
```
Something is off about the number of cases for China: the WHO reports a cumulative total of nearly 100 million cases ([source](https://covid19.who.int/region/wpro/country/cn)). Let's check the total number of cases in the world first, and then I'll come back to it.
## Total number of cases over time
I will use the original data frame imported from the website. First, let's copy it under a new name.
```{r}
dtpart2<-data
```
To observe the total number of cases over time, I will **sum each date column over all rows** to obtain the total number of cases on a given day.
```{r}
dt_total_cases<-colSums(dtpart2[,-c(1:4)]) # drop Province.State, Country.Region, Lat and Long, then sum each date column
```
Now, I will produce the data frame to give to ggplot. I also convert the dates to a better format, using the same code as above.
```{r}
df2<-data.frame(date=names(dt_total_cases),total_cases=dt_total_cases)
df2$date<-as.Date(str_sub(df2$date,2,-1),format="%m.%d.%y")
```
@@ -208,9 +220,157 @@ ggplot(df2, aes(x = date, y = total_cases)) +
Let's see how it looks after a logarithmic transformation.
```{r}
ggplot(df2, aes(x = date, y = log10(total_cases))) +
  geom_line(color="blue")
```
I previously suspected a problem with China's data. I first checked my analysis by comparing my results with the [map drawn by Johns Hopkins University](https://coronavirus.jhu.edu/map.html).
They obtain the same total number of cases as I did, but it seems that they did not use China's data.
I compared their results with the WHO data set visible [here](https://covid19.who.int/?mapFilter=cases), and they are indeed quite different.
I will perform the analysis again, this time using the WHO data set available [here](https://covid19.who.int/WHO-COVID-19-global-table-data.csv).
## Same analysis using WHO data
I will **load the data** using the same code as above.
```{r}
data_url = "https://covid19.who.int/WHO-COVID-19-global-data.csv"
data_file = "WHO-COVID-19-global-data.csv"
if (!file.exists(data_file)) {
  download.file(data_url, data_file, method="auto")
}
data=read.csv(data_file,sep=",") # sep="," declares comma-separated fields
```
I will **select the same countries as before**. Unfortunately, Hong Kong is not available as a separate entry in the WHO list.
```{r}
# names must match the WHO spelling exactly, e.g. "Iran (Islamic Republic of)"
list_of_countries <- c("France","Belgium","Germany","Iran (Islamic Republic of)","Italy","Japan","Netherlands","Portugal","Spain","The United Kingdom","United States of America","Republic of Korea","China")
dt_countries<-data[data$Country%in%list_of_countries,]
```
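`%in%` only keeps exact matches, so a requested name that is spelled differently in the WHO file is silently dropped. For instance, WHO lists Iran as "Iran (Islamic Republic of)", as the table at the top of this commit shows. The following quick sketch catches such mismatches on a hand-made subset of WHO names. In the analysis, `who_names` would be `unique(data$Country)`:

```{r}
# Hand-made subset of WHO country names (spelling as in the WHO table)
who_names <- c("France", "Iran (Islamic Republic of)", "The United Kingdom")
requested <- c("France", "Iran", "The United Kingdom")

# Requested names with no exact match: filtering on them returns no rows
unmatched <- setdiff(requested, who_names)
unmatched                                   # "Iran"

# Look up the official spelling by partial match
grep("Iran", who_names, value = TRUE)       # "Iran (Islamic Republic of)"
```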
I will **convert the date column** so that R recognizes it as a date.
```{r}
dt_countries$date<-as.Date(dt_countries[,1]) # the first column is already in year-month-day form, but read.csv() does not read it as a Date
```
This time, the data frame imported from the source is already in long format (one row per country and date) and can be used directly with ggplot2. **Let's plot the data.**
```{r}
ggplot(dt_countries, aes(x = date, y = Cumulative_cases )) +
geom_line(aes(color = Country, group = Country))
```
This looks better. Now let's **calculate the worldwide total number of cases**. I will replace every country name with "worldwide" so that all rows belong to a single group.
```{r}
dt_worldwide<-data
dt_worldwide$date<-as.Date(dt_worldwide[,1])
dt_worldwide$Country<-"worldwide"
```
The formatting is a little more complex here. To compute the worldwide cumulative number of cases, I will first **generate a data frame that contains only the dates and the cumulative cases, ordered by date**.
```{r}
df<-dt_worldwide %>%
  select(date,Cumulative_cases) # one row per country and date
df<-df[order(df$date),] # order the rows by date
```
I will create a function that returns the **worldwide cumulative number of cases for a given date**.
```{r}
collapse_column<-function(x){
  cdf<-df[df$date==x,] # rows of all countries for date x
  output<-sum(cdf[,2]) # column 2 holds the cumulative counts
  return(output)
}
```
Then, I will **apply this function to each date** and store the results in a new data frame.
```{r}
df_total_cases<-data.frame(date=unique(df$date),values=unlist(lapply(unique(df$date),collapse_column)))
```
Now let's see the graph
```{r}
ggplot(df_total_cases, aes(x = date, y = values )) +
geom_line(color = "blue")
```
Let's see how it looks after a logarithmic transformation
```{r}
ggplot(df_total_cases, aes(x = date, y = log10(values))) +
geom_line(color="blue")
```
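`collapse_column()` scans the whole data frame once for every date. Base R's `aggregate()` is one possible alternative, not the method used above, that computes all per-date sums in a single call. The sketch below uses a made-up data frame with the same `date` and `Cumulative_cases` columns:

```r
# Made-up long-format data: one row per country and date
toy <- data.frame(
  date = as.Date(c("2020-01-03", "2020-01-03", "2020-01-04", "2020-01-04")),
  Country = c("A", "B", "A", "B"),
  Cumulative_cases = c(1, 2, 3, 5)
)

# Sum Cumulative_cases over all countries for each date (result sorted by date)
totals <- aggregate(Cumulative_cases ~ date, data = toy, FUN = sum)
print(totals)
```

Applied to `dt_worldwide`, the same call should give the same values as the `lapply()` approach, with `Cumulative_deaths` in place of `Cumulative_cases` for the deaths. One difference: the formula interface drops rows containing `NA` by default.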
## What about the deaths?
Let's look at the deaths with a similar approach.
```{r}
df <- dt_worldwide %>%
  select(date, Cumulative_deaths) # keep only the date and the cumulative number of deaths
df <- df[order(df$date), ]        # order the rows by date
collapse_column<-function(x){
cdf<-df[df$date==x,]
output<-sum(cdf[,2])
return(output)}
df_total_death<-data.frame(date=unique(df$date),values=unlist(lapply(unique(df$date),collapse_column)))
ggplot(df_total_death, aes(x = date, y = values )) +
geom_line(color = "red")
```
Same graph with logarithmic scale
```{r}
ggplot(df_total_death, aes(x = date, y = log10(values ))) +
geom_line(color = "red")
```
## Cases and deaths correlation
Just for the exercise, I would like to see if cases and death numbers are related. First, I will **build the data frame**
```{r}
dt_cases_and_death<-cbind(df_total_cases,df_total_death[,2])
colnames(dt_cases_and_death)<-c("date","cases","deaths")
```
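`cbind()` pairs rows by position, so it assumes that `df_total_cases` and `df_total_death` list the same dates in the same order. That holds here because both tables were built from the same file in the same way. If the assumption might not hold, joining on the date with base R's `merge()` is a safer option. The sketch below uses made-up numbers, with the death table deliberately in a different order:

```r
toy_cases  <- data.frame(date = as.Date(c("2020-01-03", "2020-01-04")), values = c(10, 20))
toy_deaths <- data.frame(date = as.Date(c("2020-01-04", "2020-01-03")), values = c(2, 1))

# Match rows on the date rather than on their position (output sorted by date)
both <- merge(toy_cases, toy_deaths, by = "date", suffixes = c("_cases", "_deaths"))
colnames(both) <- c("date", "cases", "deaths")
print(both)
```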
And now, the graph and the correlation. Given the large number of data points, I **will use Pearson's correlation coefficient** (the default method of `cor()` in R) to see whether the numbers of deaths and cases are linked.
```{r}
# Pearson correlation (default method of cor()) between cumulative cases and deaths
r_cases_deaths <- round(cor(dt_cases_and_death$cases, dt_cases_and_death$deaths), 3)
ggplot(dt_cases_and_death, aes(x = date, y = log10(cases))) +
  geom_line(color = "blue") + # cases
  geom_line(aes(x = date, y = log10(deaths)), color = "red") + # deaths
  annotate("text",
           x = dt_cases_and_death$date[floor(nrow(dt_cases_and_death) / 1.5)],
           y = 7.5,
           label = paste0("correlation = ", r_cases_deaths))
```
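Unsurprisingly, **cases and deaths are linked**. This is not very useful scientifically, because two cumulative counts that both rise over time will almost always be strongly correlated. Since I have already deviated a little from the original exercise, I will stop here.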
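The high coefficient is expected because both series are cumulative, so their shared upward trend dominates Pearson's correlation. The simulation below uses made-up random data, not COVID figures, to illustrate this. Comparing daily increments with `diff()` is one common way to remove the shared trend and is offered here only as a sketch:

```r
set.seed(42)
n <- 200
# Two independent series of random daily increments, accumulated over time
a <- cumsum(runif(n))
b <- cumsum(runif(n))

print(cor(a, b))              # cumulative series: close to 1 despite independence
print(cor(diff(a), diff(b)))  # daily increments: close to 0
```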
---
title: "MOOC_COVID_Analysis"
author: "VB (feb2301522924f68234e7a552680f397)"
date: "2023-05-24"
output:
html_document: default
pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Analysis of the incidence of influenza-like illness with a local copy of the data
The goal here is to reproduce graphs similar to those of the [South China Morning Post](https://www.scmp.com/) (SCMP) on the page [The Coronavirus Pandemic](https://www.scmp.com/coronavirus?src=homepage_covid_widget). These graphs show, for several countries, the cumulative number of people affected by coronavirus disease 2019 (that is, the total number of cases since the start of the epidemic).
The data we will use at first are compiled by the [Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)](https://systems.jhu.edu/) and made available on [GitHub](https://github.com/CSSEGISandData/COVID-19). We will focus more specifically on the file `time_series_covid19_confirmed_global.csv` (time series in CSV format), available at: <https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv>.
You will start by downloading the data and creating a graph showing the evolution of the cumulative number of cases over time for the following countries: Belgium (*Belgium*), China excluding Hong Kong (*China*), Hong Kong (*China, Hong-Kong*), metropolitan France (*France*), Germany (*Germany*), Iran (*Iran*), Italy (*Italy*), Japan (*Japan*), South Korea (*Korea, South*), the Netherlands without its colonies (*Netherlands*), Portugal (*Portugal*), Spain (*Spain*), the United Kingdom without its colonies (*United Kingdom*) and the United States (*US*).
The name in parentheses is the name of the "country" as it appears in the file `time_series_covid19_confirmed_global.csv`. The data for China are reported by province. We have separated Hong Kong, not to take sides in the differences between this province and the Chinese state, but because this is how the data appear on the SCMP website. The data for France, the Netherlands and the United Kingdom exclude overseas territories and other "colonial remnants".
Next, you will make a graph with the date on the x-axis and the cumulative number of cases at that date on the y-axis. We suggest making two versions of this graph, one with a linear scale and one with a logarithmic scale.
**The rest of this RMarkdown file is in English.**
## Installing and loading required packages
In this analysis, I will use four packages: **tidyverse/stringr** to format the data and **ggplot2/ggrepel** for the graphical representation. The next chunk checks which of these packages are installed and installs any that are missing.
```{r}
list.of.packages <- c("ggplot2", "tidyverse", "ggrepel", "stringr")
# Keep only the packages that are not installed yet
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
# Install the missing ones, if any
if (length(new.packages)) install.packages(new.packages)
```
## Downloading and loading the data in R
The next chunk of code will load the data in R. If no local copy of the csv file is present, it will be downloaded.
```{r}
data_url <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
data_file <- "time_series_covid19_confirmed_global.csv"
# Download the file only if no local copy exists yet
if (!file.exists(data_file)) {
  download.file(data_url, data_file, method = "auto")
}
data <- read.csv(data_file, sep = ",") # sep = "," reads a comma-separated file
```
I then check that the file was parsed correctly, i.e. that the comma-separated columns were split as expected.
```{r}
head(colnames(data)) # Allows us to see that the columns were well separated
head(data$Country.Region)
```
It is correctly loaded, but the data frame needs some reshaping before I can use it with ggplot, the package I will use for the graphical representation.
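The column names checked above are not exactly the raw CSV headers. By default, `read.csv()` passes them through `make.names()` (`check.names = TRUE`), which replaces characters such as `/` with `.` and adds an `X` in front of names that start with a digit. This is where `Province.State`, `Country.Region` and date columns such as `X1.22.20` come from. The sketch below runs on a made-up two-row CSV with the same header layout (the numbers are illustrative):

```r
# Made-up CSV with the same header layout as the JHU time series
csv_text <- "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,France,46.2,2.2,0,0
Hong Kong,China,22.3,114.2,0,2"

toy <- read.csv(text = csv_text)  # check.names = TRUE by default
print(colnames(toy))

# The same renaming, applied directly
print(make.names(c("Province/State", "1/22/20")))  # "Province.State" "X1.22.20"
```

Note that an empty province field is read as the empty string `""`, which is what the next step filters on.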
## Building the data frame to show the number of cases over time
Since I want to exclude all "colonial territories" except China's regions, I will exclude all of the non-metropolitan territories by **creating a data frame from only the rows containing an empty "Province.State"**. The only purpose of this data frame is to simplify the selection of the countries that I want for the analysis.
```{r}
data_noProvince<-data[(data$Province.State==""),]
```
I will **check that the selection is correct** by comparing the number of rows of the two data frames and verifying that France appears only once.
```{r}
nrow(data_noProvince)
nrow(data)
head(data_noProvince[data_noProvince$Country.Region=="France",c(1,2)])
```
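The filter keeps one row per country for countries reported as a whole. It also drops countries whose rows all carry a province name, which is the case for China in the JHU file. That is why China is handled separately in the next step. A toy data frame with made-up values shows both effects:

```r
# Made-up rows mimicking the JHU layout
toy <- data.frame(
  Province.State = c("", "Reunion", "", "Hubei", "Hong Kong"),
  Country.Region = c("France", "France", "Italy", "China", "China"),
  X1.22.20 = c(0, 0, 0, 10, 1),
  stringsAsFactors = FALSE
)

# Keep only the rows that describe a whole country
toy_noProvince <- toy[toy$Province.State == "", ]

print(nrow(toy_noProvince))           # 2 of the 5 rows remain
print(toy_noProvince$Country.Region)  # France once, Italy once, no China
```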
Now that the separation is done, **I will select the countries I want**. China and Hong Kong are exceptions: I need to add up the numbers of all the regions of China except Hong Kong, which gets its own category.
```{r}
list_of_countries <- c("France","Belgium","Germany","Iran","Italy","Japan","Netherlands","Portugal","Spain","United Kingdom","US","Korea, South")
dt_countries<-data_noProvince[data_noProvince$Country.Region%in%list_of_countries,] # Here, I use the prepared data frame without the non-metropolitan territories
china_data<-data[data$Country.Region=="China"&data$Province.State!="Hong Kong",] #I get all the numbers associated with China, except Hong-Kong
hong_kong<-data[data$Country.Region=="China"&data$Province.State=="Hong Kong",]
hong_kong$Country.Region<-"Hong Kong" #I isolated the numbers associated with Hong-Kong
```
**For China, I will add up the cumulative numbers of cases of all provinces for each date**, which gives a single row for the final data frame.
```{r}
china <- cbind(china_data[1, c(1, 2)], t(data.frame(list(colSums(china_data[, -c(1, 2)])))))
# colSums() sums every remaining column over all Chinese provinces (Lat and Long are summed too, but they are dropped later).
# data.frame() + t() turn the vector of sums into a single row whose column names are the dates.
# cbind() then joins the first two identifying columns (Province.State, Country.Region) with that row.
```
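The collapsing step can also be sketched on made-up province rows. This variant differs slightly from the chunk above. It sums only the date columns and leaves the coordinates as `NA`, since a sum of latitudes has no meaning. In the code above this is harmless, because columns 1 to 4 are dropped later anyway.

```r
# Made-up province-level rows with the JHU column layout
provinces <- data.frame(
  Province.State = c("Anhui", "Beijing", "Hubei"),
  Country.Region = "China",
  Lat = c(31.8, 40.2, 30.9),
  Long = c(117.2, 116.4, 112.3),
  X1.22.20 = c(1, 2, 3),
  X1.23.20 = c(2, 4, 6),
  stringsAsFactors = FALSE
)

# Sum every date column (5th column onwards) over all provinces
date_cols <- 5:ncol(provinces)
totals <- colSums(provinces[, date_cols])

# One country-level row; coordinates are deliberately left as NA
china_row <- data.frame(
  Province.State = "",
  Country.Region = "China",
  Lat = NA_real_,
  Long = NA_real_,
  t(totals)
)
print(china_row)
```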
Finally, I will **assemble the 3 data frames** (China, Hong Kong and the other countries) and use the country names as row names, for formatting purposes.
```{r}
dt_countries<-rbind(china, hong_kong,dt_countries) #rbind is similar to cbind, but to fuse rows
colnames(dt_countries)<-colnames(data) # to conserve original column names
rownames(dt_countries)<-dt_countries$Country.Region
```
To prepare the plot, **I will remove the unnecessary columns** (Province/State, Country/Region, Latitude and Longitude; the country names are kept as row names). **I then transpose the data frame so that countries become the columns and dates become the rows**. Again, this is mostly for formatting.
```{r}
dt_countries_onlydata<-as.data.frame(t(dt_countries[,c(5:ncol(dt_countries))]))# remove unnecessary variables
dt_countries_onlydata$date<-rownames(dt_countries_onlydata) # adding dates as variable
```
Using tidyverse, I will **reshape the data for ggplot**.
More precisely, `gather()` converts the table from wide format (one column per country) to long format. In long format, each row associates a key (here, the country) with a value (here, the number of cases) for a given date. When ggplot plots the data, it draws one line per key, so the cases of France go on a line labelled "France". Without this step, ggplot would not understand that I want to plot several lines, and the output would be unreadable.
```{r}
library(tidyverse,quietly=TRUE)
df<-dt_countries_onlydata %>%
select(colnames(dt_countries_onlydata)) %>%
gather(key="Country",value="Cases",-date)
```
We can compare the two data frames to see the different layouts.
```{r}
head(dt_countries_onlydata)
head(df)
```
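In current versions of tidyr, `gather()` is superseded by `pivot_longer()`. The equivalent call would be `pivot_longer(dt_countries_onlydata, -date, names_to = "Country", values_to = "Cases")`, possibly with the rows in a different order. The same wide-to-long reshaping can also be sketched in base R with `stack()`, shown here on a made-up table:

```r
# Made-up wide table: one row per date, one column per country
wide <- data.frame(
  France = c(0, 2, 5),
  Italy  = c(0, 0, 3),
  date   = c("X1.22.20", "X1.23.20", "X1.24.20"),
  stringsAsFactors = FALSE
)

country_cols <- setdiff(colnames(wide), "date")

# stack() puts the country columns one after the other (columns: values, ind)
stacked <- stack(wide[, country_cols])

# Repeat the dates once per country so they line up with the stacked values
long <- data.frame(
  date    = rep(wide$date, times = length(country_cols)),
  Country = as.character(stacked$ind),
  Cases   = stacked$values,
  stringsAsFactors = FALSE
)
print(long)
```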
Finally, I will convert the date labels (such as `X1.22.20`) into dates that R can recognize
```{r}
library(stringr,quietly=TRUE)
df$date<-as.Date(str_sub(df$date,2,-1),format="%m.%d.%y")# str_sub is used to remove the "X" before the date
```
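A base-R alternative to `str_sub(x, 2, -1)` is `sub("^X", "", x)`, which removes the `X` only when it is present. With `%y`, two-digit years 00 to 68 are read as 2000 to 2068. A quick check on two made-up labels:

```r
# Date labels as produced by read.csv() from headers like "1/22/20"
labels <- c("X1.22.20", "X3.9.23")

# Drop the leading "X", then parse month.day.two-digit-year
dates <- as.Date(sub("^X", "", labels), format = "%m.%d.%y")
print(dates)
```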
Now, the data are ready to be drawn by ggplot
## Graphical representation
```{r}
library(ggplot2,quietly=TRUE)
library(ggrepel, quietly = TRUE) # package used to add the labels shown below
ggplot(df, aes(x = date, y = Cases)) +
geom_line(aes(color = Country, group = Country))
```
I will add a label at the end of the lines to identify the most infected country
```{r}
label = if_else(df$date == max(df$date), as.character(df$Country), NA_character_)
ggplot(df, aes(x = date, y = Cases)) +
geom_line(aes(color = Country, group = Country))+
geom_label_repel(aes(label = label,color=Country),
nudge_x = 0, nudge_y=0, max.overlaps=1, direction = "y",
na.rm = TRUE,show.legend=F)
```
The other countries are hard to distinguish, and ggrepel reports that there is no room for their labels. Let's hide the US and see what the plot looks like.
```{r}
zoomed_df<-df[df$Country!="US",]
label = if_else(zoomed_df$date == max(zoomed_df$date), as.character(zoomed_df$Country), NA_character_)
ggplot(zoomed_df, aes(x = date, y = Cases)) +
geom_line(aes(color = Country, group = Country))+
geom_label_repel(aes(label = label,color=Country),
nudge_x = 200, nudge_y=50, max.overlaps=20, direction = "y", force=10,
na.rm = TRUE,show.legend=F)+ theme(legend.position = "none")
```
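In both plots above, the `label` vector has one entry per row of the data: the country name on the last date and `NA` everywhere else. With `na.rm = TRUE`, the rows with a missing label are dropped, so only the line ends get a label. The sketch below rebuilds such a vector on made-up data with base R's `ifelse()`, which gives the same result as dplyr's stricter `if_else()` in this case:

```r
toy <- data.frame(
  date = as.Date(c("2020-01-22", "2020-01-23", "2020-01-22", "2020-01-23")),
  Country = c("France", "France", "Italy", "Italy"),
  Cases = c(0, 2, 0, 3),
  stringsAsFactors = FALSE
)

# Country name on the most recent date, NA on every other row
toy_label <- ifelse(toy$date == max(toy$date), as.character(toy$Country), NA_character_)
print(toy_label)
```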
Something is off about the number of cases for China. The WHO reports a cumulative number of cases of nearly 100 million ([source](https://covid19.who.int/region/wpro/country/cn)). Let's check the total number of cases in the world first, and then I'll come back to it.
## Total number of cases over time
I will use the original data frame imported from the website. First, I copy it under a new name.
```{r}
dtpart2<-data
```
To follow the total number of cases over time, I will **sum each date column over all rows**, which gives the worldwide number of cases on each day.
```{r}
dt_total_cases<-colSums(dtpart2[,-c(1:4)])
```
Now, I will build the data frame to pass to ggplot. I also convert the dates to a proper format, using the same code as above.
```{r}
df2<-data.frame(date=names(dt_total_cases),total_cases=dt_total_cases)
df2$date<-as.Date(str_sub(df2$date,2,-1),format="%m.%d.%y")
```
Let's do the graphical representation
```{r}
ggplot(df2, aes(x = date, y = total_cases)) +
geom_line(color="blue")
```
Let's see how it looks after a logarithmic transformation
```{r}
ggplot(df2, aes(x = date, y = log10(total_cases))) +
geom_line(color="blue")
```
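One point to keep in mind with `log10()`: a cumulative count can be 0 at the start of a series (for instance for a single country), and `log10(0)` is `-Inf`, which cannot be placed meaningfully on the plot. The sketch below uses made-up counts to show the issue and one simple option, keeping only positive values before the transformation. ggplot2's `scale_y_log10()` is an alternative to `log10()` inside `aes()` that keeps the original counts as axis labels, though it does not solve the zero problem.

```r
# Made-up cumulative counts, starting at zero
cumulative <- c(0, 0, 1, 10, 1000, 1e6)

print(log10(cumulative))  # the two zeros become -Inf

# Keep only strictly positive counts before taking the logarithm
positive <- cumulative[cumulative > 0]
print(log10(positive))
```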
I suspected earlier that there was a problem with China's data. I first checked my analysis by comparing my results with the [map drawn by Johns Hopkins University](https://coronavirus.jhu.edu/map.html).
They obtained the same total number of cases as I did, but it seems that they did not use China's data.
I compared their results with the WHO data set visible [here](https://covid19.who.int/?mapFilter=cases), and they are indeed quite different.