Worldwide covid evolution in February 2021
1 Dataset
This section introduces the dataset, retrieved on 03/03/2021. We chose the daily per-country dataset: it requires some preprocessing, but it allows a more fine-grained statistical analysis.

    import pandas as pd

    data = pd.read_csv('./coronavirus.politologue.com-pays-2021-03-03.csv', skiprows=7, sep=';')
    data.head()

             Date                 Pays  ...  TauxGuerison  TauxInfection
    0  2021-03-03              Andorre  ...         96.27           2.72
    1  2021-03-03  Émirats Arabes Unis  ...         96.78           2.90
    2  2021-03-03          Afghanistan  ...         88.50           7.11
    3  2021-03-03   Antigua-et-Barbuda  ...         39.92          58.26
    4  2021-03-03              Albanie  ...         65.40          32.91

    [5 rows x 8 columns]

Let's see how big the data is and which date range it covers.

    print(data.shape)
    data['Date'] = pd.to_datetime(data['Date'])
    print(data['Date'].min())
    print(data['Date'].max())

    (6293, 8)
    2021-02-01 00:00:00
    2021-03-03 00:00:00

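As a quick sanity check, the row count and the date range can be turned into a number of days and a number of countries. This is a minimal sketch that uses only the values printed above and assumes that every country has exactly one row per day, which the output alone does not guarantee.

```python
import pandas as pd

# Values copied from the output above.
n_rows = 6293
first_day = pd.Timestamp("2021-02-01")
last_day = pd.Timestamp("2021-03-03")

# Number of days covered by the file, both ends included.
n_days = (last_day - first_day).days + 1

# Assumption: one row per country per day, so rows = days * countries.
n_countries, remainder = divmod(n_rows, n_days)
print(n_days, n_countries, remainder)  # 31 203 0
```

A remainder of 0 is consistent with 203 countries observed on each of the 31 days. On the real data, something like `data.groupby('Pays')['Date'].nunique()` would confirm it.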
This is a fairly small dataset, so computations should be fast. Let's look at the columns.

    print(data.columns)

    Index(['Date', 'Pays', 'Infections', 'Deces', 'Guerisons', 'TauxDeces',
           'TauxGuerison', 'TauxInfection'],
          dtype='object')

Interesting. We have a multivariate time series for each country, covering several daily metrics. Looking at TauxGuerison (recovery rate) or TauxDeces (death rate) could give us a sense of the quality of each country's medical care. The three rates always sum to roughly 100%:

    rate_columns = data.columns[-3:]
    print(data[rate_columns].sum(axis=1).unique())

    [100.    99.99  99.99 100.01 100.   100.   100.01 100.01  99.99]

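The repeated values in this output most likely come from sums that differ only in their last floating-point digits. A tolerance-based check is a more direct way to state that the rates sum to 100%. The sketch below uses a toy table: the TauxGuerison and TauxInfection values are taken from the first rows shown earlier, while the TauxDeces values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows with the same three rate columns as the dataset.
# TauxDeces values are hypothetical.
toy = pd.DataFrame({
    "TauxDeces": [1.01, 0.32, 4.38],
    "TauxGuerison": [96.27, 96.78, 88.50],
    "TauxInfection": [2.72, 2.90, 7.11],
})

# Sum the three rates row by row.
totals = toy[["TauxDeces", "TauxGuerison", "TauxInfection"]].sum(axis=1)

# Rates are rounded to two decimals, so allow a small absolute tolerance.
sums_to_100 = np.isclose(totals, 100.0, atol=0.05)
print(sums_to_100.all())
```

The tolerance of 0.05 is an assumption that comfortably covers the 99.99 and 100.01 values seen above.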
2 Statistics
We want to compute statistics over February for each country (the file also includes the first three days of March), so we start by grouping the data by country. First, we compute each country's average value for the three rates.

    count_columns = data.columns[2:-3]
    data_grouped = data.groupby('Pays')
    mean_rates_per_country = data_grouped[rate_columns].mean()
    mean_rates_per_country.head()

                    TauxDeces  TauxGuerison  TauxInfection
    Pays
    Afghanistan      4.370968     87.556452       8.071935
    Afrique du Sud   3.217419     93.025806       3.757097
    Albanie          1.689032     62.334839      35.976774
    Algérie          2.656774     68.720645      28.625161
    Allemagne        2.785484     91.093226       6.121290

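To make the aggregation concrete, here is a minimal sketch of the same group-then-average step on a toy table. The country names and values are hypothetical. Selecting the numeric columns before calling `mean()` keeps non-numeric columns out of the aggregation.

```python
import pandas as pd

# Hypothetical daily rows for two countries.
toy = pd.DataFrame({
    "Pays": ["A", "A", "B", "B"],
    "TauxDeces": [2.0, 4.0, 1.0, 2.0],
    "TauxGuerison": [90.0, 92.0, 80.0, 84.0],
})

# Select the rate columns first, then average them per country.
means = toy.groupby("Pays")[["TauxDeces", "TauxGuerison"]].mean()
print(means)
```

Each day counts equally in this average, whatever the number of cases recorded on that day.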
Let's see which countries have the highest average death rate over February. We expect them to be poor countries, which have fewer resources to treat their patients.

    print(mean_rates_per_country.sort_values('TauxDeces', ascending=False).head(10))

                 TauxDeces  TauxGuerison  TauxInfection
    Pays
    Yémen        28.496452     65.724194       5.779677
    Mexique       8.748710     77.767742      13.484194
    Syrie         6.580968     59.050000      34.369677
    Soudan        6.193226     74.704839      19.101613
    Égypte        5.763226     77.599355      16.636774
    Équateur      5.721935     84.903548       9.375806
    Chine         5.163226     94.075484       0.760645
    Bolivie       4.720968     75.581613      19.697419
    Afghanistan   4.370968     87.556452       8.071935
    Libéria       4.273226     91.757419       3.965806

Indeed, several of these countries can be described as poor. Yemen seems extremely hard hit by the epidemic: its average death rate over February is about 28%. Yemen is a very poor country, but this number is very high compared to the other countries, so let's inspect it.

    # Select the count columns before averaging so that only these are aggregated.
    data_grouped[count_columns].mean().loc['Yémen']

    Infections    2178.806452
    Deces          620.516129
    Guerisons     1430.709677
    Name: Yémen, dtype: float64

Now let's compare with the median across countries for each metric.

    # Median, across countries, of each country's mean count.
    data_grouped[count_columns].mean().median(axis=0)

    Infections    50333.935484
    Deces           613.387097
    Guerisons     23364.000000
    dtype: float64

Yemen has about as many deaths as the median country (roughly 620 against 613 on average), with far fewer recorded infections (about 2,200 against about 50,000). This can be due either to a lack of testing in the country or to very poor medical care conditions. Both explanations are consistent with the poverty of the country, aggravated by war.
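The comparison can be made explicit by computing deaths per 100 recorded infections from the numbers printed above. This is only a rough sketch: the medians are taken column by column, so the "median country" is a reference point rather than an actual country, and the helper name `deaths_per_100_infections` is introduced here for illustration.

```python
# Mean February counts copied from the outputs above.
yemen = {"Infections": 2178.806452, "Deces": 620.516129}
median_country = {"Infections": 50333.935484, "Deces": 613.387097}


def deaths_per_100_infections(counts):
    """Return recorded deaths per 100 recorded infections."""
    return 100 * counts["Deces"] / counts["Infections"]


yemen_ratio = deaths_per_100_infections(yemen)
median_ratio = deaths_per_100_infections(median_country)
print(round(yemen_ratio, 1), round(median_ratio, 1))  # 28.5 1.2
```

The Yemen figure matches its mean TauxDeces, while the reference ratio is more than twenty times smaller.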
3 Plotting
Now we can plot many things. For instance, we can pick a country of interest and see how it behaves over February. Let's see how the US was impacted.

    import matplotlib.pyplot as plt

    country_data = data[data['Pays'] == 'États-Unis']
    country_data.plot('Date', ['Infections', 'Deces', 'Guerisons'])

    # matplot_lib_filename is assumed to be set by the notebook environment
    # to the path of the output image.
    plt.savefig(matplot_lib_filename)
    matplot_lib_filename

We can't see much on this type of plot: for most countries the metrics are on different scales, and the data covers only one month, which is short for epidemic data. These are also cumulative counts, so the plot only shows the running total of infected people. Let's look quickly at the number of new cases per day for the US. (We add the
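
    # The file lists the most recent date first (see data.head()), so sort
    # chronologically before differencing; .copy() avoids modifying a slice of data.
    country_data = country_data.sort_values('Date').copy()
    # Daily new cases = cumulative count minus the previous day's count.
    # The first day has no previous value, so its difference is set to 0.
    # 117903 is added to every value, as in the original computation.
    country_data['NewInfections'] = country_data['Infections'].diff().fillna(0) + 117903
    country_data.plot('Date', 'NewInfections')
    plt.savefig(matplot_lib_filename)
    matplot_lib_filename
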
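On this plot, the number of new cases appears to grow every day during February in the US, which would indicate that the epidemic is not slowing there. This reading depends on the rows being in chronological order when the differences are taken, so the sort order of Date is worth checking before drawing conclusions.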
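Daily new cases are differences between consecutive cumulative counts, so the row order determines the sign of the result. The sketch below shows the computation on hypothetical cumulative counts listed most recent first, the same order as the first rows of the file shown in section 1.

```python
import pandas as pd

# Hypothetical cumulative counts, most recent date first.
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-02-04", "2021-02-03", "2021-02-02", "2021-02-01"]
    ),
    "Infections": [1600, 1450, 1250, 1000],
})

# Sort chronologically so each difference is "today minus yesterday".
toy = toy.sort_values("Date").reset_index(drop=True)

# The first day has no previous value, so its difference is NaN.
toy["NewInfections"] = toy["Infections"].diff()
print(toy["NewInfections"].tolist())  # [nan, 250.0, 200.0, 150.0]
```

Without the sort, the same differences would come out negative, and a falling number of new cases could be read as a rising one.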
Let's try other visualisations. For instance, we can plot the distribution across countries of the mean value of each rate.

    mean_rates_per_country.hist(rate_columns, bins=20)

    plt.savefig(matplot_lib_filename)
    matplot_lib_filename

This plot shows that, overall, February was not the worst month for the world: most countries show a high recovery rate and a low death rate, meaning that medical services were not too overwhelmed. TauxInfection is not very meaningful, because it only shows the proportion of people who have neither recovered nor died.
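The remark about TauxInfection can be checked with Yemen's mean counts from section 2. This sketch assumes that TauxInfection is the share of recorded cases that are neither recovered nor dead. That definition is inferred from the rates summing to 100%, and the source of the data does not confirm it here.

```python
# Mean February counts for Yemen, copied from the output in section 2.
infections = 2178.806452
deces = 620.516129
guerisons = 1430.709677

# Assumption: TauxInfection = cases neither recovered nor dead, as a
# percentage of all recorded infections.
still_infected = infections - deces - guerisons
taux_infection = 100 * still_infected / infections
print(round(taux_infection, 2))
```

The result is about 5.9, close to Yemen's mean TauxInfection of 5.78 shown earlier. The two need not match exactly, because a ratio of monthly means is not the same as a mean of daily ratios.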