Worldwide covid evolution in February 2021

1 Dataset

We introduce the dataset here. It was downloaded on 03/03/2021. We chose the per-country daily file, which needs some preprocessing but allows a more fine-grained statistical analysis.

import pandas as pd
data = pd.read_csv('./coronavirus.politologue.com-pays-2021-03-03.csv', skiprows=7, sep=';')
data.head()
         Date                 Pays  ...  TauxGuerison  TauxInfection
0  2021-03-03              Andorre  ...         96.27           2.72
1  2021-03-03  Émirats Arabes Unis  ...         96.78           2.90
2  2021-03-03          Afghanistan  ...         88.50           7.11
3  2021-03-03   Antigua-et-Barbuda  ...         39.92          58.26
4  2021-03-03              Albanie  ...         65.40          32.91

[5 rows x 8 columns]
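As a minimal sketch of what skiprows and sep do in the call above, the snippet below parses a small in-memory file. The seven preamble lines and the two data rows are made up for illustration (the real file's preamble is not shown here); only the column names come from the dataset.

```python
import io
import pandas as pd

# Hypothetical file content: 7 preamble lines, then a ';'-separated table.
preamble = "".join(f"# preamble line {i}\n" for i in range(1, 8))
table = (
    "Date;Pays;Infections;Deces\n"
    "2021-03-03;Andorre;100;2\n"   # made-up counts
    "2021-03-02;Andorre;90;2\n"
)

# skiprows=7 drops the preamble, sep=';' splits on semicolons instead of commas.
toy = pd.read_csv(io.StringIO(preamble + table), skiprows=7, sep=";")
print(toy.columns.tolist())
print(toy.shape)
```

Without skiprows, the first preamble line would be taken as the header, which is why the real call needs it.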

Let's see how big the data is, and the date range it covers.

print(data.shape)
data['Date'] = pd.to_datetime(data['Date'])
print(min(data['Date']))
print(max(data['Date']))
(6293, 8)
2021-02-01 00:00:00
2021-03-03 00:00:00
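A quick sanity check on these numbers, assuming every country has exactly one row per day (an assumption, not something the file guarantees): the row count should then be a multiple of the number of days in the range.

```python
from datetime import date

n_rows = 6293                               # from data.shape above
first, last = date(2021, 2, 1), date(2021, 3, 3)
n_days = (last - first).days + 1            # both end dates included
countries, leftover = divmod(n_rows, n_days)
print(n_days, countries, leftover)          # 31 203 0
```

A leftover of zero is consistent with 203 countries or territories being reported on each of the 31 days.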

The dataset is fairly small, so computations should be fast. Let's look at the columns.

print(data.columns)
Index(['Date', 'Pays', 'Infections', 'Deces', 'Guerisons', 'TauxDeces',
       'TauxGuerison', 'TauxInfection'],
      dtype='object')
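The column names are French. For readers who prefer English labels, a renaming along these lines is possible; the English names below are this sketch's own choice, not part of the dataset, and the rest of this document keeps the original names.

```python
import pandas as pd

# Hypothetical English labels for the French column names.
FRENCH_TO_ENGLISH = {
    "Pays": "Country",
    "Deces": "Deaths",
    "Guerisons": "Recoveries",
    "TauxDeces": "DeathRate",
    "TauxGuerison": "RecoveryRate",
    "TauxInfection": "InfectionRate",
}

columns = ["Date", "Pays", "Infections", "Deces", "Guerisons",
           "TauxDeces", "TauxGuerison", "TauxInfection"]
empty = pd.DataFrame(columns=columns)       # stand-in for the real frame
renamed = empty.rename(columns=FRENCH_TO_ENGLISH)
print(renamed.columns.tolist())
```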

Interesting. We have a multivariate time series for each country, covering several daily metrics. Looking at TauxGuerison (recovery rate) or TauxDeces (death rate) could give us a sense of the quality of each country's medical care. The three rates always sum to roughly 100 (percent):

rate_columns = data.columns[-3:]
print(data[rate_columns].sum(1).unique())
[100.    99.99  99.99 100.01 100.   100.   100.01 100.01  99.99]
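The small deviations from 100 are what rounding alone would produce. Assuming each rate is published rounded to two decimals (as the head() output suggests), each can be off by up to 0.005, so the sum of three can be off by up to 0.015. The sketch below checks the printed sums against that bound.

```python
# Sums printed above.
sums = [100.0, 99.99, 99.99, 100.01, 100.0, 100.0, 100.01, 100.01, 99.99]

# Three rates, each with a rounding error of at most 0.005 (assumed).
tolerance = 3 * 0.005
within = all(abs(s - 100.0) <= tolerance for s in sums)
print(within)  # True
```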

2 Statistics

We want to compute statistics over February, per country, so we start by grouping the data by country. First, we compute the mean of each rate for each country.

count_columns = data.columns[2:-3]
data_grouped = data.groupby('Pays')
mean_rates_per_country = data_grouped[rate_columns].mean()
mean_rates_per_country.head()
                TauxDeces  TauxGuerison  TauxInfection
Pays                                                  
Afghanistan      4.370968     87.556452       8.071935
Afrique du Sud   3.217419     93.025806       3.757097
Albanie          1.689032     62.334839      35.976774
Algérie          2.656774     68.720645      28.625161
Allemagne        2.785484     91.093226       6.121290

Let's see which countries have the highest death rate over the month of February. We expect them to be poor countries, which have fewer resources to treat their patients.

print(mean_rates_per_country.sort_values('TauxDeces', ascending=False).head(10))
             TauxDeces  TauxGuerison  TauxInfection
Pays                                               
Yémen        28.496452     65.724194       5.779677
Mexique       8.748710     77.767742      13.484194
Syrie         6.580968     59.050000      34.369677
Soudan        6.193226     74.704839      19.101613
Égypte        5.763226     77.599355      16.636774
Équateur      5.721935     84.903548       9.375806
Chine         5.163226     94.075484       0.760645
Bolivie       4.720968     75.581613      19.697419
Afghanistan   4.370968     87.556452       8.071935
Libéria       4.273226     91.757419       3.965806
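The group-then-rank pattern used above can be tried on a toy table. The countries and values below are made up, and nlargest is shown as one possible alternative to sort_values followed by head.

```python
import pandas as pd

# Toy data: two daily rows for three hypothetical countries.
toy = pd.DataFrame({
    "Pays": ["A", "A", "B", "B", "C", "C"],
    "TauxDeces": [1.0, 3.0, 5.0, 7.0, 1.0, 1.0],
})

# Mean death rate per country, then the two highest.
mean_death_rate = toy.groupby("Pays")["TauxDeces"].mean()
top2 = mean_death_rate.nlargest(2)
print(top2)
```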

Indeed, some of these countries can be described as poor. Yemen appears extremely hard hit by the epidemic: averaged over February, about 28% of its recorded infections had ended in death. Yemen is a very poor country, but let's inspect this number, which is very high compared to the other countries.

data_grouped[count_columns].mean().loc['Yémen']
Infections    2178.806452
Deces          620.516129
Guerisons     1430.709677
Name: Yémen, dtype: float64

Now let's compare with the median, across countries, of each metric.

data_grouped[count_columns].mean().median(0)
Infections    50333.935484
Deces           613.387097
Guerisons     23364.000000
dtype: float64

Yemen has about as many deaths as the median country, with far fewer recorded infections. This can be due either to a lack of testing in the country or to very poor medical care conditions. It is consistent with the country's growing poverty, aggravated by war.
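To make the comparison concrete, the figures printed above can be turned into deaths per 100 recorded infections. This is only a rough sketch: the helper name is ours, the ratio of monthly means is not exactly the mean of the daily rates, and the "median country" mixes per-column medians that may come from different countries.

```python
def deaths_per_100_cases(deaths, infections):
    """Deaths per 100 recorded infections."""
    return 100.0 * deaths / infections

# Monthly means printed above.
yemen = deaths_per_100_cases(620.516129, 2178.806452)
median_country = deaths_per_100_cases(613.387097, 50333.935484)
print(round(yemen, 1), round(median_country, 1))  # 28.5 1.2
```

The gap of more than a factor of twenty is what makes under-testing a plausible part of the explanation.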

3 Plotting

Now we can plot many things. For instance, we can pick a country of interest and see how it evolved over the month of February. Let's see how the US was affected.

import matplotlib.pyplot as plt
country_data = data[data['Pays'] == 'États-Unis']
country_data.plot('Date', ['Infections', 'Deces', 'Guerisons'])

plt.savefig(matplot_lib_filename)
matplot_lib_filename

figuregXPsVe.png

We can't see much on this type of plot: for most countries the metrics are on different scales, and the data covers only one month, which is short for epidemic data. Also, these columns are cumulative totals, so the plot only shows the running number of infections. Let's look quickly at the number of new cases per day for the US.

# The file lists the most recent date first, so sort chronologically before differencing.
country_data = country_data.sort_values('Date').copy()
# New cases per day = difference between consecutive cumulative totals (NaN on the first day).
country_data['NewInfections'] = country_data['Infections'].diff()
country_data.plot('Date', 'NewInfections')
plt.savefig(matplot_lib_filename)
matplot_lib_filename

figureKiPXSr.png

Computed this way, the daily count should fall over February in the US, which would indicate that the epidemic was slowing there. Differencing the rows in file order instead flips the sign of every value and makes the same series appear to grow.
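Row order matters when turning cumulative totals into daily counts. The head() output earlier shows the most recent date first; assuming the per-country rows keep that order, differencing them directly gives negative values. The toy example below (made-up numbers) contrasts the two orders.

```python
import pandas as pd

# Toy cumulative totals, listed most recent date first like the source file.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-02-03", "2021-02-02", "2021-02-01"]),
    "Infections": [160, 150, 100],
})

# Differencing in file order: each value is minus the next day's new cases.
wrong_order = toy["Infections"].diff()

# Differencing after a chronological sort: new cases per day.
chronological = toy.sort_values("Date")
right_order = chronological["Infections"].diff()

print(wrong_order.tolist())   # [nan, -10.0, -50.0]
print(right_order.tolist())   # [nan, 50.0, 10.0]
```

In this toy series new cases drop from 50 to 10, yet the unsorted differences go from -50 up to -10 when read in date order, which looks like growth once a constant is added.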

So let's try other visualisations. For instance, we can plot the distribution, across countries, of the monthly mean of each rate.

mean_rates_per_country.hist(rate_columns, bins=20)

plt.savefig(matplot_lib_filename)
matplot_lib_filename

figure6Z7VPf.png

This plot shows that, overall, February was not the worst month for the world: most countries show a high recovery rate and a small death rate, suggesting that medical services were not too overwhelmed. TauxInfection is not very meaningful, because it only shows the proportion of people who have neither recovered nor died.

Author: Corentin Ambroise

Created: 2021-03-03 Wed 15:39
