# 📘 SARS-CoV-2 Epidemic Analysis (MOOC Reproducible Research)

**Author:** [Roy MATTA]  
**Course:** MOOC – Reproducible Research  
**Module 3: Final Computational Document**  
**Tool Used:** Jupyter Notebook (Python)  
**Dataset:** Our World in Data – Covid-19  
**Link:** [Paste your GitLab link here once ready]

---

This notebook is a reproducible data analysis of the Covid-19 epidemic with a focus on infections, deaths, and vaccination trends in selected countries.

# Analysis of the SARS-CoV-2 Epidemic

This notebook provides an exploratory data analysis of the SARS-CoV-2 epidemic using public data from Our World in Data. The focus is on understanding the evolution of cases, deaths, and vaccinations over time, with a comparison between selected countries.

Data source: [Our World in Data](https://ourworldindata.org/coronavirus-source-data)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

# Set default style
sns.set(style='whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Dataset URL (OWID)
url = "https://covid.ourworldindata.org/data/owid-covid-data.csv"

# Read the dataset
df = pd.read_csv(url, parse_dates=['date'])

# Show structure
df.head()
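
The file behind this URL can change over time, so reading it directly may give different results on different days. One way to keep the analysis reproducible is to store a local snapshot on the first run and read that snapshot afterwards. The following is a minimal sketch; the helper name `load_owid` and the cache file name are illustrative assumptions, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

OWID_URL = "https://covid.ourworldindata.org/data/owid-covid-data.csv"


def load_owid(url=OWID_URL, cache_path="owid-covid-data.csv"):
    """Load the OWID dataset, downloading it only if no local copy exists.

    `cache_path` is a hypothetical file name chosen for this sketch.
    """
    cache = Path(cache_path)
    if not cache.exists():
        # First run: download once and keep a local snapshot.
        pd.read_csv(url).to_csv(cache, index=False)
    # Later runs read the snapshot, so upstream updates do not change results.
    return pd.read_csv(cache, parse_dates=["date"])
```

Calling `df = load_owid()` would then replace the direct `pd.read_csv(url, ...)` call. Committing the snapshot, or at least recording its download date, makes it clear which version of the data the figures are based on.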

## Data Preparation

We focus on four countries for comparative analysis: France, Germany, Italy, and the United States. We'll select relevant columns and filter the data.

# Choose countries
countries = ['France', 'Germany', 'Italy', 'United States']

# Filter
df_countries = df[df['location'].isin(countries)]

# Keep key variables
df_countries = df_countries[[
    'location', 'date',
    'total_cases', 'new_cases',
    'total_deaths', 'new_deaths',
    'people_vaccinated', 'people_fully_vaccinated',
]]

# Drop missing location/date rows
df_countries = df_countries.dropna(subset=['location', 'date'])
df_countries.head()
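
Before plotting, it can help to check how complete the selected columns are for each country, since missing values show up as gaps in the curves. The sketch below is one possible check; the function name `coverage_summary` is an assumption introduced for illustration.

```python
import pandas as pd


def coverage_summary(frame):
    """Return first/last date and the share of missing values per country."""
    # Date range covered by each country.
    summary = frame.groupby("location")["date"].agg(
        first_date="min", last_date="max"
    )
    # Fraction of missing values in every non-key column, per country.
    value_cols = [c for c in frame.columns if c not in ("location", "date")]
    missing = frame[value_cols].isna().groupby(frame["location"]).mean()
    return summary.join(missing.add_suffix("_missing"))
```

Running `coverage_summary(df_countries)` gives one row per country, with a `<column>_missing` share between 0 and 1 for each variable.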

## Evolution of Confirmed Cases

Let's plot the total number of confirmed Covid-19 cases for each country over time.

plt.figure(figsize=(12,6))
for country in countries:
    subset = df_countries[df_countries['location'] == country]
    plt.plot(subset['date'], subset['total_cases'], label=country)

plt.title("Total Confirmed Covid-19 Cases")
plt.xlabel("Date")
plt.ylabel("Total Cases")
plt.legend()
plt.tight_layout()
plt.show()

## Interpretation

The graph shows how cumulative case counts evolved in each country. In absolute terms, the United States has the highest case count overall, which partly reflects its much larger population. France, Italy, and Germany followed similar early trajectories but diverged later. Differences in health policy and timing of interventions may explain some of the variation.
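
Because the four countries differ widely in population, a per-capita view makes the curves easier to compare. The sketch below assumes the frame also carries the `population` column from the OWID file, which the column selection earlier in this notebook does not keep. The helper name `add_per_100k` is hypothetical.

```python
import pandas as pd


def add_per_100k(frame, value_col="total_cases", population_col="population"):
    """Add a column with `value_col` expressed per 100,000 inhabitants."""
    out = frame.copy()
    out[f"{value_col}_per_100k"] = (
        out[value_col] / out[population_col] * 100_000
    )
    return out
```

To use it here, `'population'` would need to be added to the list of kept variables; the resulting `total_cases_per_100k` column can then be plotted in place of `total_cases`.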

## Vaccination Progress

We now look at how vaccination campaigns progressed in the selected countries. The plot uses the number of people who received at least one dose (`people_vaccinated`); the number of fully vaccinated people is available in `people_fully_vaccinated`.

plt.figure(figsize=(12,6))
for country in countries:
    subset = df_countries[df_countries['location'] == country]
    plt.plot(subset['date'], subset['people_vaccinated'], label=country)

plt.title("People Vaccinated (At Least One Dose)")
plt.xlabel("Date")
plt.ylabel("Number of People")
plt.legend()
plt.tight_layout()
plt.show()
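
If the vaccination columns contain gaps on days without a report, the plotted lines can look broken. Since these are cumulative counts, one option is to carry the last reported value forward within each country. This is a sketch under that assumption, with a hypothetical helper name `fill_vaccination_gaps`.

```python
import pandas as pd


def fill_vaccination_gaps(
    frame, cols=("people_vaccinated", "people_fully_vaccinated")
):
    """Forward-fill cumulative vaccination counts within each country."""
    cols = list(cols)
    # Sort so that "previous value" means the previous date of the same country.
    out = frame.sort_values(["location", "date"]).copy()
    # Values before a country's first report stay missing.
    out[cols] = out.groupby("location")[cols].ffill()
    return out
```

Forward-filling is only reasonable for cumulative series; applying it to daily counts such as `new_cases` would invent data.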

## Daily Deaths

In this section, we look at daily reported deaths, smoothed with a 7-day rolling average to damp weekly reporting patterns. This gives a better sense of the peaks and waves of the pandemic than the cumulative totals do.

plt.figure(figsize=(12,6))
for country in countries:
    subset = df_countries[df_countries['location'] == country]
    plt.plot(subset['date'], subset['new_deaths'].rolling(window=7).mean(), label=country)  # 7-day average

plt.title("Daily New Deaths (7-day average)")
plt.xlabel("Date")
plt.ylabel("Deaths per Day")
plt.legend()
plt.tight_layout()
plt.show()
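
The loop above smooths each country separately because it filters before calling `rolling`. If the smoothed series is needed again later, it can be stored as a column instead. The sketch below does this per country and sorts by date first, since `rolling` works on row order. The helper name `add_rolling_mean` is an assumption for illustration.

```python
import pandas as pd


def add_rolling_mean(frame, value_col="new_deaths", window=7):
    """Add a per-country rolling mean of `value_col` over `window` rows."""
    out = frame.sort_values(["location", "date"]).copy()
    out[f"{value_col}_avg{window}"] = out.groupby("location")[
        value_col
    ].transform(
        # A full window is required, so the first window-1 rows are NaN.
        lambda s: s.rolling(window=window, min_periods=window).mean()
    )
    return out
```

Grouping by `location` keeps a window from mixing the last days of one country with the first days of the next.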

## Insights

- **Waves**: We observe clear waves in daily deaths, which may correspond to new variants or delayed interventions.
- **Vaccination effect**: Faster vaccine rollouts may be associated with lower death peaks in later waves, but this notebook does not test that link; the curves are absolute counts and are not adjusted for population.
- **Variations**: Policy differences and demographic structures may also explain the variation in death counts.

## Conclusion

This analysis provided an overview of the evolution of the Covid-19 epidemic in four major countries. Key takeaways include:

- The United States experienced the largest number of total cases in absolute terms.
- Vaccination campaigns followed different trajectories, with varying speeds and coverage.
- The successive waves of deaths are consistent with the importance of timely interventions and public health measures, although this analysis does not establish causality.

Further analysis could include mobility data, testing policies, or stringency indexes to better understand causality.


---

📝 *Notebook completed as part of the MOOC Reproducible Research. All code and figures were developed independently using public data.*  
*All plots were generated using Python libraries including pandas, seaborn, and matplotlib.*

