Data Analysis and Visualisation of Covid-19 Cases in Africa using Python

Dr Adeayo Sotayo

April 2021

Introduction

The World Health Organisation (WHO) provides the official daily counts of COVID-19 cases and deaths reported by countries, territories and areas globally. The dataset contains information such as the cases (cumulative total), cases (per 100,000 population), deaths (cumulative total), deaths (per 100,000 population), etc.

The dataset is available via this link. Caution must be taken when interpreting the data presented as different countries and territories have different data collection techniques and frequencies.

Aim and Objectives

The exercise focusses on Africa and analyses the following using the dataset provided by the WHO:

  • Rank the Covid-19 cases in Africa (i.e. most cases to least cases, top (3) and bottom (3) African countries)
  • Distribution of cases (per 100,000 population) for each African country
  • Visualise the data using bar charts and interactive maps
  • Evaluate some descriptive statistics (e.g. Mean, Median)
  • Utilise pandas, matplotlib and plotly libraries for data analysis and visualisation

We will use the pandas, matplotlib and plotly libraries for this exercise:

  • We'll use Pandas for data manipulation and analysis, e.g. open csv files, sort and filter data.
  • We'll use matplotlib.pyplot for data visualisation (i.e. bar charts)
  • We'll use plotly for interactive choropleth maps

Data Preparation and Analysis

Let's import pandas for data manipulation and analysis

import pandas as pd  

Now let's read the CSV data from the WHO dataset into a pandas DataFrame

df = pd.read_csv("https://covid19.who.int/WHO-COVID-19-global-table-data.csv")

Let's take a look at the dataset.

df
 | Name | WHO Region | Cases - cumulative total | Cases - cumulative total per 100000 population | Cases - newly reported in last 7 days | Cases - newly reported in last 7 days per 100000 population | Cases - newly reported in last 24 hours | Deaths - cumulative total | Deaths - cumulative total per 100000 population | Deaths - newly reported in last 7 days | Deaths - newly reported in last 7 days per 100000 population | Deaths - newly reported in last 24 hours | Transmission Classification
0 | Global | NaN | 148999876 | 1908.710561 | 5793002 | 74.209217 | 856330 | 3140115 | 40.22534 | 92170 | 1.180711 | 15185 | NaN
1 | United States of America | Americas | 31835314 | 9617.840000 | 367742 | 111.100000 | 51939 | 567971 | 171.59000 | 4747 | 1.430000 | 644 | Community transmission
2 | India | South-East Asia | 18376524 | 1331.630000 | 2445559 | 177.210000 | 379257 | 204832 | 14.84000 | 20175 | 1.460000 | 3645 | Clusters of cases
3 | Brazil | Americas | 14441563 | 6794.130000 | 398487 | 187.470000 | 72140 | 395022 | 185.84000 | 17019 | 8.010000 | 3086 | Community transmission
4 | France | Europe | 5447883 | 8376.290000 | 190837 | 293.420000 | 29980 | 102890 | 158.20000 | 1973 | 3.030000 | 315 | Community transmission
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
233 | Saint Helena | Africa | 0 | 0.000000 | 0 | 0.000000 | 0 | 0 | 0.00000 | 0 | 0.000000 | 0 | No cases
234 | Tokelau | Western Pacific | 0 | 0.000000 | 0 | 0.000000 | 0 | 0 | 0.00000 | 0 | 0.000000 | 0 | No cases
235 | Tonga | Western Pacific | 0 | 0.000000 | 0 | 0.000000 | 0 | 0 | 0.00000 | 0 | 0.000000 | 0 | No cases
236 | Turkmenistan | Europe | 0 | 0.000000 | 0 | 0.000000 | 0 | 0 | 0.00000 | 0 | 0.000000 | 0 | No cases
237 | Tuvalu | Western Pacific | 0 | 0.000000 | 0 | 0.000000 | 0 | 0 | 0.00000 | 0 | 0.000000 | 0 | No cases

238 rows × 13 columns

For this exercise, we'll only need four columns: the two identifying columns ("Name" and "WHO Region") and the numerical data from "Cases - cumulative total" and "Cases - cumulative total per 100000 population".

df = pd.DataFrame(df, columns=["Name",
                               "WHO Region",
                               "Cases - cumulative total", 
                               "Cases - cumulative total per 100000 population"])

Now, let's take a look at the updated dataset.

df
 | Name | WHO Region | Cases - cumulative total | Cases - cumulative total per 100000 population
0 | Global | NaN | 148999876 | 1908.710561
1 | United States of America | Americas | 31835314 | 9617.840000
2 | India | South-East Asia | 18376524 | 1331.630000
3 | Brazil | Americas | 14441563 | 6794.130000
4 | France | Europe | 5447883 | 8376.290000
... | ... | ... | ... | ...
233 | Saint Helena | Africa | 0 | 0.000000
234 | Tokelau | Western Pacific | 0 | 0.000000
235 | Tonga | Western Pacific | 0 | 0.000000
236 | Turkmenistan | Europe | 0 | 0.000000
237 | Tuvalu | Western Pacific | 0 | 0.000000

238 rows × 4 columns
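
As an alternative, the column selection can be done while the file is being read. The following is a minimal sketch, assuming the usecols parameter of pd.read_csv suits your case. The inline CSV is a tiny stand-in built from three rows shown above so that the example runs offline; sample_csv, wanted and sample_df are illustrative names, not part of the original exercise.

```python
import io

import pandas as pd

# Tiny stand-in for the WHO file (assumption: offline demo only).
# Three rows copied from the table above, plus one extra column
# to show that columns not listed in `wanted` are dropped.
sample_csv = io.StringIO(
    "Name,WHO Region,Cases - cumulative total,"
    "Cases - cumulative total per 100000 population,Deaths - cumulative total\n"
    "India,South-East Asia,18376524,1331.63,204832\n"
    "Brazil,Americas,14441563,6794.13,395022\n"
    "France,Europe,5447883,8376.29,102890\n"
)

wanted = [
    "Name",
    "WHO Region",
    "Cases - cumulative total",
    "Cases - cumulative total per 100000 population",
]

# usecols keeps only the listed columns at load time
sample_df = pd.read_csv(sample_csv, usecols=wanted)
print(sample_df.columns.tolist())
```

Selecting columns at load time avoids holding unused columns in memory, although for a file of this size either approach works.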

As we can see, the WHO dataset contains data for different countries and regions in the world.

For this exercise, we are only interested in the data for countries within Africa.

Filtering Africa-Specific Covid-19 Dataset

We can assign a variable (Africa) to a filtered dataframe showing only countries with "Africa" assigned as the WHO region.

Africa = df[df["WHO Region"]=="Africa"]

Now, let's take a look at the new filtered data using the "head" method, which shows the first 5 rows of the dataframe

Africa.head()
 | Name | WHO Region | Cases - cumulative total | Cases - cumulative total per 100000 population
20 | South Africa | Africa | 1577200 | 2659.31
61 | Ethiopia | Africa | 254044 | 220.98
79 | Nigeria | Africa | 164912 | 80.00
80 | Kenya | Africa | 157492 | 292.89
86 | Algeria | Africa | 121344 | 276.72

Let's count the number of entries in the filtered dataset (Africa)

len(Africa)

50

The initial filtered data (Africa) contains only 50 entries, whereas there are 54 countries in Africa.

This is because seven African countries (Morocco, Tunisia, Egypt, Sudan, Libya, Somalia and Djibouti) are classified under the "Eastern Mediterranean" WHO Region, so the filter misses them.

In addition, "Réunion", "Mayotte" and "Saint Helena" are dependent territories that the WHO lists within the Africa region.

They are not counted among the 54 countries, so they are excluded from this exercise.

We'll create a new variable (Africa_updated) that adds the seven countries and removes the three territories (50 + 7 - 3 = 54).

Africa_updated = df[((df["WHO Region"]=="Africa") &
                  ~df["Name"].isin(["Réunion", "Mayotte", "Saint Helena"])) |
                 df["Name"].isin(["Morocco", "Tunisia", "Egypt", "Sudan", "Libya", "Somalia", "Djibouti"])
                  ]

Again, let's count the number of entries in the updated dataset (Africa_updated)

len(Africa_updated)

54

54 matches the total number of countries in Africa.
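
The filter combines three conditions, and the result depends on how they are grouped. The sketch below spells out the grouping on a handful of toy rows (not the full WHO table); the names toy, territories, east_med_african and mask are illustrative only.

```python
import pandas as pd

# Toy rows only: two WHO-Africa countries, one dependent territory,
# one African country listed under Eastern Mediterranean, and one
# non-African country.
toy = pd.DataFrame({
    "Name": ["Nigeria", "Kenya", "Mayotte", "Egypt", "France"],
    "WHO Region": ["Africa", "Africa", "Africa",
                   "Eastern Mediterranean", "Europe"],
})

territories = ["Réunion", "Mayotte", "Saint Helena"]
east_med_african = ["Morocco", "Tunisia", "Egypt", "Sudan",
                    "Libya", "Somalia", "Djibouti"]

in_who_africa = toy["WHO Region"] == "Africa"
is_territory = toy["Name"].isin(territories)
is_east_med_african = toy["Name"].isin(east_med_african)

# Explicit grouping: (in WHO Africa AND not a territory) OR listed country
mask = (in_who_africa & ~is_territory) | is_east_med_african
toy_africa = toy[mask]
print(toy_africa["Name"].tolist())

# Count check for the full dataset: 50 entries - 3 territories + 7 countries
expected_total = 50 - len(territories) + len(east_med_african)
print(expected_total)
```

Naming each condition before combining them makes it easier to check each one separately if the final count is not the expected 54.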

Descriptive Statistics

To get some descriptive statistics (e.g. mean, median, minimum, maximum) for the data, we'll use the "describe" method shown below

Africa_updated.describe()
 | Cases - cumulative total | Cases - cumulative total per 100000 population
count | 5.400000e+01 | 54.000000
mean | 8.383431e+04 | 647.380000
std | 2.266335e+05 | 1084.383573
min | 5.090000e+02 | 0.850000
25% | 6.659000e+03 | 88.475000
50% | 2.230700e+04 | 206.945000
75% | 4.765825e+04 | 499.902500
max | 1.577200e+06 | 5513.130000
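
Individual statistics can also be requested by name instead of being looked up in the "describe" output (where the median appears as the "50%" row). This is a minimal sketch using only the five rows shown earlier by Africa.head(), so the numbers differ from the 54-country figures above; cases and summary are illustrative names.

```python
import pandas as pd

# Cumulative totals for the five rows shown by Africa.head() earlier
# (South Africa, Ethiopia, Nigeria, Kenya, Algeria), not the full dataset.
cases = pd.Series(
    [1577200, 254044, 164912, 157492, 121344],
    name="Cases - cumulative total",
)

# agg with a list of function names returns a Series indexed by those names
summary = cases.agg(["mean", "median"])
print(summary)
```

Asking for the mean and median by name keeps the code readable and does not depend on the row order of the "describe" table.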

Cumulative Covid-19 Cases in Africa: Top Three (3) and Bottom Three (3) Countries

Let's see what the updated data looks like. The rows shown so far appear in descending order of cumulative total cases, so the first three (3) and last three (3) rows of the updated dataset give the top and bottom countries

Africa_updated.head(3)
 | Name | WHO Region | Cases - cumulative total | Cases - cumulative total per 100000 population
20 | South Africa | Africa | 1577200 | 2659.31
41 | Morocco | Eastern Mediterranean | 510465 | 1382.98
56 | Tunisia | Eastern Mediterranean | 305313 | 2583.32
Africa_updated.tail(3)
 | Name | WHO Region | Cases - cumulative total | Cases - cumulative total per 100000 population
188 | Liberia | Africa | 2098 | 41.48
194 | Mauritius | Africa | 1206 | 94.83
202 | United Republic of Tanzania | Africa | 509 | 0.85

Cumulative Covid-19 Cases per 100,000 Population in Africa: Top Three (3) and Bottom Three (3) Countries

Now, let's sort the dataset by the cumulative total cases per 100,000 population and assign it to a new variable

Africa_Sort = Africa_updated.sort_values(by="Cases - cumulative total per 100000 population", ascending=False)

Let's see what the sorted data (Africa_Sort) looks like. We'd also like to see the top three (3) and bottom three (3) rows of the sorted dataset

Africa_Sort.head(3)
 | Name | WHO Region | Cases - cumulative total | Cases - cumulative total per 100000 population
165 | Seychelles | Africa | 5422 | 5513.13
125 | Cabo Verde | Africa | 22772 | 4095.78
20 | South Africa | Africa | 1577200 | 2659.31
Africa_Sort.tail(3)
 | Name | WHO Region | Cases - cumulative total | Cases - cumulative total per 100000 population
168 | Chad | Africa | 4779 | 29.09
166 | Niger | Africa | 5204 | 21.50
202 | United Republic of Tanzania | Africa | 509 | 0.85
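
When only the extremes are needed, a full sort is not required. The sketch below uses the pandas nlargest and nsmallest methods on the six rows shown above, deliberately shuffled; sample, top3 and bottom3 are illustrative names.

```python
import pandas as pd

col = "Cases - cumulative total per 100000 population"

# The six rows shown above, in shuffled order (values copied from the tables)
sample = pd.DataFrame({
    "Name": ["Chad", "Seychelles", "Niger", "South Africa",
             "United Republic of Tanzania", "Cabo Verde"],
    col: [29.09, 5513.13, 21.50, 2659.31, 0.85, 4095.78],
})

top3 = sample.nlargest(3, col)      # three highest rates, highest first
bottom3 = sample.nsmallest(3, col)  # three lowest rates, lowest first

print(top3["Name"].tolist())
print(bottom3["Name"].tolist())
```

Note that nsmallest lists the lowest value first, whereas tail(3) on a descending sort lists it last.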

Data Visualisation - What is the cumulative number of cases for each African country?

First, we'll use matplotlib to plot a horizontal bar chart to visualise the cumulative number of cases for each African Country.

import matplotlib.pyplot as plt 

Let's assign variables to the data corresponding to the vertical and horizontal axes in the bar chart

Country = Africa_updated["Name"] #this will be on the vertical axis
Cumulative_cases = Africa_updated["Cases - cumulative total"] #this will be on the horizontal axis

Let's visualise the data (horizontal bar chart) and compare the cumulative number of cases for each African Country

#Basic horizontal bar chart to visualise the cumulative number of cases for each African Country

fig, ax = plt.subplots(figsize=(11,15))
ax.barh(Country, Cumulative_cases, color = "Orange", label = "Cumulative cases")
ax.set_title("Covid Cases in Africa - April 2021", fontsize = 20)
ax.set_xlabel("Cumulative cases", fontsize = 20)
ax.set_ylabel("Country", fontsize = 20)
ax.invert_yaxis() 

#Mean and median are computed from the same 54-country data that is plotted (Africa_updated)
ax.axvline(Cumulative_cases.mean(), color = "blue", label = "Average") #Average vertical line 
ax.axvline(Cumulative_cases.median(), color = "green", label = "Median") #Median vertical line 
plt.tight_layout()
plt.legend();
[Figure: horizontal bar chart - Cumulative cases]
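
Because the same kind of chart is drawn more than once in this exercise, the plotting steps could be wrapped in a small helper. This is only a sketch: plot_cases_barh is a hypothetical function name, the demo data is the five rows shown earlier by Africa.head(), and the "Agg" backend is chosen here so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import pandas as pd


def plot_cases_barh(frame, value_col, title, bar_color="orange"):
    """Draw a horizontal bar chart of value_col per country,
    with vertical lines at the mean and median of value_col."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(frame["Name"], frame[value_col], color=bar_color, label=value_col)
    ax.axvline(frame[value_col].mean(), color="blue", label="Average")
    ax.axvline(frame[value_col].median(), color="green", label="Median")
    ax.set_title(title)
    ax.set_xlabel(value_col)
    ax.set_ylabel("Country")
    ax.invert_yaxis()  # first row of the frame appears at the top
    ax.legend()
    fig.tight_layout()
    return fig, ax


# Demo data: the five rows shown by Africa.head() earlier
demo = pd.DataFrame({
    "Name": ["South Africa", "Ethiopia", "Nigeria", "Kenya", "Algeria"],
    "Cases - cumulative total": [1577200, 254044, 164912, 157492, 121344],
})

fig, ax = plot_cases_barh(
    demo, "Cases - cumulative total",
    "Covid Cases in Africa - April 2021 (sample rows)",
)
```

Computing the mean and median from the same frame that is plotted keeps the reference lines consistent with the bars.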

Let's visualise the cumulative number of cases for each African Country on a Choropleth map

Plotly is used for the map because it offers interactive features and supports Choropleth maps

import plotly.express as px 

#Interactive African map to visualise the cumulative number of cases for each African Country

Cumulative_cases_plot = px.choropleth(Africa_updated,
                    locations="Name", #Column in the dataframe that identifies each location
                    color="Cases - cumulative total", #Column whose values set the colour
                    locationmode = 'country names', #One of 'ISO-3', 'USA-states' or 'country names'
                    #locationmode should match the type of entries in "locations"
                    scope="africa", #limits the scope of the map to Africa
                    title ="Covid-19 Cases in Africa - April 2021 (Cases - cumulative total)",
                    hover_name="Name",
                    color_continuous_scale = "deep",
                   )
Cumulative_cases_plot.update_traces(marker_line_color="black") # border lines between countries
Cumulative_cases_plot.show()

This is an interactive map, so you can hover over each country and zoom in and out of the map.

What is the cumulative number of cases per 100,000 population for each African country?

Let's assign variables to the data corresponding to the vertical and horizontal axes in the new bar chart

Country_Sort = Africa_Sort["Name"]
Cumulative_cases_per_population = Africa_Sort["Cases - cumulative total per 100000 population"]

To put things in perspective, we need to evaluate the cumulative number of cases per 100,000 population for each African Country (i.e. the number of cases as a fraction of the population).
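
The per-100,000 figure is the cumulative case count divided by the population and multiplied by 100,000. The WHO table already provides this column, so the sketch below is only a worked illustration of the arithmetic; cases_per_100k is a hypothetical helper and the numbers are made up.

```python
def cases_per_100k(cumulative_cases, population):
    """Return cumulative cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cumulative_cases * 100_000 / population


# Made-up example: 500 cases in a population of 1,000,000
example_rate = cases_per_100k(500, 1_000_000)
print(example_rate)  # 50.0
```

Expressing cases relative to population allows a small country and a large country to be compared on the same scale.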

Similarly, let's use matplotlib to plot a horizontal bar chart to visualise the cumulative number of cases per 100,000 population for each African Country

#Basic horizontal bar chart to visualise the cumulative number of cases per 100,000 population for each African Country

fig, ax = plt.subplots(figsize=(11,15))
ax.barh(Country_Sort, Cumulative_cases_per_population, color = "Violet", label = "Cases - cumulative total per 100,000 population")
ax.set_title("Covid Cases in Africa - April 2021", fontsize = 20)
ax.set_xlabel("Cases - cumulative total per 100,000 population", fontsize = 20)
ax.set_ylabel("Country", fontsize = 20)
ax.invert_yaxis() 
ax.axvline(Cumulative_cases_per_population.mean(), color = "blue", label = "Average") #Average vertical line 
ax.axvline(Cumulative_cases_per_population.median(), color = "green", label = "Median") #Median vertical line 
plt.tight_layout()
plt.legend();
[Figure: horizontal bar chart - Cases - cumulative total per 100,000 population]

Let's visualise the cumulative number of cases per 100,000 population for each African Country on a Choropleth map

We'll use plotly again to create an interactive map

#Interactive African map to visualise the cumulative number of cases per 100,000 population for each African Country

Cumulative_cases_per_population_plot = px.choropleth(Africa_Sort,
                    locations="Name",  
                    color="Cases - cumulative total per 100000 population",
                    locationmode = 'country names', 
                    scope="africa", 
                    title ="Covid-19 Cases in Africa - April 2021 (Cases - cumulative total per 100,000 population)",
                    hover_name="Name",
                    color_continuous_scale = "Reds",
                   )
Cumulative_cases_per_population_plot.update_traces(marker_line_color="black") 
Cumulative_cases_per_population_plot.show()

This is an interactive map, so you can hover over each country and zoom in and out of the map.

Conclusion

Through the exercise, we have demonstrated how we can use Python (pandas, matplotlib and plotly libraries) to analyse and visualise (bar charts and interactive maps) Covid-19 cases in Africa, based on the dataset provided by the WHO.

More specifically, as part of this exercise, we completed the following:

  • Ranked the Covid-19 cases in Africa (i.e. most cases to least cases, top (3) and bottom (3) African countries)
  • Showed the distribution of cases (per 100,000 population) for each African country
  • Visualised the data using bar charts and interactive maps
  • Evaluated some descriptive statistics (e.g. Mean, Median)
  • Utilised pandas, matplotlib and plotly libraries for data analysis and visualisation

Please get in touch if you have questions or suggestions.

Bibliography & References

https://allenkunle.me/exploratory-analysis-police-shooting

https://covid19.who.int/table

https://plotly.com/python/

https://plotly.com/python/mapbox-county-choropleth/

https://www.worldometers.info/geography/how-many-countries-in-africa/
