Pandas plotting methods

We start by loading our familiar gender_data dataset.

# Load the Numpy array library, call it 'np'
import numpy as np
# Load the Pandas data science library, call it 'pd'
import pandas as pd
# Turn on a setting to use Pandas more safely.
pd.set_option('mode.chained_assignment', 'raise')

If you are running on your laptop, you should download the gender_stats.csv file to the same directory as this notebook.
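Before loading the file, a quick check along these lines can confirm that it is where the notebook expects it. This is only a sketch; the helper name `data_file_present` is made up for illustration and is not part of Pandas.

```python
from pathlib import Path


def data_file_present(filename):
    """Return True if `filename` is a file in the current working directory."""
    return (Path.cwd() / filename).is_file()


# The notebook reads the file by its bare name, so the file must sit in the
# directory the notebook is running from.
print(data_file_present('gender_stats.csv'))
```

If this prints False, the read_csv call will fail with a "file not found" error until the file is in the right directory.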

# Load the data file
gender_data = pd.read_csv('gender_stats.csv')
gender_data.head()
   country_name  country_code  fert_rate  gdp_us_billion  health_exp_per_cap  health_exp_pub  prim_ed_girls  mat_mort_ratio  population
0         Aruba           ABW    1.66325             NaN                 NaN             NaN      48.721939             NaN    0.103744
1   Afghanistan           AFG    4.95450       19.961015          161.138034        2.834598      40.109708          444.00   32.715838
2        Angola           AGO    6.12300      111.936542          254.747970        2.447546            NaN          501.25   26.937545
3       Albania           ALB    1.76925       12.327586          574.202694        2.836021      47.201082           29.25    2.888280
4       Andorra           AND        NaN        3.197538         4421.224933        7.260281      47.123345             NaN    0.079547
# Get the GDP values as a Pandas Series
gdp = gender_data['gdp_us_billion']
gdp.head()
0           NaN
1     19.961015
2    111.936542
3     12.327586
4      3.197538
Name: gdp_us_billion, dtype: float64

Plotting with methods

You have already seen basic plotting with the Matplotlib library.

Here is the magic incantation to load the Matplotlib plotting library.

# Load the library for plotting, name it 'plt'
import matplotlib.pyplot as plt
# Display plots inside the notebook.
%matplotlib inline
# Make plots look a little more fancy
plt.style.use('fivethirtyeight')

Here is basic plotting of a Pandas Series, using Matplotlib. This is what you have already seen.

plt.hist(gdp);
../_images/df_plotting_8_0.png

You may see warnings as Matplotlib tries to calculate the bin widths for the histogram. If you do see them, these warnings result from Matplotlib struggling with NaN (missing) values.
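One way to avoid such warnings when calling plt.hist directly is to drop the missing values first. The sketch below uses a small made-up Series rather than the real GDP values; the names `example` and `not_missing` are only for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration, including two missing values.
example = pd.Series([1.5, np.nan, 3.0, 4.5, np.nan, 6.0])

# Drop the NaN values before handing the data to Matplotlib.
not_missing = example.dropna()

# plt.hist returns the bin counts, the bin edges, and the drawn bars.
counts, bin_edges, bars = plt.hist(not_missing)
```

The counts add up to the number of values that were not missing.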

Another way to do the histogram is to use the hist method of the Series.

A method is a function attached to a value. In this case hist is a function attached to a value of type Series.
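To make the distinction concrete, here is a small sketch that calculates a mean twice, once with a standalone function and once with a method. The Series `values` is made up for illustration.

```python
import numpy as np
import pandas as pd

# A small made-up Series for illustration.
values = pd.Series([1.0, 2.0, 3.0])

# A function stands alone; we pass the value in as an argument.
mean_by_function = np.mean(values)

# A method is attached to the value; we reach it with a dot after the value.
mean_by_method = values.mean()

print(mean_by_function, mean_by_method)
```

Both calls give the same answer here; the method form reads from left to right, starting with the value.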

Using the hist method instead of the plt.hist function can make the code a bit easier to read. The method also has the advantage that it discards the NaN values, by default, so it does not generate the same warnings.

gdp.hist();
../_images/df_plotting_11_0.png
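To check that the hist method really does leave out the NaN values, we can add up the heights of the bars it draws. This is a sketch on a made-up Series; creating the axes with plt.subplots and passing them as `ax` is a choice made here so that we can inspect the result, not something the method requires.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration, including two missing values.
example = pd.Series([1.5, np.nan, 3.0, 4.5, np.nan, 6.0])

# Make a fresh figure and axes to draw on.
fig, ax = plt.subplots()

# The hist method drops the NaN values itself before plotting.
example.hist(ax=ax)

# Each bar is a rectangle; the heights add up to the number of values plotted.
n_plotted = sum(bar.get_height() for bar in ax.patches)

# The count method gives the number of values that are not missing.
print(n_plotted, example.count())
```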

Now that we have looked at the GDP values, we will look at the values for the mat_mort_ratio column. These are the numbers of women who die in childbirth for every 100,000 live births.

mmr = gender_data['mat_mort_ratio']
mmr
0         NaN
1      444.00
2      501.25
3       29.25
4         NaN
        ...  
211       NaN
212    399.75
213    143.75
214    233.75
215    398.00
Name: mat_mort_ratio, Length: 216, dtype: float64
mmr.hist();
../_images/df_plotting_14_0.png

We are interested in the relationship between gdp and mmr. Maybe richer countries have better health care, and fewer maternal deaths.

Here is a plot, using the standard Matplotlib scatter function.

plt.scatter(gdp, mmr);
../_images/df_plotting_16_0.png

We can do the same plot using the plot.scatter method on the data frame. In that case, we specify the column names that should go on the x and the y axes.

gender_data.plot.scatter('gdp_us_billion', 'mat_mort_ratio');
../_images/df_plotting_18_0.png

An advantage of doing it this way is that we get the column names on the x and y axes by default.
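We can confirm where the labels come from by asking the returned axes for them. The sketch below uses a tiny made-up data frame; the column names `wealth` and `deaths` are hypothetical, chosen only for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data frame for illustration; the column names are hypothetical.
df = pd.DataFrame({'wealth': [1.0, 2.0, 3.0],
                   'deaths': [30.0, 20.0, 10.0]})

# The plot.scatter method returns the Matplotlib axes it drew on.
# Naming the x and y arguments makes it clear which column goes where.
ax = df.plot.scatter(x='wealth', y='deaths')

# The axis labels are taken from the column names.
print(ax.get_xlabel(), ax.get_ylabel())
```

With plt.scatter, by contrast, we would have to set the labels ourselves using plt.xlabel and plt.ylabel.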