# Missing values

```
# Load the Numpy array library, call it 'np'
import numpy as np
# Load the Pandas data science library, call it 'pd'
import pandas as pd
# Turn on a setting to use Pandas more safely.
pd.set_option('mode.chained_assignment', 'raise')
```

If you are running on your laptop, you should download the `gender_stats.csv` file to the same directory as this notebook.

See the gender statistics description page for more detail on the dataset.

```
# Load the data file
gender_data = pd.read_csv('gender_stats.csv')
gender_data.head()
```

|   | country_name | country_code | fert_rate | gdp_us_billion | health_exp_per_cap | health_exp_pub | prim_ed_girls | mat_mort_ratio | population |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | 1.66325 | NaN | NaN | NaN | 48.721939 | NaN | 0.103744 |
| 1 | Afghanistan | AFG | 4.95450 | 19.961015 | 161.138034 | 2.834598 | 40.109708 | 444.00 | 32.715838 |
| 2 | Angola | AGO | 6.12300 | 111.936542 | 254.747970 | 2.447546 | NaN | 501.25 | 26.937545 |
| 3 | Albania | ALB | 1.76925 | 12.327586 | 574.202694 | 2.836021 | 47.201082 | 29.25 | 2.888280 |
| 4 | Andorra | AND | NaN | 3.197538 | 4421.224933 | 7.260281 | 47.123345 | NaN | 0.079547 |

```
# Get the GDP values as a Pandas Series
gdp = gender_data['gdp_us_billion']
gdp.head()
```

```
0 NaN
1 19.961015
2 111.936542
3 12.327586
4 3.197538
Name: gdp_us_billion, dtype: float64
```

## Missing values and `NaN`

Looking at the values of `gdp` (and therefore at the values of the `gdp_us_billion` column of `gender_data`), we see that some of the values are `NaN`, which means Not a Number. Pandas uses this marker to indicate values that are not available, or *missing data*.
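Here is a minimal sketch of where these markers come from, using a tiny made-up CSV (the country codes and numbers are illustrative only, not taken from `gender_stats.csv`). When `pd.read_csv` finds an empty field, it stores `NaN` in that position, and the Series `isna` method shows which values are missing.

```python
import io

import numpy as np
import pandas as pd

# A tiny, made-up CSV; the second row has an empty gdp field.
csv_text = """country_code,gdp_us_billion
AAA,10.5
BBB,
CCC,3.2
"""

# read_csv turns the empty field into NaN.
df = pd.read_csv(io.StringIO(csv_text))
gdp_demo = df['gdp_us_billion']
print(gdp_demo)

# isna gives True where a value is missing, False elsewhere.
print(gdp_demo.isna())

# NaN is a floating point value, so the column still has dtype float64.
print(gdp_demo.dtype)

# NaN is not equal to anything, not even itself,
# so use isna rather than == to look for missing values.
print(np.nan == np.nan)
```

Because `NaN == NaN` is `False`, a test such as `gdp_demo == np.nan` never finds the missing values; `isna` is the reliable check.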

Numpy does not like to calculate with `NaN` values. Here is Numpy trying to calculate the median of the `gdp` values.

```
np.median(gdp)
```

```
nan
```

Depending on your version of Numpy, you may also see a warning about an invalid value.

Numpy recognizes that one or more values are `NaN` and refuses to guess what to do when calculating the median, so it returns `nan`.
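As a sketch of the same behavior on a small made-up array: any arithmetic involving `NaN` gives `NaN`, so `np.median` returns `nan`. Numpy also provides `np.nanmedian`, which ignores `NaN` values, and the Pandas Series `median` method skips missing values by default (its `skipna` argument defaults to `True`).

```python
import numpy as np
import pandas as pd

# Made-up values with one missing entry.
values = np.array([4.0, np.nan, 1.0, 3.0])

# Any arithmetic with NaN gives NaN.
print(np.nan + 1)

# np.median sees the NaN and returns nan.
print(np.median(values))

# np.nanmedian ignores NaN: the median of 1.0, 3.0 and 4.0 is 3.0.
print(np.nanmedian(values))

# The Series median method skips NaN by default (skipna=True).
print(pd.Series(values).median())
```

Which one to use is a choice about your data: ignoring missing values is convenient, but it quietly changes how many values go into the answer.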

You can check the number of rows in `gender_data` with `gender_data.shape`. The general Python `len` function tells us how many elements there are in `gdp`.

```
len(gdp)
```

```
216
```

As expected, it has the same number of elements as there are rows in `gender_data`.
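Note that `len` counts every element, whether it is missing or not. A quick check on a small made-up data frame, where `len` of a column matches the first element of the frame's `shape`:

```python
import numpy as np
import pandas as pd

# Made-up data frame; the 'gdp' column has two missing values out of five.
df_demo = pd.DataFrame({
    'gdp': [1.5, np.nan, 2.0, np.nan, 7.0],
    'pop': [0.1, 32.7, 26.9, 2.9, 0.08],
})

# len counts all elements of the column, including the NaN ones.
n_elements = len(df_demo['gdp'])
print(n_elements)

# shape[0] is the number of rows in the data frame.
print(df_demo.shape[0])
```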

The `count` method of the Series gives the number of values that are *not missing*, that is, values that are not `NaN`.

```
gdp.count()
```

```
200
```
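The two numbers are linked: the number of non-missing values plus the number of missing values gives the total length. Here is a sketch with a made-up Series. `isna().sum()` counts each `True` as 1, and `dropna` returns a new Series with the missing values removed.

```python
import numpy as np
import pandas as pd

# Made-up Series with two missing values out of five.
s = pd.Series([1.5, np.nan, 2.0, np.nan, 7.0])

# count gives the non-missing values; isna().sum() gives the missing ones.
n_present = s.count()
n_missing = s.isna().sum()
print(n_present, n_missing, len(s))

# Present plus missing adds up to the full length.
assert n_present + n_missing == len(s)

# dropna gives a new Series containing only the non-missing values.
present = s.dropna()
print(len(present))
```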