# Indexing with Boolean arrays

As usual with arrays, we need the Numpy library:

```
import numpy as np
```

Just for neatness below, we will only show numbers in arrays to 2 decimal places. This doesn’t affect any calculations; it only changes what we see when we show arrays in Jupyter:

```
# Set how many decimal places to display when showing arrays.
np.set_printoptions(precision=2)
```
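To check that the display setting really leaves the stored numbers alone, here is a small sketch. The array `values` is a hypothetical example; its first element is the full English Easiness value that appears later in this section.

```python
import numpy as np

# Only changes how arrays are displayed, not the values stored in them.
np.set_printoptions(precision=2)

values = np.array([3.16275414471149, 2.0])
# The displayed form is rounded to 2 decimal places.
print(values)
# Calculations still use the full stored value.
print(values[0] * 100)
```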

## Select values with Boolean arrays

Here we are using Boolean arrays to *index* into other arrays. You will see
what we mean by that by the end of this section.

We often want to select several elements from an array according to some criterion.

The most common way to do this is *Boolean indexing*: we put a Boolean array between the square brackets after the array name.

It can be easier to understand this by example than by description.

We start with the RateMyProfessors dataset.

It is a table where the rows are academic disciplines, and the columns contain the average student rating values for the corresponding discipline. We are going to fetch the columns from this table as arrays.

If you are running on your laptop, you should download the `rate_my_course.csv` file to the same directory as this notebook.

```
# We have not covered this code yet. We will soon.
# Load the library for reading data files.
import pandas as pd
# Read the file into a table, select the first six rows.
big_courses = pd.read_csv('rate_my_course.csv').head(6)
# Put the columns into arrays, each with six elements.
# The disciplines (names of disciplines).
disciplines = np.array(big_courses['Discipline'])
# The corresponding average scores for Easiness.
easiness = np.array(big_courses['Easiness'])
```
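If you do not have the data file to hand, you could follow along with hand-made stand-in arrays. This is only a sketch: the values below are the discipline names and the *rounded* Easiness scores displayed later in this section, not the exact values from `rate_my_course.csv`.

```python
import numpy as np

# Stand-in arrays, assuming the rounded values shown in this section.
# dtype=object matches how the names display when loaded with pandas.
disciplines = np.array(['English', 'Mathematics', 'Biology',
                        'Psychology', 'History', 'Chemistry'], dtype=object)
# Rounded Easiness scores, so not the exact values from the file.
easiness = np.array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])
```

With these stand-ins, the Boolean indexing results below come out the same, although `easiness[0]` shows `3.16` rather than the full value.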

We now have the names of the six disciplines with the largest number of professors.

```
disciplines
```

```
array(['English', 'Mathematics', 'Biology', 'Psychology', 'History',
'Chemistry'], dtype=object)
```

Here are the “Easiness” scores for those six disciplines:

```
easiness
```

```
array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])
```

These are the Easiness ratings corresponding to the `disciplines` we saw earlier. The top (largest) discipline is:

```
disciplines[0]
```

```
'English'
```

The Easiness rating for that discipline is:

```
easiness[0]
```

```
3.16275414471149
```
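The two arrays line up by position: the element at index `i` in `easiness` is the rating for the discipline at index `i` in `disciplines`. A small sketch of that pairing, using the rounded values shown above as stand-ins for the real data:

```python
import numpy as np

# Stand-in arrays with the rounded values shown in this section.
disciplines = np.array(['English', 'Mathematics', 'Biology',
                        'Psychology', 'History', 'Chemistry'], dtype=object)
easiness = np.array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])

# The same index picks out matching elements from both arrays.
for i in range(len(disciplines)):
    print(i, disciplines[i], easiness[i])
```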

## Boolean arrays

Boolean arrays are arrays in which every element is either `True` or `False`.

Here is a Boolean array, created from applying a comparison to an array:

```
greater_than_3 = easiness > 3
greater_than_3
```

```
array([ True, True, False, True, True, False])
```
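The comparison works element by element, so the result has one Boolean value for each element of `easiness`, and its data type is `bool`. A quick sketch, again using the rounded values shown above as stand-ins:

```python
import numpy as np

# Rounded Easiness values from this section, used as a stand-in.
easiness = np.array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])

greater_than_3 = easiness > 3
print(greater_than_3.dtype)  # bool
print(greater_than_3.shape)  # (6,), the same shape as easiness

# Other comparison operators also compare element by element.
less_than_3 = easiness < 3
print(less_than_3)
```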

This has a `True` value at the positions of elements greater than 3, and `False` otherwise.

We can do things like count the number of `True` values in the Boolean array:

```
np.count_nonzero(greater_than_3)
```

```
4
```
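`np.count_nonzero` counts `True` values because NumPy treats `True` as 1 and `False` as 0. For the same reason, `np.sum` on a Boolean array gives the same count, as this small sketch shows:

```python
import numpy as np

greater_than_3 = np.array([True, True, False, True, True, False])

# True counts as 1 and False as 0, so the sum is the number of True values.
n_true = np.count_nonzero(greater_than_3)
n_sum = np.sum(greater_than_3)
print(n_true, n_sum)  # 4 4
```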

Now let us say that we wanted to get the elements from `easiness` that are greater than 3. That is, we want to get the elements in `easiness` for which the corresponding element in `greater_than_3` is `True`.

We can do this with *Boolean array indexing*. The Boolean array goes between
the square brackets, after the array name. As a reminder:

```
# The easiness array
easiness
```

```
array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])
```

```
# The greater_than_3 Boolean array
greater_than_3
```

```
array([ True, True, False, True, True, False])
```

We put the Boolean array between square brackets, after the array we want to get values from, like this:

```
# Boolean indexing into the easiness array.
easiness[greater_than_3]
```

```
array([3.16, 3.06, 3.32, 3.05])
```

We have selected the numbers in `easiness` that are greater than 3.
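One way to think about Boolean indexing is as a walk along both arrays together, keeping each value whose matching Boolean element is `True`. The loop below is only a sketch of that idea (NumPy does not literally run this loop), using the rounded values as stand-ins; it gives the same result as `easiness[greater_than_3]`.

```python
import numpy as np

easiness = np.array([3.16, 3.06, 2.71, 3.32, 3.05, 2.65])
greater_than_3 = easiness > 3

# Keep each value whose matching Boolean element is True.
selected = []
for value, keep in zip(easiness, greater_than_3):
    if keep:
        selected.append(value)
selected = np.array(selected)

# Same result as Boolean indexing.
print(np.array_equal(selected, easiness[greater_than_3]))  # True
```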

See the picture below for an illustration of what is happening:

We can use this same Boolean array to index into another array. For example,
here we show the discipline *names* corresponding to the courses with Easiness
scores greater than 3:

```
disciplines[greater_than_3]
```

```
array(['English', 'Mathematics', 'Psychology', 'History'], dtype=object)
```
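This works because `greater_than_3` has the same length as `disciplines`, so each Boolean value has a matching element. If the lengths differ, NumPy refuses and raises an `IndexError`. The `too_short` array here is a hypothetical example made to show the error:

```python
import numpy as np

disciplines = np.array(['English', 'Mathematics', 'Biology',
                        'Psychology', 'History', 'Chemistry'], dtype=object)

# Hypothetical Boolean array with 3 elements; disciplines has 6.
too_short = np.array([True, False, True])

caught = False
try:
    disciplines[too_short]
except IndexError as err:
    # NumPy reports that the Boolean index does not match the array length.
    caught = True
    print('IndexError:', err)
```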

See the picture below for an illustration of how this works:

## Setting values with Boolean arrays

You have seen, above, that Boolean indexing can select values from an array:

```
# Create an array, then a Boolean array by comparing it with 2.
another_array = np.array([2, 3, 4, 2, 1, 5, 1, 0, 3])
are_gt_2 = another_array > 2
are_gt_2
```

```
array([False, True, True, False, False, True, False, False, True])
```

```
# Get the values by indexing with the Boolean array.
# Return only the values of 'another_array' where the Boolean array has True.
another_array[are_gt_2]
```

```
array([3, 4, 5, 3])
```
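When Boolean indexing is used to *get* values, as here, the result is a new array: a copy of the selected values. Changing that copy does not change the original array, as this sketch shows (the name `selected` is ours, for illustration):

```python
import numpy as np

another_array = np.array([2, 3, 4, 2, 1, 5, 1, 0, 3])
are_gt_2 = another_array > 2

# Boolean indexing to get values returns a copy.
selected = another_array[are_gt_2]
selected[0] = 99

print(selected)       # the copy has changed
print(another_array)  # the original is unchanged
```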

Given what you know, what do you think would happen with:

```
another_array[are_gt_2] = 10
another_array
```

Try it.
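Once you have tried it, you can compare your prediction with this sketch. It uses a separate, hypothetical array `scores`, so the exercise above is still yours to work through. With the Boolean array on the left of `=`, NumPy sets every element where the Boolean array is `True`, and leaves the others alone:

```python
import numpy as np

# Hypothetical array, separate from another_array above.
scores = np.array([5, -1, 3, -4, 0, 2])
is_negative = scores < 0

# Set every element where is_negative is True.
scores[is_negative] = 0
print(scores)  # [5 0 3 0 0 2]
```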