3.10 Selecting with arrays

Download notebook Interact

Selecting values from an array

import numpy as np
np.set_printoptions(precision=2, suppress=True)

We use arrays all the time, in data science.

One of the most common tasks we have to do on arrays, is to select values.

We do this with array slicing.

We do array slicing when we follow the array variable name with [ (open square brackets), followed by something to specify which elements we want, followed by ] (close square brackets.

The simplest case is when we want a single element from a one-dimensional array. In that case the thing between the [ and the ] is the index of the value that we want.

The index of the first value is 0, the index of the second value is 2, and so on.

Here’s an example:

# Make a 1-dimensional array with three values
my_array = np.array([10, 20, 30])
my_array
array([10, 20, 30])

Here we get the first value. This value is at index 0.

# Get the first value (at index position 0)
my_array[0]
10
# Get the second value (at index position 1)
my_array[1]
20
# Get the third value (at index position 2)
my_array[2]
30

At first this will take some time to get used to, that the first value is at index position 0. There are good reasons for this, and many programming languages have this convention, but it does a while to get this habit of mind.

We will see more of this kind of array slicing soon.

Indexing with Boolean arrays

We often want to select several elements from an array according to some criterion.

The most common way to do this, is to do array slicing, using a Boolean array between the square brackets.

It may be easier to understand this by example.

You have already come across Boolean arrays.

These are arrays of True and False values.

randoms = np.random.uniform(size=4)
randoms
array([0.82, 0.06, 0.52, 0.18])

Here is a Boolean array, created from applying a comparison to an array:

less_than_0p5 = randoms < 0.5
less_than_0p5
array([False,  True, False,  True])

As you have already seen, we can do things like count the number of True values in the Boolean array:

np.count_nonzero(less_than_0p5)
2

Now let us say that we wanted to get the elements from randoms that are less than 0.5. That is, we want to get the elements in randoms for which the corresponding element in less_than_0p5 is True.

We can do this with Boolean array indexing. The Boolean array goes between the square brackets, like this:

randoms[less_than_0p5]
array([0.06, 0.18])

We have selected the numbers in random that are less than 0.5.

Exercises

See the array indexing exercises.