3.10 Selecting with arrays
Selecting values from an array
import numpy as np
np.set_printoptions(precision=2, suppress=True)
We use arrays all the time, in data science.
One of the most common tasks we have to do on arrays, is to select values.
We do this with array slicing.
We do array slicing when we follow the array variable name with
[
(open square brackets), followed by something to specify
which elements we want, followed by ]
(close square brackets.
The simplest case is when we want a single element from a one-dimensional array. In that case the thing between the [
and the ]
is the index of the value that we want.
The index of the first value is 0, the index of the second value is 2, and so on.
Here’s an example:
# Make a 1-dimensional array with three values
my_array = np.array([10, 20, 30])
my_array
array([10, 20, 30])
Here we get the first value. This value is at index 0.
# Get the first value (at index position 0)
my_array[0]
10
# Get the second value (at index position 1)
my_array[1]
20
# Get the third value (at index position 2)
my_array[2]
30
At first this will take some time to get used to, that the first value is at index position 0. There are good reasons for this, and many programming languages have this convention, but it does a while to get this habit of mind.
We will see more of this kind of array slicing soon.
Indexing with Boolean arrays
We often want to select several elements from an array according to some criterion.
The most common way to do this, is to do array slicing, using a Boolean array between the square brackets.
It may be easier to understand this by example.
You have already come across Boolean arrays.
These are arrays of True
and False
values.
randoms = np.random.uniform(size=4)
randoms
array([0.82, 0.06, 0.52, 0.18])
Here is a Boolean array, created from applying a comparison to an array:
less_than_0p5 = randoms < 0.5
less_than_0p5
array([False, True, False, True])
As you have already seen, we can do things like count the number
of True
values in the Boolean array:
np.count_nonzero(less_than_0p5)
2
Now let us say that we wanted to get the elements from randoms
that are less than 0.5. That is, we want to get the elements
in randoms
for which the corresponding element in
less_than_0p5
is True
.
We can do this with Boolean array indexing. The Boolean array goes between the square brackets, like this:
randoms[less_than_0p5]
array([0.06, 0.18])
We have selected the numbers in random
that are less than 0.5.
Exercises
See the array indexing exercises.