3.9 Revision - three girls
Three girls
In which we solve the three-girls-in-family problem.
The problem
If there is a family of four children, what is the chance that family will consist of exactly three girls and one boy?
We decided we could simulate this situation by taking four random numbers between 0 and 1. For each number, if it is less than 0.5, we label it as a girl; otherwise we label it as a boy. Then we count how many girls we got. That is one family. We repeat the procedure many times, and count how many families have exactly three girls (exactly three of the four random numbers less than 0.5).
A simulation
import numpy as np
np.set_printoptions(precision=2)
First we do a simulation of a single family.
Start with 4 random numbers, between 0 and 1.
We could do these one at a time:
first_child = np.random.uniform()
first_child
0.40513462796313826
second_child = np.random.uniform()
second_child
0.5363529670678105
third_child = np.random.uniform()
third_child
0.2141145240592952
fourth_child = np.random.uniform()
fourth_child
0.053217053717968255
That gets boring. It is neater to make an array of 4 numbers in one shot, like this:
one_family = np.random.uniform(size=4)
one_family
array([0.85, 0.18, 0.99, 0.5 ])
Arrays allow us to do the same operation on all the elements.
For example, we can ask whether each random number is less than 0.5.
girls = one_family < 0.5
girls
array([False, True, False, True])
Notice that the new array, girls, has four elements, like the original array one_family. At each position in the girls array, there is a True if the corresponding element in one_family was less than 0.5, and False otherwise.
We consider True to mean “girl” and False to mean “boy”. To count the number of girls in this family, we need to count the number of True values in the array.
n_girls = np.count_nonzero(girls)
n_girls
2
That is the result of our simulation, for one family.
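Because the random numbers change on every run, it can help to check the same steps on values we choose ourselves. The following is a minimal sketch; the names example_family, example_girls and example_n_girls, and the four values, are made up for illustration and are not part of the simulation above.

```python
import numpy as np

# Hypothetical fixed values standing in for four random numbers.
example_family = np.array([0.85, 0.18, 0.99, 0.31])

# Elementwise test: True where the value is less than 0.5 ("girl").
example_girls = example_family < 0.5

# Count the True values, that is, the number of girls.
example_n_girls = np.count_nonzero(example_girls)

print(example_girls)
print(example_n_girls)
```

Two of the four chosen values are below 0.5, so the count here is 2.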
We want to do this many times. How would we do that?
One way is to make a two-dimensional array of random numbers.
A two-dimensional array has rows and columns.
In our case, each row will be a single family. There are four columns, so each row has four elements, corresponding to the four children in the family.
Here we get ready to simulate 10 families, with one 2D array.
ten_families = np.random.uniform(size=(10, 4))
ten_families
array([[0.02, 0.61, 0.36, 0.09],
[0.1 , 0.4 , 0.45, 0.43],
[0.33, 0.75, 0.43, 0.05],
[0.02, 0.2 , 0.19, 0.82],
[0.25, 0.06, 0.35, 0.36],
[0.22, 0.83, 0.14, 0.95],
[0.21, 0.54, 0.08, 0.59],
[0.6 , 0.68, 0.92, 0.75],
[0.76, 0.49, 0.01, 0.58],
[0.95, 0.25, 0.7 , 0.61]])
Notice the size= argument to np.random.uniform. When we wanted an array of 4 values, the size was 4. Now that we want a 2D array, the size is two values between parentheses: the first value is the number of rows, and the second is the number of columns.
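One way to confirm what size= has produced is to look at the shape attribute of the resulting array. This is a small sketch; shape is a standard NumPy array attribute that this page has not used so far, and the names one_row and ten_rows are chosen here for illustration.

```python
import numpy as np

# A single value for size gives a one-dimensional array.
one_row = np.random.uniform(size=4)

# Two values in parentheses give (number of rows, number of columns).
ten_rows = np.random.uniform(size=(10, 4))

print(one_row.shape)   # (4,)
print(ten_rows.shape)  # (10, 4)
```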
We can apply our test < 0.5 to all the 10 * 4 elements at the same time.
are_girls = ten_families < 0.5
are_girls
array([[ True, False, True, True],
[ True, True, True, True],
[ True, False, True, True],
[ True, True, True, False],
[ True, True, True, True],
[ True, False, True, False],
[ True, False, True, False],
[False, False, False, False],
[False, True, True, False],
[False, True, False, False]])
Remember, each row represents a family, and each True value represents a girl. We want to count how many True values there are in each row. We can try np.count_nonzero on this array, but:
np.count_nonzero(are_girls)
24
By default, np.count_nonzero counts the number of True values in the entire 2D array.
We want it to count the number of True values in each row.
We can do that by using the axis argument to np.count_nonzero. See Arrays and axes for a more detailed explanation.
n_girls = np.count_nonzero(are_girls, axis=1)
n_girls
array([3, 4, 3, 3, 4, 2, 2, 0, 2, 1])
n_girls has one element per row in the are_girls array. The element corresponding to the first row has the count of True values in the first row, and so on.
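To see what the axis argument does, it may help to try it on a small array where the counts are easy to check by eye. This is an illustrative sketch; the array small and the result names are invented for this example.

```python
import numpy as np

# A small hand-made Boolean array: 2 rows, 3 columns.
small = np.array([[True, False, True],
                  [False, False, True]])

# axis=1 counts across the columns, giving one count per row.
per_row = np.count_nonzero(small, axis=1)

# axis=0 counts down the rows, giving one count per column.
per_column = np.count_nonzero(small, axis=0)

# No axis: a single count for the whole array.
overall = np.count_nonzero(small)

print(per_row)     # [2 1]
print(per_column)  # [1 0 2]
print(overall)     # 3
```

For the family simulation we want one count per row, which is why the code above uses axis=1.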
Now we need to ask the question: how many of the counts in n_girls are equal to 3?
To do this, we can use another comparison operator, like the < in < 0.5. The operator is ==. Notice the double = sign. It is a test that returns True or False.
For example:
4 == 3
False
4 == 4
True
These are expressions, because they return values.
Compare this to the single equals sign, which is the assignment operator, as used in an assignment statement.
a = 4
Notice this does not return anything, because it is not an expression; it is an assignment statement. a now has the value 4.
a
4
We can test whether the value of a is 4 like this:
a == 4
True
This is an equality test expression, so it does return a value.
How does this operate on arrays? It operates the same way as the other comparison operators - element by element:
my_array = np.array([2, 3, 4, 2])
my_array
array([2, 3, 4, 2])
my_array == 2
array([ True, False, False, True])
We can use this trick on the n_girls array, to find counts that are equal to 3.
n_girls == 3
array([ True, False, True, True, False, False, False, False, False,
False])
To find the number of 3s in n_girls:
np.count_nonzero(n_girls == 3)
3
Now the proportion of the counts, that are equal to 3:
prop_3 = np.count_nonzero(n_girls == 3) / 10
prop_3
0.3
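Ten families give only a rough estimate, and the proportion will vary a good deal from run to run. The steps on this page can be put together for a larger number of families, as in the sketch below. The names n_families, many_families, many_n_girls and prop_3_many, and the choice of 10,000 families, are assumptions made here for illustration.

```python
import numpy as np

# Number of simulated families (an arbitrary choice for this sketch).
n_families = 10_000

# One row per family, one column per child.
many_families = np.random.uniform(size=(n_families, 4))

# Count the girls (values less than 0.5) in each row.
many_n_girls = np.count_nonzero(many_families < 0.5, axis=1)

# Proportion of families with exactly three girls.
prop_3_many = np.count_nonzero(many_n_girls == 3) / n_families
print(prop_3_many)
```

Dividing by n_families rather than a typed-in number means the proportion stays correct if we change the number of families.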
Exercises
See three girl simulation exercises for some exercises to extend the simulation on this page.