3.9 Revision - three girls

Three girls

In which we solve the three-girls-in-family problem.

The problem

If there is a family of four children, what is the chance that family will consist of exactly three girls and one boy?

We decided we could simulate this situation by taking four random numbers between 0 and 1. For each number, if it is less than 0.5, we label it as a girl; otherwise we label it as a boy. Then we count how many girls we got. That is one family. We repeat the procedure many times, and count how many families have exactly three girls (exactly three of the four random numbers less than 0.5).

A simulation

import numpy as np
np.set_printoptions(precision=2)

First we do a simulation of a single family.

Start with 4 random numbers, between 0 and 1.

We could do these one at a time:

first_child = np.random.uniform()
first_child
0.40513462796313826
second_child = np.random.uniform()
second_child
0.5363529670678105
third_child = np.random.uniform()
third_child
0.2141145240592952
fourth_child = np.random.uniform()
fourth_child
0.053217053717968255

That gets boring. It is neater to make an array of 4 numbers in one shot, like this:

one_family = np.random.uniform(size=4)
one_family
array([0.85, 0.18, 0.99, 0.5 ])

Arrays allow us to do the same operation on all the elements.

For example, we can ask whether each random number is less than 0.5.

girls = one_family < 0.5
girls
array([False,  True, False,  True])

Notice that the new array, girls, has four elements, like the original array one_family. At each position in the girls array, there is a True if the corresponding element in one_family was less than 0.5, and False otherwise.

We consider True to mean “girl” and False to mean “boy”. To count the number of girls in this family, we need to count the number of True values in the array.

n_girls = np.count_nonzero(girls)
n_girls
2

That is the result of our simulation, for one family.
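The steps for one family can be collected in one place. This is only a sketch: it uses a made-up array of four numbers (the names example_family, example_girls and example_n_girls are not from the page above) in place of np.random.uniform(size=4), so that the result is the same on every run.

```python
import numpy as np

# Made-up values standing in for four random numbers between 0 and 1.
example_family = np.array([0.85, 0.18, 0.99, 0.31])

# True where the number is less than 0.5 ("girl"), False otherwise ("boy").
example_girls = example_family < 0.5

# Count the True values, giving the number of girls in this family.
example_n_girls = np.count_nonzero(example_girls)

print(example_girls)
print(example_n_girls)
```

Replacing the made-up array with np.random.uniform(size=4) gives a fresh simulated family on each run.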

We want to do this many times. How would we do that?

One way is to make a two-dimensional array of random numbers.

A two-dimensional array has rows and columns.

In our case, each row will be a single family. There are four columns, so each row has four elements, corresponding to the four children in the family.

Here we get ready to simulate 10 families, with one 2D array.

ten_families = np.random.uniform(size=(10, 4))
ten_families
array([[0.02, 0.61, 0.36, 0.09],
       [0.1 , 0.4 , 0.45, 0.43],
       [0.33, 0.75, 0.43, 0.05],
       [0.02, 0.2 , 0.19, 0.82],
       [0.25, 0.06, 0.35, 0.36],
       [0.22, 0.83, 0.14, 0.95],
       [0.21, 0.54, 0.08, 0.59],
       [0.6 , 0.68, 0.92, 0.75],
       [0.76, 0.49, 0.01, 0.58],
       [0.95, 0.25, 0.7 , 0.61]])

Notice the size= argument to np.random.uniform. When we wanted an array of 4 values, the size was 4. Now that we want a 2D array, the size is two values between parentheses: the first value is the number of rows, and the second is the number of columns.
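One way to check what size= has produced is to look at the shape attribute of the resulting array. The sketch below assumes nothing beyond NumPy itself; the names one_row and ten_rows are chosen here for illustration, and shape is not otherwise used on this page.

```python
import numpy as np

# size=4 gives a one-dimensional array with 4 elements.
one_row = np.random.uniform(size=4)

# size=(10, 4) gives a two-dimensional array with 10 rows and 4 columns.
ten_rows = np.random.uniform(size=(10, 4))

print(one_row.shape)
print(ten_rows.shape)
```

The first shape has a single value (the number of elements), and the second has two values, rows then columns, matching what was passed to size=.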

We can apply our test < 0.5 to all the 10 * 4 elements at the same time.

are_girls = ten_families < 0.5
are_girls
array([[ True, False,  True,  True],
       [ True,  True,  True,  True],
       [ True, False,  True,  True],
       [ True,  True,  True, False],
       [ True,  True,  True,  True],
       [ True, False,  True, False],
       [ True, False,  True, False],
       [False, False, False, False],
       [False,  True,  True, False],
       [False,  True, False, False]])

Remember, each row represents a family, and each True value represents a girl. We want to count how many True values there are in each row. We can try np.count_nonzero on this array, but:

np.count_nonzero(are_girls)
24

By default, np.count_nonzero counts the number of True values in the entire 2D array.

We want it to count the number of True values in each row.

We can do that by using the axis argument to np.count_nonzero. See Arrays and axes for a more detailed explanation.

n_girls = np.count_nonzero(are_girls, axis=1)
n_girls
array([3, 4, 3, 3, 4, 2, 2, 0, 2, 1])

n_girls has one element per row in the are_girls array. The first element is the count of True values in the first row, the second element is the count for the second row, and so on.
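It can help to see the effect of axis on an array small enough to count by eye. The following is a sketch using a made-up Boolean array called small (not part of the simulation), with two rows and three columns.

```python
import numpy as np

# A small made-up Boolean array: 2 rows, 3 columns.
small = np.array([[True, False, True],
                  [False, False, True]])

# No axis: count True values over the whole array.
total_count = np.count_nonzero(small)

# axis=1: count across the columns, giving one count per row.
row_counts = np.count_nonzero(small, axis=1)

# axis=0: count down the rows, giving one count per column.
column_counts = np.count_nonzero(small, axis=0)

print(total_count)
print(row_counts)
print(column_counts)
```

For the family simulation, axis=1 is the one we need, because each row is a family.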

Now we need to ask the question, how many of the counts in n_girls are equal to 3?

To do this, we can use another comparison operator, like the < in < 0.5. The operator is ==. Notice the two = signs together. It is a test that returns True or False. For example:

4 == 3
False
4 == 4
True

These are expressions, because they return values.

Compare this to the single equals sign, which is the assignment operator, in an assignment statement.

a = 4

Notice this does not return anything, because it is not an expression; it is an assignment statement. a now has the value 4.

a
4

We can test whether the value of a is 4 like this:

a == 4
True

This is an equality test expression, so it does return a value.

How does this operate on arrays? It operates the same way as the other comparison operators - element by element:

my_array = np.array([2, 3, 4, 2])
my_array
array([2, 3, 4, 2])
my_array == 2
array([ True, False, False,  True])
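To see that == behaves like the other comparison operators, here is a sketch that applies several of them to the same made-up array. The name counts is chosen for illustration; the != (not equal) operator is standard Python but is not otherwise used on this page.

```python
import numpy as np

counts = np.array([2, 3, 4, 2])

# Each comparison is applied element by element, giving a Boolean array
# with the same number of elements as counts.
equal_to_2 = counts == 2
less_than_3 = counts < 3
not_equal_to_2 = counts != 2

print(equal_to_2)
print(less_than_3)
print(not_equal_to_2)
```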

We can use this trick on the n_girls array, to find counts that are equal to 3.

n_girls == 3
array([ True, False,  True,  True, False, False, False, False, False,
       False])

To find the number of 3s in n_girls:

np.count_nonzero(n_girls == 3)
3

Now the proportion of the counts that are equal to 3. We divide by 10 because we simulated 10 families:

prop_3 = np.count_nonzero(n_girls == 3) / 10
prop_3
0.3
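Ten families is too few to give a stable estimate. The same steps work unchanged for a larger number of families. The sketch below assumes 10000 families, an arbitrary choice, and uses names (n_families, many_families and so on) that are not from the page above.

```python
import numpy as np

# Number of families to simulate (an arbitrary choice for illustration).
n_families = 10000

# One row per family, one column per child.
many_families = np.random.uniform(size=(n_families, 4))

# True means "girl", False means "boy".
many_are_girls = many_families < 0.5

# Number of girls in each family (one count per row).
many_n_girls = np.count_nonzero(many_are_girls, axis=1)

# Proportion of families with exactly three girls.
many_prop_3 = np.count_nonzero(many_n_girls == 3) / n_families

print(many_prop_3)
```

The result changes a little from run to run, because the numbers are random. With this many families it should come out close to 0.25, which is the value that exact calculation gives for this problem (4 * 0.5 ** 4).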

Exercises

See three girl simulation exercises for some exercises to extend the simulation on this page.