# Reply to the Supreme Court

Our task has been to reply to the Supreme Court on their judgment in the appeal of Robert Swain.

Remember, Robert Swain appealed his death sentence, on the basis that the jury selection was biased against Black people.

His trial jury of 12 people had no Black members.

The local population of eligible jurors was 26% Black.

If the jury had been representative, we would expect about 26 in every 100 jurors to be Black. That’s around 1 in 4 (25%), so in a jury of 12 we would expect around 3 Black jurors.
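The arithmetic behind "around 3 of 12" can be checked directly. This is a minimal sketch; the variable names are chosen here for illustration.

```python
# Expected number of Black jurors in a representative jury of 12.
p_black = 0.26   # proportion of eligible jurors who were Black
n_jurors = 12    # size of the trial jury

expected_black = n_jurors * p_black
# A little over 3, which is why we say "around 3 of 12".
print(round(expected_black, 2))
```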

The Supreme Court was not convinced that there was evidence of systematic bias. But, to start with the jurors: is it surprising that we expected around 3 Black jurors, but we got 0? Is the value 0 surprising, if each juror has a 26% chance of being Black?

To answer this, we are going to go through a couple of steps.

The first is to build an *ideal model* of the world, where it *is true* that
each juror has a 26% chance of being Black. We sometimes call this our *ideal world*.
If you are used to statistical terms, this ideal model is our *null
hypothesis*.

Then we can *simulate* making many juries in this ideal world.

Finally we ask whether our simulated juries, from the ideal world, often give
us a count of zero Black jurors. If they don’t, then we can say that we are
*surprised* by the value of 0, if the jury did arise from that ideal world. If
the value 0 is sufficiently unusual, we become suspicious that the real world
is rather different from our ideal world. We consider *rejecting* the ideal
world as a good model of the real world.

## The ideal world

Our *ideal model* (or null hypothesis) is a world where each juror has been
truly randomly selected from the eligible population. That is, for any one
juror, there is a 0.26 probability that they are Black.

## Simulations with the ideal model

To do a *simulation* with this ideal model, we will start by making one jury,
of 12 people, where it is really true that each juror has a 26% chance of being
Black. Not to pun, but we will call one simulation of a jury of 12 *one trial*.

Then we simulate 10 juries of 12 people (do 10 trials), to get warmed up.

Finally we make 10000 juries, each of 12 people, and see what we get.

```
# Import the array library
import numpy as np
```

Here is one jury, and the number of Black people we get in our simulation.

```
# Make 12 random integers from 0 through 99
randoms = np.random.randint(0, 100, size=12)
# Say values < 26 correspond to Black jurors.
# 26 of the numbers 0 through 99 are less than 26 (so 26% or p=0.26).
black_yn = randoms < 26
# We now have True for Black jurors and False otherwise.
# Count the number of Trues
np.count_nonzero(black_yn)
```

```
5
```

That is one estimate, for the number of Black people we can expect, if our
model is correct. Call this one *trial*. We can run that a few times to get a
range of values. If we run it only a few times, we might be unlucky, and get
some results that are not representative. It is safer to run it a huge number
of times, to make sure we’ve got an idea of the variation.
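One way to make a trial easy to repeat is to wrap it in a function. The sketch below is not part of the original code: the name `simulate_jury` and its arguments are hypothetical, and it assumes NumPy's newer `default_rng` generator, seeded so that a run can be reproduced.

```python
import numpy as np


def simulate_jury(rng, n_jurors=12, percent_black=26):
    """Return the number of Black jurors in one simulated jury.

    Hypothetical helper: same logic as the cell above, in a function.
    """
    # Random integers from 0 through 99, one per juror.
    randoms = rng.integers(0, 100, size=n_jurors)
    # Values below `percent_black` stand for Black jurors.
    return np.count_nonzero(randoms < percent_black)


# The seed value is arbitrary; it only makes the run repeatable.
rng = np.random.default_rng(2024)
one_count = simulate_jury(rng)
print(one_count)
```

Each call to `simulate_jury(rng)` is one trial, so calling it again gives another estimate.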

To start with, we will run 10 trials.

We get ready to store the results of each estimate.

```
# Make an array of 10 zeros, to store the results.
counts = np.zeros(10)
```

We repeat the code from the cell above, but now, we store each trial result
(count) in the `counts` array:

```
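# One jury of 12 people, as in the cell above.
randoms = np.random.randint(0, 100, size=12)
black_yn = randoms < 26
count = np.count_nonzero(black_yn)
# Store the result for this trial at the first position.
counts[0] = count
counts
```

```
array([3., 0., 0., 0., 0., 0., 0., 0., 0., 0.])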
```

Run the cell above a few times, perhaps with Control-Enter, to see the first value in the `counts` array changing.

Now we collect the result of 10 trials, by using a for loop.

```
# Make a new counts array of zeros to store the results.
counts = np.zeros(10)
for i in np.arange(10):
    # This code is the same as the cell above, but indented,
    # so we run it all, for each time through the for loop.
    randoms = np.random.randint(0, 100, size=12)
    black_yn = randoms < 26
    count = np.count_nonzero(black_yn)
    # Store the result at position i
    counts[i] = count
counts
```

```
array([3., 3., 5., 1., 4., 5., 2., 2., 1., 1.])
```

Each of these values is one estimate for how many Black jurors we should expect, if our model is right. Already we get the feeling that 0 is rather unlikely, if our model is correct. But - how unlikely?

To get a better estimate, let us do the same thing, but with 10000 juries, and therefore, 10000 estimates.

```
# Make a new counts array of zeros to store the results.
counts = np.zeros(10000)
for i in np.arange(10000):
    # This code is the same as the cell above, but indented,
    # so we run it all, for each time through the for loop.
    randoms = np.random.randint(0, 100, size=12)
    black_yn = randoms < 26
    count = np.count_nonzero(black_yn)
    # Store the result at position i
    counts[i] = count
counts
```

```
array([0., 0., 5., ..., 1., 2., 2.])
```

If you ran this cell yourself, you will notice that it runs very fast, in much less than a second, on any reasonable computer.

We now have 10000 estimates, one for each simulated jury.

Remember, the function `len` shows us the length of the array, and therefore,
the number of values in this one-dimensional array.

```
len(counts)
```

```
10000
```

Next we want to have a look at the spread of these values. To do this, we plot a histogram. Here is how to do that, in Python. Don’t worry about the details; we will go into this more soon.

```
# Please don't worry about this bit of code for now.
# It sets up plotting in the notebook.
import matplotlib.pyplot as plt
%matplotlib inline
# Fancy plots
plt.style.use('fivethirtyeight')
```

Now show the histogram. Don’t worry about the details of this command.

```
# Do the histogram of our 10000 estimates.
plt.hist(counts, bins=np.arange(13))
```

```
(array([2.600e+02, 1.119e+03, 2.225e+03, 2.557e+03, 2.006e+03, 1.156e+03,
5.110e+02, 1.180e+02, 3.900e+01, 6.000e+00, 2.000e+00, 1.000e+00]),
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]),
<BarContainer object of 12 artists>)
```

The histogram above is called the *sampling distribution*. The sampling distribution is the distribution of the thing we are interested in (the number of Black jurors) given the ideal model of completely random selection of jurors from a 26% Black population.
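The sampling distribution can also be shown as a table of proportions instead of a plot. The sketch below reruns the simulation so that it stands alone, then tallies the counts with `np.bincount`; the names `tallies` and `proportions` are chosen here for illustration. The numbers will differ a little on every run, because the juries are random.

```python
import numpy as np

rng = np.random.default_rng()
n_trials = 10000

# Simulate the juries, storing the counts as integers.
counts = np.zeros(n_trials, dtype=int)
for i in range(n_trials):
    randoms = rng.integers(0, 100, size=12)
    counts[i] = np.count_nonzero(randoms < 26)

# tallies[k] is the number of juries with exactly k Black jurors,
# for k from 0 through 12.
tallies = np.bincount(counts, minlength=13)
proportions = tallies / n_trials

for k in range(13):
    print(k, proportions[k])
```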

It looks as if 0 is a relatively uncommon value among our simulations. How many times did we get a value of 0, in all our 10000 estimates?

```
counts_of_0 = counts == 0
n_zeros = np.count_nonzero(counts_of_0)
n_zeros
```

```
260
```

What *proportion* of jury simulations give a value of 0? We just divide the
number of times we see 0 by the number of trials we made:

```
p = n_zeros / 10000
p
```

```
0.026
```

We have run an analysis assuming that the jurors were selected at random. On that assumption, a count of 0 jurors in 12 is fairly uncommon, in the sense that the proportion of times we see that result is:

```
p
```

```
0.026
```

In other words, our *estimate* of the *probability* of getting 0 Black people
in a jury of 12, is

```
p
```

```
0.026
```

What can we conclude? Only this: that in our ideal model world, where each juror has a 26% chance of being Black, 0 is uncommon. This surprising result, of 0, gives us some cause to wonder if our ideal model of the world is wrong. One way it could be wrong is if there was bias in jury selection, so it was not true that each juror had a 26% chance of being Black.