Money and death


We return to the death penalty.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Make plots look a little bit more fancy
plt.style.use('fivethirtyeight')

In this case, we are going to analyze whether people with higher incomes are more likely to favor the death penalty.

To do this, we are going to analyze the results from a sample of the US General Social Survey from 2002.

If you are running on your laptop, download the data file GSS2002.csv.

# Read the data into a data frame
gss = pd.read_csv('GSS2002.csv')
gss
ID Region Gender Race Education Marital Religion Happy Income PolParty ... Marijuana DeathPenalty OwnGun GunLaw SpendMilitary SpendEduc SpendEnv SpendSci Pres00 Postlife
0 1 South Central Female White HS Divorced Inter-nondenominational Pretty happy 30000-34999 Strong Rep ... NaN Favor No Favor Too little Too little About right About right Bush Yes
1 2 South Central Male White Bachelors Married Protestant Pretty happy 75000-89999 Not Str Rep ... Not legal Favor Yes Oppose About right Too little About right About right Bush Yes
2 3 South Central Female White HS Separated Protestant NaN 35000-39999 Strong Rep ... NaN NaN NaN NaN NaN NaN NaN NaN Bush NaN
3 4 South Central Female White Left HS Divorced Protestant NaN 50000-59999 Ind, Near Dem ... NaN NaN NaN NaN About right Too little Too little Too little NaN NaN
4 5 South Central Male White Left HS Divorced Protestant NaN 40000-49999 Ind ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 6 South Central Male White HS Divorced Catholic Pretty happy 40000-49999 Ind, Near Rep ... NaN Favor Yes Oppose Too little Too little Too little Too little Bush Yes
6 7 South Central Female White Bachelors Married Protestant NaN NaN Strong Rep ... NaN NaN NaN NaN NaN NaN NaN NaN Bush NaN
7 8 South Central Female White HS Married Protestant NaN NaN Ind ... NaN NaN NaN NaN Too little Too little About right About right Bush NaN
8 9 South Central Male White HS Divorced Catholic Not too happy 60000-74999 Strong Rep ... Legal Favor Yes Oppose NaN NaN NaN NaN Bush Yes
9 10 South Central Female Other HS Never Married Catholic NaN under 1000 Ind, Near Rep ... NaN NaN NaN NaN Too much Too little Too little Too little NaN NaN
10 11 South Central Male White HS Married None NaN 50000-59999 Strong Rep ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11 12 South Central Male White Left HS Married Protestant NaN 110000-129999 Not Str Rep ... NaN NaN NaN NaN About right About right Too much Too much Bush NaN
12 13 South Central Male Black Graduate Married Catholic NaN 90000-109999 Not Str Dem ... NaN NaN NaN NaN Too much About right Too little About right NaN NaN
13 14 South Central Female White HS Divorced Protestant Pretty happy 10000-124999 Strong Rep ... Not legal Favor No Favor NaN NaN NaN NaN Bush Yes
14 15 South Central Female Other HS Married Moslem/Islam NaN NaN Ind, Near Rep ... NaN NaN NaN NaN About right Too much Too much Too much NaN NaN
15 16 South Central Female White HS Married Orthodox-Christian NaN NaN Ind ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
16 17 South Central Female White HS Divorced Christian Not too happy NaN Not Str Rep ... NaN Favor No Favor About right Too little Too little About right NaN Yes
17 18 South Central Male White HS Never Married Protestant Very happy 40000-49999 Strong Rep ... Legal Favor NaN NaN NaN NaN NaN NaN Bush Yes
18 19 South Central Male White Jr Col Divorced None Pretty happy 75000-89999 Ind, Near Dem ... Legal Favor Yes Oppose Too little About right Too much About right NaN NaN
19 20 South Central Male White HS Never Married None NaN 25000-29999 Other party ... NaN NaN NaN NaN NaN NaN NaN NaN Nader NaN
20 21 South Central Female Black HS Never Married Protestant NaN 25000-29999 Ind ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21 22 South Central Male Black HS Separated Catholic Pretty happy 17500-19999 Not Str Dem ... NaN Oppose No Favor About right Too little Too little About right NaN Yes
22 23 South Central Male Other Bachelors Married Moslem/Islam Not too happy 50000-59999 Not Str Dem ... Legal Favor NaN NaN NaN NaN NaN NaN Gore Yes
23 24 South Central Female Black HS Married Protestant NaN 40000-49999 Not Str Dem ... NaN NaN NaN NaN NaN NaN NaN NaN Gore NaN
24 25 South Central Male White HS Married Protestant NaN 40000-49999 Strong Rep ... NaN NaN NaN NaN NaN NaN NaN NaN Bush NaN
25 26 South Central Female White HS Widowed Protestant NaN NaN Other party ... NaN NaN NaN NaN Too little Too little About right About right Bush NaN
26 27 South Central Female White HS Widowed Catholic Pretty happy NaN Other party ... NaN Favor No Favor NaN NaN NaN NaN Bush Yes
27 28 South Central Female Other Bachelors Divorced None NaN 25000-29999 Ind, Near Dem ... NaN NaN NaN NaN Too much Too little Too little Too little NaN NaN
28 29 South Central Female Other HS Never Married Catholic NaN 1000-2999 Ind ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
29 30 South Central Female White HS Never Married Protestant NaN 10000-124999 Ind ... NaN NaN NaN NaN About right NaN About right NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2735 2736 South Atlantic Male White Graduate Married Protestant Pretty happy NaN Strong Rep ... Not legal Favor Yes Favor Too little Too much Too much Too little Bush Yes
2736 2737 South Atlantic Female White Bachelors Married Protestant NaN 25000-29999 Strong Rep ... NaN NaN NaN NaN NaN NaN NaN NaN Bush NaN
2737 2738 South Atlantic Female Black HS Married Protestant Pretty happy 60000-74999 Strong Dem ... Not legal Oppose NaN NaN NaN NaN NaN NaN Gore Yes
2738 2739 South Atlantic Female White HS Widowed Protestant Pretty happy 25000-29999 Strong Dem ... Legal NaN No Favor NaN Too little Too little About right Gore Yes
2739 2740 South Atlantic Female White Left HS Separated Protestant NaN NaN Strong Rep ... NaN NaN NaN NaN NaN NaN NaN NaN Bush NaN
2740 2741 South Atlantic Female White Bachelors Separated Protestant NaN NaN Ind, Near Rep ... NaN NaN NaN NaN NaN NaN NaN NaN Bush NaN
2741 2742 South Atlantic Male White HS Married Protestant Pretty happy 15000-17499 Not Str Rep ... NaN Favor Yes Favor NaN NaN NaN NaN Bush Yes
2742 2743 South Atlantic Male White HS Never Married Protestant Pretty happy 25000-29999 Ind, Near Rep ... Legal Favor NaN NaN Too little Too little Too little Too little Bush Yes
2743 2744 South Atlantic Male Black HS Married Protestant NaN 22500-24999 Strong Dem ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2744 2745 South Atlantic Male White HS Never Married Protestant NaN NaN Ind, Near Rep ... NaN NaN NaN NaN About right Too little About right Too little NaN NaN
2745 2746 Pacific Female White Bachelors Married Protestant Pretty happy NaN Not Str Rep ... NaN Favor No Favor NaN NaN NaN NaN Bush NaN
2746 2747 Pacific Female White HS Widowed Catholic Pretty happy NaN Strong Rep ... Not legal Oppose NaN NaN Too little Too little Too much Too little Bush Yes
2747 2748 Pacific Female White HS Never Married Protestant Very happy 8000-9999 Not Str Rep ... NaN Favor Yes Favor NaN NaN NaN NaN NaN Yes
2748 2749 Pacific Female White HS Widowed Protestant NaN NaN Not Str Dem ... NaN NaN NaN NaN Too little Too little Too much About right Gore NaN
2749 2750 Mid-Atl Male White Jr Col Married Protestant NaN 22500-24999 Not Str Rep ... NaN NaN NaN NaN NaN NaN NaN NaN Bush NaN
2750 2751 Mid-Atl Female White HS Married Protestant Not too happy 6000-6999 Ind ... Not legal NaN NaN NaN NaN NaN NaN NaN Gore Yes
2751 2752 Mid-Atl Male White Left HS Married Protestant NaN 22500-24999 Strong Rep ... NaN NaN NaN NaN Too little Too little Too little About right NaN NaN
2752 2753 South Central Female White Jr Col Married Protestant NaN NaN Other party ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2753 2754 South Central Male Black HS Never Married Catholic Very happy 35000-39999 Not Str Dem ... NaN Favor No Favor About right Too little Too little Too little Bush NaN
2754 2755 South Central Female White HS Divorced Protestant NaN NaN Strong Rep ... NaN NaN NaN NaN About right About right Too little About right Bush NaN
2755 2756 South Central Female White HS Married Protestant NaN 35000-39999 Not Str Dem ... NaN NaN NaN NaN NaN NaN NaN NaN Bush NaN
2756 2757 South Central Male Black HS Married Protestant Very happy 30000-34999 Strong Dem ... Not legal Favor No Favor NaN NaN NaN NaN Gore Yes
2757 2758 New Engl Male White HS Divorced Protestant NaN 6000-6999 Not Str Rep ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2758 2759 New Engl Female White HS Never Married None NaN 12500-14999 Ind, Near Dem ... NaN NaN NaN NaN NaN NaN NaN NaN Gore NaN
2759 2760 New Engl Female White HS Divorced None NaN 20000-22499 Not Str Rep ... NaN NaN NaN NaN Too much About right Too little NaN NaN NaN
2760 2761 New Engl Male White Left HS Never Married None Pretty happy 22500-24999 Ind, Near Dem ... Legal Favor NaN NaN NaN NaN NaN NaN NaN Yes
2761 2762 New Engl Male White Bachelors Married None NaN NaN Ind, Near Dem ... NaN NaN NaN NaN NaN NaN NaN NaN Nader NaN
2762 2763 New Engl Female White HS Married Catholic NaN NaN Not Str Rep ... NaN NaN NaN NaN Too little Too much Too much About right Bush NaN
2763 2764 South Atlantic Male Black HS Never Married Protestant NaN NaN Ind ... NaN NaN NaN NaN About right Too little Too little Too much NaN NaN
2764 2765 South Atlantic Male White HS Married Protestant Very happy 60000-74999 Not Str Rep ... Legal Oppose Yes Favor NaN NaN NaN NaN Bush Yes

2765 rows × 21 columns

Each row corresponds to a single respondent.

Show the column names:

gss.columns
Index(['ID', 'Region', 'Gender', 'Race', 'Education', 'Marital', 'Religion',
       'Happy', 'Income', 'PolParty', 'Politics', 'Marijuana', 'DeathPenalty',
       'OwnGun', 'GunLaw', 'SpendMilitary', 'SpendEduc', 'SpendEnv',
       'SpendSci', 'Pres00', 'Postlife'],
      dtype='object')

We want to work with only two columns from this data frame: “Income” and “DeathPenalty”.

“Income” gives the income bracket of the respondent. “DeathPenalty” is the answer to a question about whether they “Favor” or “Oppose” the death penalty.

First make a list with the names of the columns that we want.

cols = ['Income', 'DeathPenalty']
cols
['Income', 'DeathPenalty']

Next make a new data frame by indexing the data frame with this list.

The new data frame has only the columns we selected.

money_death = gss[cols]
money_death
Income DeathPenalty
0 30000-34999 Favor
1 75000-89999 Favor
2 35000-39999 NaN
3 50000-59999 NaN
4 40000-49999 NaN
5 40000-49999 Favor
6 NaN NaN
7 NaN NaN
8 60000-74999 Favor
9 under 1000 NaN
10 50000-59999 NaN
11 110000-129999 NaN
12 90000-109999 NaN
13 10000-124999 Favor
14 NaN NaN
15 NaN NaN
16 NaN Favor
17 40000-49999 Favor
18 75000-89999 Favor
19 25000-29999 NaN
20 25000-29999 NaN
21 17500-19999 Oppose
22 50000-59999 Favor
23 40000-49999 NaN
24 40000-49999 NaN
25 NaN NaN
26 NaN Favor
27 25000-29999 NaN
28 1000-2999 NaN
29 10000-124999 NaN
... ... ...
2735 NaN Favor
2736 25000-29999 NaN
2737 60000-74999 Oppose
2738 25000-29999 NaN
2739 NaN NaN
2740 NaN NaN
2741 15000-17499 Favor
2742 25000-29999 Favor
2743 22500-24999 NaN
2744 NaN NaN
2745 NaN Favor
2746 NaN Oppose
2747 8000-9999 Favor
2748 NaN NaN
2749 22500-24999 NaN
2750 6000-6999 NaN
2751 22500-24999 NaN
2752 NaN NaN
2753 35000-39999 Favor
2754 NaN NaN
2755 35000-39999 NaN
2756 30000-34999 Favor
2757 6000-6999 NaN
2758 12500-14999 NaN
2759 20000-22499 NaN
2760 22500-24999 Favor
2761 NaN NaN
2762 NaN NaN
2763 NaN NaN
2764 60000-74999 Oppose

2765 rows × 2 columns
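
As a small illustration of the indexing step above, here is a sketch on a made-up data frame. The name `toy` and all of its values are invented for this example; they are not from the survey. It shows that indexing with a list of column names gives a data frame, while indexing with a single name gives a Series.

```python
import pandas as pd

# A made-up data frame standing in for the survey data.
toy = pd.DataFrame({
    'ID': [1, 2, 3],
    'Income': ['1000-2999', '3000-3999', '5000-5999'],
    'DeathPenalty': ['Favor', 'Oppose', 'Favor'],
})

# Indexing with a list of names gives a data frame with only those columns.
toy_two = toy[['Income', 'DeathPenalty']]

# Indexing with a single name gives a Series instead.
toy_income = toy['Income']
```

The original `toy` data frame still has all three of its columns after these steps.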

There are many missing responses, shown as NaN. To make our life easier, we drop the respondents who did not specify an income bracket, and those who did not answer the death penalty question. We use the dropna method of the data frame to drop every row that has one or more missing values.

money_death = money_death.dropna()
money_death
Income DeathPenalty
0 30000-34999 Favor
1 75000-89999 Favor
5 40000-49999 Favor
8 60000-74999 Favor
13 10000-124999 Favor
17 40000-49999 Favor
18 75000-89999 Favor
21 17500-19999 Oppose
22 50000-59999 Favor
31 30000-34999 Favor
32 50000-59999 Oppose
33 75000-89999 Oppose
35 under 1000 Oppose
36 7000-7999 Oppose
37 60000-74999 Favor
42 30000-34999 Favor
45 35000-39999 Favor
46 under 1000 Favor
52 17500-19999 Favor
55 35000-39999 Favor
58 1000-2999 Favor
62 50000-59999 Favor
64 12500-14999 Favor
74 110000-129999 Oppose
77 75000-89999 Favor
78 35000-39999 Favor
81 30000-34999 Favor
92 20000-22499 Favor
93 60000-74999 Favor
95 60000-74999 Oppose
... ... ...
2671 75000-89999 Favor
2677 1000-2999 Oppose
2678 15000-17499 Favor
2684 under 1000 Favor
2689 3000-3999 Favor
2690 22500-24999 Oppose
2692 8000-9999 Favor
2696 3000-3999 Oppose
2697 30000-34999 Favor
2699 25000-29999 Favor
2702 8000-9999 Oppose
2706 10000-124999 Oppose
2709 12500-14999 Oppose
2714 12500-14999 Favor
2715 40000-49999 Favor
2716 130000-149999 Favor
2717 3000-3999 Oppose
2723 22500-24999 Favor
2725 40000-49999 Favor
2726 15000-17499 Oppose
2727 12500-14999 Favor
2729 under 1000 Favor
2737 60000-74999 Oppose
2741 15000-17499 Favor
2742 25000-29999 Favor
2747 8000-9999 Favor
2753 35000-39999 Favor
2756 30000-34999 Favor
2760 22500-24999 Favor
2764 60000-74999 Oppose

904 rows × 2 columns
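
To see what dropna does on a smaller scale, here is a sketch with an invented four-row data frame (the name `toy` and its values are assumptions for illustration only). Only rows with no missing values survive, and the surviving rows keep their original row labels, which is why the labels in the output above jump from 1 to 5.

```python
import numpy as np
import pandas as pd

# A made-up data frame with some missing values.
toy = pd.DataFrame({
    'Income': ['1000-2999', np.nan, '3000-3999', '5000-5999'],
    'DeathPenalty': ['Favor', 'Oppose', np.nan, 'Oppose'],
})

# dropna returns a new data frame without the rows that have any NaN.
complete = toy.dropna()

# Rows 1 and 2 each have a missing value, so rows 0 and 3 remain.
remaining_labels = list(complete.index)
```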

Get the income column.

income = money_death['Income']

Show the unique values, and how many times each one occurs:

income.value_counts()
40000-49999      88
30000-34999      78
50000-59999      72
25000-29999      60
35000-39999      54
60000-74999      51
20000-22499      44
12500-14999      44
130000-149999    43
22500-24999      40
110000-129999    38
17500-19999      37
15000-17499      36
10000-124999     36
1000-2999        32
8000-9999        32
75000-89999      26
3000-3999        19
under 1000       17
5000-5999        16
4000-4999        13
90000-109999     11
7000-7999         9
6000-6999         8
Name: Income, dtype: int64

These are strings. We want to get income as a number. We estimate this by recoding the “Income” column. We replace each string giving an income bracket with the average of the minimum and maximum of that bracket. The “under 1000” bracket has no stated minimum, so we recode it to 500.

We can do this with a recoder function. We have not covered functions yet, so do not worry about the details of this function.

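def recode_income(value):
    # The lowest bracket has no stated minimum; recode it to 500.
    if value == 'under 1000':
        return 500
    # Other brackets look like 'low-high'; split at the hyphen.
    low_str, high_str = value.split('-')
    # Convert the two strings to integers.
    low, high = int(low_str), int(high_str)
    # Return the midpoint of the bracket.
    return np.mean([low, high])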

Here is what the recoder function gives with the lowest income bracket.

recode_income('under 1000')
500

Here is the return from a higher bracket:

recode_income('90000-109999')
99999.5

Use this function to recode the “Income” strings into numbers. Again, we have not covered the apply method yet, so don’t worry about the details.

income_ish = income.apply(recode_income)
income_ish
0        32499.5
1        82499.5
5        44999.5
8        67499.5
13       67499.5
17       44999.5
18       82499.5
21       18749.5
22       54999.5
31       32499.5
32       54999.5
33       82499.5
35         500.0
36        7499.5
37       67499.5
42       32499.5
45       37499.5
46         500.0
52       18749.5
55       37499.5
58        1999.5
62       54999.5
64       13749.5
74      119999.5
77       82499.5
78       37499.5
81       32499.5
92       21249.5
93       67499.5
95       67499.5
          ...   
2671     82499.5
2677      1999.5
2678     16249.5
2684       500.0
2689      3499.5
2690     23749.5
2692      8999.5
2696      3499.5
2697     32499.5
2699     27499.5
2702      8999.5
2706     67499.5
2709     13749.5
2714     13749.5
2715     44999.5
2716    139999.5
2717      3499.5
2723     23749.5
2725     44999.5
2726     16249.5
2727     13749.5
2729       500.0
2737     67499.5
2741     16249.5
2742     27499.5
2747      8999.5
2753     37499.5
2756     32499.5
2760     23749.5
2764     67499.5
Name: Income, Length: 904, dtype: float64
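
For readers curious about apply, the following sketch suggests one way to think about it: the method calls the function once for each element of the Series and collects the results. The helper `bracket_midpoint` and the Series `brackets` are hypothetical names made up for this example; the helper only mirrors the recoder above.

```python
import numpy as np
import pandas as pd

def bracket_midpoint(value):
    # Hypothetical helper that mirrors the recoder in the text.
    if value == 'under 1000':
        return 500
    low_str, high_str = value.split('-')
    return np.mean([int(low_str), int(high_str)])

# A made-up Series of three bracket strings.
brackets = pd.Series(['under 1000', '1000-2999', '90000-109999'])

# apply calls the function on each element in turn.
via_apply = brackets.apply(bracket_midpoint)

# The same values, calculated with an explicit loop.
via_loop = [bracket_midpoint(v) for v in brackets]
```

Both routes give 500, 1999.5 and 99999.5 for these three brackets.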

Now get the answers to the death penalty question.

death = money_death['DeathPenalty']
death.value_counts()
Favor     622
Oppose    282
Name: DeathPenalty, dtype: int64

We will identify the rows for respondents who are in favor of the death penalty. To do this, we make a Boolean vector:

death == 'Favor'
0        True
1        True
5        True
8        True
13       True
17       True
18       True
21      False
22       True
31       True
32      False
33      False
35      False
36      False
37       True
42       True
45       True
46       True
52       True
55       True
58       True
62       True
64       True
74      False
77       True
78       True
81       True
92       True
93       True
95      False
        ...  
2671     True
2677    False
2678     True
2684     True
2689     True
2690    False
2692     True
2696    False
2697     True
2699     True
2702    False
2706    False
2709    False
2714     True
2715     True
2716     True
2717    False
2723     True
2725     True
2726    False
2727     True
2729     True
2737    False
2741     True
2742     True
2747     True
2753     True
2756     True
2760     True
2764    False
Name: DeathPenalty, Length: 904, dtype: bool

Use this vector to select the income values for the respondents in favor of the death penalty. Show the distribution of values.

favor_income = income_ish[death == 'Favor']
favor_income.hist();

[Figure: histogram of estimated incomes for respondents in favor of the death penalty]

Likewise select incomes for those opposed. Show the distribution.

oppose_income = income_ish[death == 'Oppose']
oppose_income.hist();

[Figure: histogram of estimated incomes for respondents opposed to the death penalty]
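
The two selections above both use a Boolean vector to pick out values. Here is a minimal sketch of that idea with invented data; the names `answers`, `incomes`, `is_favor` and the values are assumptions for illustration. Indexing with the Boolean vector keeps only the elements where the vector is True, matching elements by row label.

```python
import pandas as pd

# Made-up answers and incomes sharing the same row labels.
answers = pd.Series(['Favor', 'Oppose', 'Favor'], index=[0, 5, 8])
incomes = pd.Series([500.0, 1999.5, 3499.5], index=[0, 5, 8])

# Comparing to a string gives one True or False value per row.
is_favor = answers == 'Favor'

# Indexing with the Boolean vector keeps the rows where it is True.
favor_only = incomes[is_favor]

# True counts as 1 when summing, so this counts the in-favor rows.
n_favor_toy = is_favor.sum()
```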

Calculate the difference in mean income between the groups. This is the difference we observe.

actual_diff = np.mean(favor_income) - np.mean(oppose_income)
actual_diff
4535.163012246019

We want to know whether this difference in income is compatible with random sampling. That is, we want to know whether a difference this large is plausible, if the incomes are in fact random samples from the same population.

To estimate how variable the mean differences can be under such random sampling, we simulate the sampling by pooling the income values from the two groups, and then permuting them.

First, we get the number of respondents in favor of the death penalty.

n_favor = len(favor_income)
n_favor
622

Then we pool the in-favor and oppose groups.

pooled = np.append(favor_income, oppose_income)

To do the random sampling we permute the values, so the pooled vector is a random mixture of the two groups.

np.random.shuffle(pooled)

Treat the first n_favor observations from this shuffled vector as our simulated in-favor group. The rest are our simulated oppose group.

fake_favor = pooled[:n_favor]
fake_oppose = pooled[n_favor:]

Calculate the difference in means for this simulation.

fake_diff = np.mean(fake_favor) - np.mean(fake_oppose)
fake_diff
3143.6351793573704
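
It may help to check, on a tiny made-up example, what a single shuffle-and-split does. In the sketch below the group values and the names `group_a`, `group_b` and `pooled_toy` are invented for illustration. The shuffle rearranges the pooled array in place, so the values themselves are unchanged; only their assignment to the two fake groups changes, and the fake groups keep the sizes of the original groups.

```python
import numpy as np

# Made-up values for two groups (not survey data).
group_a = np.array([10.0, 20.0, 30.0])
group_b = np.array([40.0, 50.0])
n_a = len(group_a)

# Pool the two groups into one array.
pooled_toy = np.append(group_a, group_b)
total_before = pooled_toy.sum()

# shuffle rearranges the array in place and returns None.
result = np.random.shuffle(pooled_toy)

# Split the shuffled array into fake groups of the original sizes.
fake_a = pooled_toy[:n_a]
fake_b = pooled_toy[n_a:]
```

Because shuffle works in place, there is no need to assign its return value; the name `result` is here only to show that the return value is None.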

Now it is your turn. Do this simulation 10000 times, to build up the distribution of differences compatible with random sampling.

Use the Brexit ages notebook for inspiration.

differences = np.zeros(10000)
for i in np.arange(10000):
    # Permute the pooled incomes
    np.random.shuffle(pooled)
    # Make a fake favor sample

    # Make a fake opposed sample

    # Calculate the mean difference for the fake samples

    # Put the mean difference into the differences array.

When you have that working, do a histogram of the differences.

# Your code here

You can get an idea of where the actual difference we saw sits on this histogram, and therefore how likely that difference is, assuming the incomes come from the same underlying population of incomes.

To be more specific, count how many of the differences you calculated were greater than or equal to the actual difference.

# Your code here

Now divide this count by the total number of differences. This proportion is an estimate of the probability of seeing a difference this large, if the incomes all come from the same underlying population:

# Your code here
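
If you get stuck on the counting step, the sketch below shows the general idea on an invented array. The values in `toy_differences` and `toy_actual` are made up for this example and have nothing to do with the income data; your own code should use the differences from your simulation and the actual difference calculated above.

```python
import numpy as np

# Invented differences standing in for simulated ones.
toy_differences = np.array([-3.0, 1.5, 4.0, -0.5, 2.0, 5.0, -2.0, 0.0])
# Invented stand-in for the observed difference.
toy_actual = 2.0

# The comparison gives a Boolean array; count_nonzero counts the True
# values, that is, the differences at least as large as the observed one.
n_as_large = np.count_nonzero(toy_differences >= toy_actual)

# Divide by the number of differences to get a proportion.
proportion = n_as_large / len(toy_differences)
```

Here three of the eight invented differences are greater than or equal to 2.0, so the proportion is 0.375.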