\(\newcommand{L}[1]{\| #1 \|}\newcommand{VL}[1]{\L{ \vec{#1} }}\newcommand{R}[1]{\operatorname{Re}\,(#1)}\newcommand{I}[1]{\operatorname{Im}\, (#1)}\)

A two-group permutation testΒΆ

Test the null hypothesis that two samples of values could have come from the same underlying distribution.

See: Comparing two groups with permutation testing.

>>> import random
>>> def mean(some_list):
...     return sum(some_list) / len(some_list)
>>> def two_group_permute(group_1, group_2):
...     n_samples = 10000
...     n_group_1 = len(group_1)
...     combined = list(group_1) + list(group_2)
...     observed = mean(group_1) - mean(group_2)
...     samples = []
...     for i in range(n_samples):
...         random.shuffle(combined)
...         fake_mean_1 = mean(combined[:n_group_1])
...         fake_mean_2 = mean(combined[n_group_1:])
...         samples.append(fake_mean_1 - fake_mean_2)
...     return observed, samples

In action on the Brexit age data:

>>> import pandas as pd
>>> remain_leave = pd.read_csv('remain_leave.csv')
>>> remainers = remain_leave[remain_leave['brexit'] == 1]
>>> brexiteers = remain_leave[remain_leave['brexit'] == 2]
>>> # We make a list from the Pandas column with the "list" function
>>> brexit_ages = list(brexiteers['age'])
>>> remain_ages = list(remainers['age'])
>>> actual, samples = two_group_permute(brexit_ages, remain_ages)
>>> actual
3.6998380833655773

Hint

If running in the IPython console, consider running %matplotlib to enable interactive plots. If running in the Jupyter Notebook, use %matplotlib inline.

>>> import matplotlib.pyplot as plt
>>> plt.hist(samples)
(...)

(png, hires.png, pdf)

_images/permuter-5.png