############################
A two-group permutation test
############################

.. code-links:: clear

Test the null hypothesis that two samples of values could have come from the
same underlying distribution.

See: :doc:`brexit_ages`.

.. nbplot::

    >>> import random

    >>> def mean(some_list):
    ...     return sum(some_list) / len(some_list)

    >>> def two_group_permute(group_1, group_2):
    ...     n_samples = 10000
    ...     n_group_1 = len(group_1)
    ...     # Pool the two groups; under the null hypothesis the labels
    ...     # are interchangeable.
    ...     combined = list(group_1) + list(group_2)
    ...     observed = mean(group_1) - mean(group_2)
    ...     samples = []
    ...     for i in range(n_samples):
    ...         # Shuffle in place, then split into two fake groups of
    ...         # the original sizes.
    ...         random.shuffle(combined)
    ...         fake_mean_1 = mean(combined[:n_group_1])
    ...         fake_mean_2 = mean(combined[n_group_1:])
    ...         samples.append(fake_mean_1 - fake_mean_2)
    ...     return observed, samples

Here is the function in action on the Brexit age data:

.. nbplot::

    >>> import pandas as pd
    >>> remain_leave = pd.read_csv('remain_leave.csv')
    >>> remainers = remain_leave[remain_leave['brexit'] == 1]
    >>> brexiteers = remain_leave[remain_leave['brexit'] == 2]

.. nbplot::

    >>> # We make a list from the Pandas column with the "list" function
    >>> brexit_ages = list(brexiteers['age'])
    >>> remain_ages = list(remainers['age'])

.. nbplot::

    >>> actual, samples = two_group_permute(brexit_ages, remain_ages)
    >>> actual
    3.6998380833655773

.. mpl-interactive::

.. nbplot::

    >>> import matplotlib.pyplot as plt
    >>> plt.hist(samples)
    (...)
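
The histogram shows where the observed difference falls in the permutation
distribution. One way to turn that into a number is to count the proportion of
shuffled differences that are at least as large as the observed one. The block
below is a minimal sketch of that idea, not part of the original page: the
function name ``permutation_p_value``, the ``seed`` argument, and the choice of
a one-sided comparison are all assumptions made for illustration, and the toy
ages are invented so that the example runs without ``remain_leave.csv``.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_1, group_2, n_samples=10000, seed=None):
    """Hypothetical helper: one-sided permutation p-value.

    Returns the observed difference mean(group_1) - mean(group_2) and the
    proportion of shuffled differences that are >= the observed one.
    """
    rng = random.Random(seed)  # local generator, so a seed gives repeatable runs
    n_group_1 = len(group_1)
    combined = list(group_1) + list(group_2)
    observed = mean(group_1) - mean(group_2)
    n_as_extreme = 0
    for _ in range(n_samples):
        # Shuffle the pooled values and split them into two fake groups.
        rng.shuffle(combined)
        fake_diff = mean(combined[:n_group_1]) - mean(combined[n_group_1:])
        if fake_diff >= observed:
            n_as_extreme += 1
    return observed, n_as_extreme / n_samples


# Invented toy ages, for illustration only (NOT the Brexit survey data).
toy_group_1 = [61, 58, 70, 49, 66, 73]
toy_group_2 = [34, 45, 29, 52, 41, 38]
observed, p_value = permutation_p_value(toy_group_1, toy_group_2, seed=0)
print(observed, p_value)
```

A small proportion suggests that a difference as large as the observed one
would be unusual if both groups came from the same distribution. For a
two-sided test, one could compare absolute values of the differences instead.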