Parametric hypothesis tests with examples in Python

import numpy as np
from scipy import stats
import pandas as pd

dat = pd.read_csv("https://raw.githubusercontent.com/opencasestudies/ocs-bp-rural-and-urban-obesity/master/data/wrangled/BMI_long.csv")

Z-test

Example code for a two-sample, unpaired z-test

from statsmodels.stats.weightstats import ztest
import random

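# BMI values for women in 1985, with missing values removed
mask1 = (dat['Sex'] == "Women") & (dat['Year'] == 1985)
x1 = dat.loc[mask1, 'BMI'].dropna()
# Random subsample of 300 values (no seed is set, so results differ between runs)
x1 = random.sample(x1.tolist(), k = 300)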

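# BMI values for women in 2017, with missing values removed
mask2 = (dat['Sex'] == "Women") & (dat['Year'] == 2017)
x2 = dat.loc[mask2, 'BMI'].dropna()
# Random subsample of 300 values (no seed is set, so results differ between runs)
x2 = random.sample(x2.tolist(), k = 300)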

# value=0: the null hypothesis is that the two population means are equal
z_statistic, p_value = ztest(x1, x2, value=0)

print("z-statistic:", z_statistic)
print("p-value:", p_value)

z-statistic: -9.201889936608346
p-value: 3.517084717411295e-20
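The following sketch shows roughly what the z-test computes. It is a two-sided, two-sample z-test with a pooled variance estimate, written with the standard library only and run on toy data.

- The function name pooled_z_test is hypothetical.
- We assume it matches statsmodels' ztest defaults (usevar="pooled", ddof=1). That assumption is ours, so treat this as an illustration rather than a drop-in replacement.

```python
import math
from statistics import NormalDist, fmean


def pooled_z_test(x1, x2, value=0.0):
    """Two-sided, two-sample z-test with a pooled variance estimate.

    Hypothetical helper; assumed (not verified against the library source)
    to mirror statsmodels' ztest defaults usevar="pooled", ddof=1.
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = fmean(x1), fmean(x2)
    # Sums of squared deviations from each sample mean
    ss1 = sum((v - m1) ** 2 for v in x1)
    ss2 = sum((v - m2) ** 2 for v in x2)
    # Pooled variance with n1 + n2 - 2 degrees of freedom
    var_pooled = (ss1 + ss2) / (n1 + n2 - 2)
    # Standard error of the difference in means
    se_diff = math.sqrt(var_pooled * (1 / n1 + 1 / n2))
    z = (m1 - m2 - value) / se_diff
    # Two-sided p-value from the lower tail of the standard normal
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p


z, p = pooled_z_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print("z-statistic:", z)  # about -1.0 for this toy data
print("p-value:", p)      # about 0.3173
```

The p-value is computed from the lower tail, cdf(-abs(z)), rather than as 1 - cdf(abs(z)). This keeps very small p-values, such as the 3.5e-20 above, from rounding to zero in floating point.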

T-test

Example code for a two-sample (independent), two-tailed t-test comparing rural and urban BMI for women in 1985

mask1 = (dat['Sex'] == "Women") & (dat['Region'] == "Rural") & (dat['Year'] == 1985)
x1 = dat[mask1]['BMI']

mask2 = (dat['Sex'] == "Women") & (dat['Region'] == "Urban") & (dat['Year'] == 1985)
x2 = dat[mask2]['BMI']

t_statistic, p_value = stats.ttest_ind(x1, x2, equal_var = True, nan_policy = "omit")

print("t-statistic:", t_statistic)
print("p-value:", p_value)

t-statistic: -3.8952336023562912
p-value: 0.00011523146459551333
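The equal_var = True argument selects Student's pooled-variance t-test; equal_var = False would give Welch's t-test instead. The sketch below computes both statistics and their degrees of freedom, using the standard library only and toy data.

- The function name pooled_and_welch_t is hypothetical.
- The p-value would then come from a t distribution with those degrees of freedom, which this sketch does not compute.

```python
import math
from statistics import fmean, variance


def pooled_and_welch_t(x1, x2):
    """Return ((t_pooled, df_pooled), (t_welch, df_welch)) for two samples.

    Hypothetical helper for illustration; assumes no missing values.
    """
    n1, n2 = len(x1), len(x2)
    diff = fmean(x1) - fmean(x2)
    # Sample variances with an n - 1 denominator
    v1, v2 = variance(x1), variance(x2)

    # Student's t (equal_var=True): pooled variance, df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Welch's t (equal_var=False): separate variances, Welch-Satterthwaite df
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (t_pooled, df_pooled), (t_welch, df_welch)


(t_p, df_p), (t_w, df_w) = pooled_and_welch_t([1, 2, 3], [2, 4, 6, 8])
print("pooled:", t_p, df_p)
print("Welch: ", t_w, df_w)
```

When the two groups have similar variances and sizes, the two versions give nearly the same answer. When variances differ, Welch's degrees of freedom shrink, which makes the test more conservative.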

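Example code for a one-tailed t-test. With alternative = "greater", the alternative hypothesis is that the mean of x1 (rural) is greater than the mean of x2 (urban).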

t_statistic, p_value = stats.ttest_ind(x1, x2, equal_var = True, nan_policy = "omit", alternative = "greater")

print("t-statistic:", t_statistic)
print("p-value:", p_value)

t-statistic: -3.8952336023562912
p-value: 0.9999423842677022
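The p-value near 1 is expected. The test asks whether the rural mean exceeds the urban mean, but the t-statistic is negative, so the data point the other way.

For a statistic with a symmetric null distribution, such as t or z, a one-sided p-value can be recovered from the two-sided one. The helper below (a hypothetical name, sketched under that symmetry assumption) reproduces the output above from the two-tailed result.

```python
def one_sided_from_two_sided(stat, p_two_sided, alternative):
    """Convert a two-sided p-value into a one-sided one.

    Hypothetical helper; assumes the test statistic has a symmetric null
    distribution (true for t and z statistics).
    """
    half = p_two_sided / 2
    if alternative == "greater":
        # Small only when the statistic lies in the upper tail
        return half if stat > 0 else 1 - half
    if alternative == "less":
        # Small only when the statistic lies in the lower tail
        return half if stat < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")


t_stat = -3.8952336023562912     # two-tailed t-statistic from above
p_two = 0.00011523146459551333   # two-tailed p-value from above
print(one_sided_from_two_sided(t_stat, p_two, "greater"))  # close to 1
print(one_sided_from_two_sided(t_stat, p_two, "less"))     # half of p_two
```

With alternative = "less", the same numbers give about 5.8e-05, half the two-tailed p-value. That is the one-sided test that matches the direction of the observed difference.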

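Example code for a two-sample paired (dependent) t-test. stats.ttest_rel pairs each value in x1 with the value at the same position in x2. Both subsets must therefore have the same length and list the same units (presumably countries here) in the same order.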

t_statistic, p_value = stats.ttest_rel(x1, x2, nan_policy = "omit")

print("t-statistic:", t_statistic)
print("p-value:", p_value)

t-statistic: -14.095486243034763
p-value: 1.426675846865914e-31
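A paired t-test is equivalent to a one-sample t-test on the pairwise differences. The sketch below makes that explicit and shows why pairing depends on position. It uses the standard library only and has a hypothetical function name.

Unlike ttest_rel with nan_policy = "omit", this sketch assumes there are no missing values.

```python
import math
from statistics import fmean, stdev


def paired_t_statistic(x1, x2):
    """Paired t statistic as a one-sample t-test on the differences.

    Hypothetical helper; assumes equal-length inputs without missing values.
    """
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    # Differences are taken position by position, so order defines the pairing
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    # Mean difference divided by its standard error
    t = fmean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom


t, df = paired_t_statistic([5, 7, 9, 11], [4, 5, 6, 7])
print("t-statistic:", t, "df:", df)  # t is sqrt(15), about 3.873, with df = 3
```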

ANOVA

Example code for a one-way ANOVA comparing men's BMI across the rural, urban, and national groups in 2017

mask1 = (dat['Sex'] == "Men") & (dat['Region'] == "Rural") & (dat['Year'] == 2017)
x1 = dat[mask1]['BMI']

mask2 = (dat['Sex'] == "Men") & (dat['Region'] == "Urban") & (dat['Year'] == 2017)
x2 = dat[mask2]['BMI']

mask3 = (dat['Sex'] == "Men") & (dat['Region'] == "National") & (dat['Year'] == 2017)
x3 = dat[mask3]['BMI']

f_value, p_value = stats.f_oneway(x1.dropna(), x2.dropna(), x3.dropna())

print("F-statistic:", f_value)
print("p-value:", p_value)

F-statistic: 3.4215235158825905
p-value: 0.033309935710150805
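The F statistic reported by f_oneway is the ratio of the between-group mean square to the within-group mean square. Here is a small sketch of that calculation, using the standard library, a hypothetical function name, and toy data.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Hypothetical helper for illustration; assumes no missing values.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])
    group_means = [fmean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print("F-statistic:", f, "df:", (df_b, df_w))  # F = 27.0 with df (2, 6)
```

A large F means the group means are more spread out than the within-group variability alone would suggest. The p-value comes from an F distribution with (df_between, df_within) degrees of freedom.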


Citation

BibTeX citation:

@online{farmer2023,
  author = {Farmer, Rohit},
  title = {Parametric Hypothesis Tests with Examples in {Python}},
  date = {2023-01-09},
  url = {https://dataalltheway.com/posts/010-02-parametric-hypothesis-tests-python},
  langid = {en}
}

For attribution, please cite this work as:

Farmer, Rohit. 2023. “Parametric Hypothesis Tests with Examples in Python.” January 9, 2023. https://dataalltheway.com/posts/010-02-parametric-hypothesis-tests-python.