Parametric hypothesis tests with examples in Julia

Dhruva Sambrani

Parametric hypothesis tests with examples in Julia

How To

Parametric Tests

T-test

Z-test

F-test

Julia

A tutorial on parametric hypothesis tests with examples in Julia.

Author

Dhruva Sambrani

Published

November 17, 2022

Update history

2022-11-17 First draft

Introduction

This article is an extension of Rohit Farmer. 2022. “Parametric Hypothesis Tests with Examples in R.” November 10, 2022. Please check out the parent article for the theoretical background.

Z-test (Section 3)
T-test (Section 4)
F-test (Section 5)
ANOVA (Section 6)

Import packages

import Pkg
Pkg.activate(".")
using CSV
using DataFrames
using Statistics
using HypothesisTests

  Activating project at `~/sandbox/dataalltheway/posts/010-01-parametric-hypothesis-tests-julia`

Getting the data

Some cleaning is necessary since the data is not of the correct types.

begin
    data = CSV.read(download("https://raw.githubusercontent.com/opencasestudies/ocs-bp-rural-and-urban-obesity/master/data/wrangled/BMI_long.csv"), DataFrame) # download and load
    allowmissing!(data, :BMI) # Allow BMI col to have missing values
    replace!(data.BMI, "NA" => missing) # Convert "NA" to missing
    data[!, :BMI] .= passmissing(parse).(Float64, (data[!, :BMI])) # Typecast into Float64?
end;

first(data, 20)

20×5 DataFrame

Row	Country	Sex	Region	Year	BMI
	String	String7	String15	Int64	Float64?
1	Afghanistan	Men	National	1985	20.2
2	Afghanistan	Men	Rural	1985	19.7
3	Afghanistan	Men	Urban	1985	22.4
4	Afghanistan	Men	National	2017	22.8
5	Afghanistan	Men	Rural	2017	22.5
6	Afghanistan	Men	Urban	2017	23.6
7	Afghanistan	Women	National	1985	20.6
8	Afghanistan	Women	Rural	1985	20.1
9	Afghanistan	Women	Urban	1985	23.2
10	Afghanistan	Women	National	2017	24.4
11	Afghanistan	Women	Rural	2017	23.6
12	Afghanistan	Women	Urban	2017	26.3
13	Albania	Men	National	1985	25.2
14	Albania	Men	Rural	1985	25.0
15	Albania	Men	Urban	1985	25.4
16	Albania	Men	National	2017	27.0
17	Albania	Men	Rural	2017	26.9
18	Albania	Men	Urban	2017	27.0
19	Albania	Women	National	1985	26.0
20	Albania	Women	Rural	1985	26.1

Z-test

Two sample unpaired z-test

uneqvarztest = let
    # Fetch a random sample of BMI data for women in the year 1985 and 2017
    x1 = filter([:Sex, :Year] => (s, y) -> s=="Women" && y==1985 , data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Year] => (s, y) -> s=="Women" && y==2017 , data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    UnequalVarianceZTest(x1, x2)
end

Two sample z-test (unequal variance)
------------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -2.26
    95% confidence interval: (-2.679, -1.841)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-25

Details:
    number of observations:   [300,300]
    z-statistic:              -10.560590588866509
    population standard error: 0.21400318296427412

Two sample paired z-test

eqvarztest = let
    # Fetch a random sample of BMI data for women in the year 1985 and 2017
    x1 = filter([:Sex, :Year] => (s, y) -> s=="Women" && y==1985 , data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Year] => (s, y) -> s=="Women" && y==2017 , data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    EqualVarianceZTest(x1, x2)
end

Two sample z-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -2.173
    95% confidence interval: (-2.611, -1.735)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-21

Details:
    number of observations:   [300,300]
    z-statistic:              -9.724414586039652
    population standard error: 0.22345818154642977

T-test

One sample t-test

onesamplettest = let 
    x1 = filter(
        [:Sex, :Region, :Year] => 
            (s, r, y) -> s=="Men" && r=="Rural" && y == 2017,
        data
    ) |>
    x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    OneSampleTTest(x1, 24.5)
end

One sample t-test
-----------------
Population details:
    parameter of interest:   Mean
    value under h_0:         24.5
    point estimate:          25.466
    95% confidence interval: (25.16, 25.77)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-08

Details:
    number of observations:   300
    t-statistic:              6.280721563263261
    degrees of freedom:       299
    empirical standard error: 0.15380398418714467

Two sample unpaired (independent) t-test

unpairedtwosamplettest = let 
    x1 = filter([:Sex, :Region, :Year] => 
            (s, r, y) -> s=="Women" && r=="Rural" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Region, :Year] => 
            (s, r, y) -> s=="Women" && r=="Urban" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    UnequalVarianceTTest(x1, x2)
end

Two sample t-test (unequal variance)
------------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -1.05867
    95% confidence interval: (-1.512, -0.6054)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-05

Details:
    number of observations:   [300,300]
    t-statistic:              -4.587387795167387
    degrees of freedom:       575.968012373301
    empirical standard error: 0.2307776699807073

Welch’s Test

This test uses the Welch correction, and there is no way to turn it off in HypothesisTests.jl.

Only considering right tailed (one-tailed)

unpairedtwosamplettest = let 
    x1 = filter([:Sex, :Region, :Year] => 
            (s, r, y) -> s=="Women" && r=="Rural" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Region, :Year] => 
            (s, r, y) -> s=="Women" && r=="Urban" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    UnequalVarianceTTest(x1, x2)
end
pvalue(unpairedtwosamplettest, tail=:right)

0.9999999445762

Two sample paired (dependent) t-test

pairedtwosamplettest = let 
    x1 = filter([:Sex, :Region, :Year] => 
            (s, r, y) -> s=="Women" && r=="Rural" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Region, :Year] => 
            (s, r, y) -> s=="Women" && r=="Urban" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    EqualVarianceTTest(x1, x2)
end

Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -1.01167
    95% confidence interval: (-1.44, -0.5838)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-05

Details:
    number of observations:   [300,300]
    t-statistic:              -4.64337449574737
    degrees of freedom:       598
    empirical standard error: 0.2178731583233696

F-test

Ftest = let 
    x1 = filter([:Sex, :Region, :Year] => 
            (s, r, y) -> s=="Women" && r=="Rural" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Region, :Year] => 
            (s, r, y) -> s=="Women" && r=="Urban" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    VarianceFTest(x1, x2)
end

Variance F-test
---------------
Population details:
    parameter of interest:   variance ratio
    value under h_0:         1.0
    point estimate:          1.32245

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           0.0159

Details:
    number of observations: [300, 300]
    F statistic:            1.322447205183649
    degrees of freedom:     [299, 299]

ANOVA

Atest = let
    x = filter([:Sex, :Year] => (s,y) -> (s=="Men" && y==2017), data)
    groups = groupby(x, :Region)
    bmis = map(keys(groups)) do key # for each group, 
        collect(skipmissing(groups[key][!, :BMI])) # collect BMI, skipping missing values
    end
    res = OneWayANOVATest(bmis...)
end

One-way analysis of variance (ANOVA) test
-----------------------------------------
Population details:
    parameter of interest:   Means
    value under h_0:         "all equal"
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: reject h_0
    p-value:                     0.0333

Details:
    number of observations: [200, 196, 199]
    F statistic:            3.42167
    degrees of freedom:     (2, 592)

Tukey’s HSD Test

Currently, there is no implementation of this test in HypothesisTests.jl

Back to top

Citation

BibTeX citation:

@online{sambrani2022,
  author = {Sambrani, Dhruva},
  title = {Parametric Hypothesis Tests with Examples in {Julia}},
  date = {2022-11-17},
  url = {https://dataalltheway.com/posts/010-01-parametric-hypothesis-tests-julia},
  langid = {en}
}

For attribution, please cite this work as:

Sambrani, Dhruva. 2022. “Parametric Hypothesis Tests with Examples in Julia.” November 17, 2022. https://dataalltheway.com/posts/010-01-parametric-hypothesis-tests-julia.