Non-parametric hypothesis tests with examples in Julia

A tutorial on non-parametric hypothesis tests with examples in Julia.
Author: Dhruva Sambrani

Published: November 30, 2022

2022-11-30 First draft

Introduction

This article extends Farmer (2022), “Non-Parametric Hypothesis Tests with Examples in R” (November 18, 2022), by reproducing its examples in Julia. Please see the parent article for the theoretical background.

Import packages

import Pkg
Pkg.activate(".")
using CSV
using Plots
using HypothesisTests
using DataFrames
  Activating project at `~/sandbox/dataalltheway/posts/011-01-non-parametric-hypothesis-tests-julia`

Data import and cleanup

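I subset the data to 1928 onward and keep only the columns that contain no missing values and no zeros over that period. To do this, for each column of the data (eachcol) we check whether every element satisfies !ismissing(x) && x != 0 (! means "not"). We then select all rows of the columns that pass this check and call disallowmissing, so that the element types of the remaining columns no longer allow missing values.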

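# Download the cement emissions data set from Zenodo and read it into a DataFrame
temp_file = download("https://zenodo.org/record/7081360/files/1.%20Cement_emissions_data.csv")
data = CSV.read(temp_file, DataFrame)
# Drop rows without a year and keep 1928 onward
dropmissing!(data, :Year)
filter!(:Year => >=(1928), data)
# true for each column whose values are all non-missing and non-zero
picked_cols_mask = eachcol(data) .|>
    col -> all(x -> (!ismissing(x) && x != 0), col)
# Keep those columns and narrow their element types to exclude Missing
data = disallowmissing(data[!, picked_cols_mask])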
94×24 DataFrame
69 rows omitted (rows 14 to 82)
Row Year Argentina Australia Belgium Brazil Canada Chile China Democratic Republic of the Congo Denmark Egypt Finland Italy Japan Mozambique Norway Peru Portugal Romania Spain Sweden Turkey USA Global
Int64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64
1 1928 116.3 378.0 1505.0 43.61 868.6 54.57 39.78 21.81 385.2 43.8 138.1 1519.0 1897.0 7.27 158.3 25.42 36.34 163.6 763.2 233.2 29.07 15420.0 35616.0
2 1929 174.4 356.2 1606.0 47.25 963.0 72.62 76.51 29.07 396.1 87.22 138.1 1730.0 2112.0 10.9 159.1 25.43 43.61 156.3 901.3 284.0 32.71 15000.0 36873.0
3 1930 189.0 348.9 1508.0 43.58 926.7 79.88 73.45 32.71 385.2 148.5 101.6 1723.0 1853.0 10.9 160.2 10.9 47.29 196.3 908.4 304.6 29.07 14290.0 35561.0
4 1931 265.3 196.3 1218.0 83.59 799.5 50.88 97.93 21.81 250.8 119.9 79.95 1519.0 1788.0 10.9 109.7 14.54 47.25 98.14 806.8 257.9 50.83 10810.0 30931.0
5 1932 247.1 123.6 1039.0 72.69 363.4 54.51 79.57 7.27 203.6 119.9 76.23 1545.0 1843.0 10.9 117.3 10.9 58.15 105.4 705.1 241.0 54.46 6656.0 24721.0
6 1933 254.5 159.9 963.1 109.0 189.0 69.05 113.2 3.63 272.6 141.7 80.05 1756.0 2366.0 10.18 110.8 14.53 79.95 109.0 694.1 200.7 58.15 5587.0 23866.0
7 1934 279.8 207.2 937.6 159.9 272.6 101.7 97.93 3.63 381.6 145.4 112.7 2013.0 2304.0 7.27 123.6 21.82 90.94 156.3 672.3 287.1 83.59 6900.0 28542.0
8 1935 356.3 276.2 1087.0 181.6 272.6 141.6 156.1 3.63 374.3 189.0 134.3 2086.0 2904.0 7.27 130.8 29.07 105.4 189.0 668.7 367.1 65.42 6684.0 32090.0
9 1936 410.7 323.5 1163.0 239.7 388.9 123.5 428.5 3.63 392.4 167.2 163.5 1890.0 3082.0 7.27 149.0 36.34 119.9 185.3 297.9 392.5 69.05 9950.0 38763.0
10 1937 512.3 363.4 1486.0 283.5 483.4 156.3 437.7 3.77 334.4 163.5 203.4 2155.0 2984.0 7.27 159.9 39.96 127.1 225.3 189.0 432.5 105.4 10370.0 40829.0
11 1938 614.3 178.1 1508.0 305.5 432.5 181.7 9.18 7.5 316.2 185.3 236.2 2279.0 2729.0 11.99 163.6 50.86 130.8 221.7 294.4 490.6 130.9 9239.0 38551.0
12 1939 556.0 338.0 1261.0 345.5 450.7 167.2 223.4 17.44 345.3 185.0 279.6 2526.0 2508.0 14.54 192.6 58.18 145.4 261.7 588.8 585.1 141.7 10790.0 35687.0
13 1940 534.4 348.8 105.4 367.1 592.4 189.0 272.4 11.45 218.1 178.1 149.0 2373.0 2101.0 14.54 167.2 61.78 134.5 196.3 770.5 345.3 130.9 11550.0 31431.0
83 2010 4178.0 3549.0 2582.0 21290.0 6005.0 1046.0 639600.0 189.0 672.2 20510.0 525.7 13280.0 24320.0 340.9 754.0 3338.0 3376.0 2778.0 11200.0 1324.0 29980.0 31450.0 1.2549e6
84 2011 4586.0 3496.0 2762.0 22840.0 6020.0 1080.0 708600.0 175.2 861.8 20100.0 557.8 12580.0 24980.0 373.3 749.0 3300.0 2813.0 3089.0 9523.0 1361.0 31450.0 32210.0 1.3498e6
85 2012 4184.0 3518.0 2643.0 25000.0 6532.0 1128.0 714800.0 157.9 871.1 20970.0 497.2 10070.0 25620.0 452.7 725.0 3731.0 2550.0 3150.0 8754.0 1479.0 31370.0 35270.0 1.3846e6
86 2013 4581.0 3294.0 2541.0 26650.0 5973.0 1086.0 748300.0 174.0 867.1 20210.0 481.8 8877.0 26810.0 505.6 731.0 4257.0 2814.0 2695.0 7642.0 1402.0 33910.0 36370.0 1.4441e6
87 2014 4336.0 3138.0 2643.0 26910.0 5912.0 1022.0 778600.0 127.9 887.3 20760.0 468.8 8339.0 26560.0 585.9 727.0 4590.0 3096.0 2944.0 8897.0 1399.0 34500.0 39440.0 1.4999e6
88 2015 4571.0 3076.0 2348.0 25080.0 6185.0 1033.0 722000.0 154.6 931.5 21650.0 462.1 8196.0 25940.0 614.2 672.0 4476.0 2921.0 3337.0 9216.0 1537.0 34440.0 39910.0 1.4444e6
89 2016 4029.0 2931.0 2436.0 22420.0 6114.0 1120.0 743000.0 98.04 1095.0 22820.0 553.2 7680.0 25970.0 947.8 684.0 4340.0 2297.0 3181.0 9414.0 1554.0 37530.0 39440.0 1.4876e6
90 2017 4362.0 3019.0 2291.0 19080.0 6827.0 865.9 758200.0 348.7 1194.0 21770.0 603.7 7711.0 26430.0 910.6 766.0 4291.0 2531.0 3310.0 9449.0 1484.0 39470.0 40320.0 1.5079e6
91 2018 4369.0 2942.0 2534.0 19340.0 6915.0 782.2 786700.0 406.1 1160.0 21000.0 601.7 7757.0 26180.0 930.0 730.0 4320.0 2251.0 3505.0 9667.0 1607.0 39410.0 38970.0 1.5692e6
92 2019 4141.0 3040.0 2819.0 19860.0 7125.0 825.3 826900.0 451.0 1129.0 19670.0 583.5 7912.0 25330.0 1011.0 722.0 4546.0 2225.0 3828.0 9064.0 1349.0 32350.0 40900.0 1.6175e6
93 2020 3508.0 2820.0 2634.0 22050.0 6625.0 825.3 858200.0 451.0 1227.0 18130.0 569.7 7059.0 24490.0 1011.0 725.0 4546.0 2310.0 3901.0 8192.0 1272.0 40810.0 40690.0 1.6375e6
94 2021 4671.0 2820.0 2634.0 23790.0 6625.0 825.3 853000.0 451.0 1227.0 16160.0 569.7 7059.0 23790.0 1011.0 701.3 4546.0 2310.0 3901.0 8609.0 1272.0 44390.0 41200.0 1.6729e6
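
The column check described above can be tried on a small table without downloading anything. The following is a minimal sketch that uses a plain NamedTuple of vectors in place of a DataFrame; the names columns, keep and kept are made up for this illustration and are not part of the original analysis.

```julia
# Toy stand-in for the data set: each field is one column (hypothetical values)
columns = (
    Year = [1928, 1929, 1930],
    A = [1.0, 2.0, 3.0],        # complete and non-zero: kept
    B = [0.0, 2.0, 3.0],        # contains a zero: dropped
    C = [1.0, missing, 3.0],    # contains a missing value: dropped
)

# Same predicate as in the article. The ismissing check comes first so that
# `x != 0` is never evaluated on a missing value.
keep(col) = all(x -> !ismissing(x) && x != 0, col)

# Names of the columns that pass the check
kept = [name for (name, col) in pairs(columns) if keep(col)]
println(kept)
```

The order of the two conditions matters: comparing missing with 0 yields missing rather than true or false, so testing ismissing first lets && short-circuit before that comparison happens.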
plot(
    data[!, :Year], 
    Array(log.(data[!, Not(:Year)])), 
    label=reshape(string.(propertynames(data)[2:end]), 1, :),
    legend= :outerright,
    size=(900, 400)
)

Wilcoxon rank-sum test (Mann-Whitney U test)

mwut_results = MannWhitneyUTest(data[!, :USA], data[!, :Canada])
Approximate Mann-Whitney U test
-------------------------------
Population details:
    parameter of interest:   Location parameter (pseudomedian)
    value under h_0:         0
    point estimate:          28503.5

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-30

Details:
    number of observations in each group: [94, 94]
    Mann-Whitney-U statistic:             8763.0
    rank sums:                            [13228.0, 4538.0]
    adjustment for ties:                  24.0
    normal approximation (μ, σ):          (4345.0, 373.05)
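
The numbers in this output can be cross-checked by hand. The sketch below assumes the textbook formulas U = R1 - n1(n1 + 1)/2, E[U] = n1 n2 / 2 and, ignoring the small tie correction, sd(U) = sqrt(n1 n2 (n1 + n2 + 1) / 12). The variable names are illustrative only. Judging from the arithmetic, the μ shown in the output appears to be the deviation of U from its null mean rather than the null mean itself; this is an inference from the numbers, not something stated in the parent article.

```julia
# Values copied from the test output above
n_usa, n_canada = 94, 94
rank_sum_usa = 13228.0

# U for the first sample: its rank sum minus the smallest possible rank sum
u_usa = rank_sum_usa - n_usa * (n_usa + 1) / 2

# Mean of U under the null hypothesis
u_null_mean = n_usa * n_canada / 2

# Standard deviation of U under the null hypothesis (tie correction ignored)
n_total = n_usa + n_canada
u_null_sd = sqrt(n_usa * n_canada * (n_total + 1) / 12)

println("U        = ", u_usa)                       # 8763.0
println("U - E[U] = ", u_usa - u_null_mean)         # 4345.0
println("sd(U)    ≈ ", round(u_null_sd, digits=2))  # 373.05
```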

Right-tailed test

pvalue(mwut_results, tail=:right)
1.2040605143479147e-31

Wilcoxon signed-rank test

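# Keep the rows for 2000 and 2020 and drop the Year column
dt = select(filter(:Year => y -> y == 2000 || y == 2020, data), Not(:Year))
# Each row becomes one side of the paired sample, with one value per column
x = collect(dt[1, :])  # year 2000
y = collect(dt[2, :])  # year 2020
SignedRankTest(x, y)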
Exact Wilcoxon signed rank test
-------------------------------
Population details:
    parameter of interest:   Location parameter (pseudomedian)
    value under h_0:         0
    point estimate:          16.0
    95% confidence interval: (-3848.0, 621.5)

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.5803

Details:
    number of observations:      23
    Wilcoxon rank-sum statistic: 119.0
    rank sums:                   [119.0, 157.0]
    adjustment for ties:         0.0
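
To make the two rank sums in this output less opaque, here is a minimal sketch of how signed-rank sums are usually computed: take the paired differences, rank their absolute values, and add up the ranks separately for positive and negative differences. The function signed_rank_sums and the toy vectors are hypothetical, and the sketch assumes no tied absolute differences; it is not the HypothesisTests implementation.

```julia
# Sum of ranks of positive and of negative paired differences.
# Assumes no ties among the absolute differences.
function signed_rank_sums(x, y)
    d = x .- y
    d = d[d .!= 0]                    # zero differences carry no sign
    order = sortperm(abs.(d))
    ranks = similar(order)
    ranks[order] = 1:length(d)        # rank 1 = smallest absolute difference
    w_plus = sum(ranks[d .> 0]; init=0)
    w_minus = sum(ranks[d .< 0]; init=0)
    return w_plus, w_minus
end

# Toy paired sample (made-up numbers)
before = [10, 20, 30, 40]
after = [12, 15, 33, 30]
w_plus, w_minus = signed_rank_sums(before, after)
println((w_plus, w_minus))            # (7, 3)
```

The two sums always add up to n(n + 1)/2, where n is the number of non-zero differences. In the output above, 119 + 157 = 276 = 23 · 24 / 2, which matches the 23 observations.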

Kruskal-Wallis test

KruskalWallisTest(collect(eachcol(data[:, Not(:Year)]))...)
Kruskal-Wallis rank sum test (chi-square approximation)
-------------------------------------------------------
Population details:
    parameter of interest:   Location parameters
    value under h_0:         "all equal"
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           <1e-99

Details:
    number of observation in each group: [94, 94, 94, 94, 94, 94, 94, 94, 94, 94  …  94, 94, 94, 94, 94, 94, 94, 94, 94, 94]
    χ²-statistic:                        1253.58
    rank sums:                           [100736.0, 98787.5, 1.11532e5, 1.21338e5, 1.21336e5, 55862.0, 138744.0, 20383.0, 71558.5, 1.03662e5  …  19940.0, 59493.0, 68485.5, 84067.0, 1.0239e5, 132068.0, 83910.0, 1.10562e5, 1.7974e5, 195500.0]
    degrees of freedom:                  22
    adjustment for ties:                 0.999999
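
For readers who want to see where the χ² statistic comes from, the following sketch computes the Kruskal-Wallis H statistic on a toy example with the textbook formula H = 12 / (N(N + 1)) · Σ Rᵢ² / nᵢ − 3(N + 1), where Rᵢ is the rank sum of group i. The function kruskal_wallis_h and the toy groups are hypothetical, and the sketch assumes no tied values, so it omits the tie adjustment reported above.

```julia
# Kruskal-Wallis H statistic for any number of groups.
# Assumes no tied values across the pooled sample.
function kruskal_wallis_h(groups...)
    pooled = reduce(vcat, groups)
    N = length(pooled)
    ranks = zeros(Int, N)
    ranks[sortperm(pooled)] = 1:N     # rank 1 = smallest pooled value
    weighted = 0.0
    start = 1
    for g in groups
        stop = start + length(g) - 1
        # squared rank sum of this group divided by its size
        weighted += sum(ranks[start:stop])^2 / length(g)
        start = stop + 1
    end
    return 12 / (N * (N + 1)) * weighted - 3 * (N + 1)
end

# Three fully separated toy groups (made-up numbers); rank sums are 6, 15, 24
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
println(round(h, digits=2))           # 7.2
```

H is compared against a χ² distribution with (number of groups − 1) degrees of freedom. That is consistent with the output above: 23 columns give 22 degrees of freedom.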

Citation

BibTeX citation:
@online{sambrani2022,
  author = {Sambrani, Dhruva},
  title = {Non-Parametric Hypothesis Tests with Examples in {Julia}},
  date = {2022-11-30},
  url = {https://dataalltheway.com/posts/011-01-non-parametric-hypothesis-tests-julia},
  langid = {en}
}
For attribution, please cite this work as:
Sambrani, Dhruva. 2022. “Non-Parametric Hypothesis Tests with Examples in Julia.” November 30, 2022. https://dataalltheway.com/posts/011-01-non-parametric-hypothesis-tests-julia.