Non-parametric hypothesis tests with examples in Julia

A tutorial on non-parametric hypothesis tests with examples in Julia.
Author

Dhruva Sambrani

Published

November 30, 2022

2022-11-30 First draft

Introduction

This article extends Farmer (2022), “Non-Parametric Hypothesis Tests with Examples in R” (November 18, 2022). Please see that parent article for the theoretical background.

Import packages

import Pkg
Pkg.activate(".")
using CSV
using Plots
using HypothesisTests
using DataFrames
  Activating project at `~/sandbox/dataalltheway/posts/011-01-non-parametric-hypothesis-tests-julia`

Data import and cleanup

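I have subsetted the data from 1928 onward (after dropping rows with a missing Year) and dropped every column that contains a missing value (NA) or a zero. To do so, for eachcol of data we first check whether all of its elements satisfy !ismissing && != 0 (! means not). We then pick all rows of the columns that pass the check and call disallowmissing, so that the resulting columns no longer permit missing values.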

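# Download the cement emissions dataset and read it into a DataFrame
temp_file = download("https://zenodo.org/record/7081360/files/1.%20Cement_emissions_data.csv")
data = CSV.read(temp_file, DataFrame)
# Drop rows without a year, then keep 1928 onward
dropmissing!(data, :Year)
filter!(:Year => >=(1928), data)
# Keep only the columns in which every value is non-missing and non-zero
picked_cols_mask = eachcol(data) .|>
    col -> all(x->(!ismissing(x) && x!=0), col)
data = disallowmissing(data[!, picked_cols_mask])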
94×24 DataFrame
69 rows omitted (the rendered output shows rows 1 to 13, years 1928 to 1940, and rows 83 to 94, years 2010 to 2021)
Columns: Year (Int64), followed by 23 Float64 columns: Argentina, Australia, Belgium, Brazil, Canada, Chile, China, Democratic Republic of the Congo, Denmark, Egypt, Finland, Italy, Japan, Mozambique, Norway, Peru, Portugal, Romania, Spain, Sweden, Turkey, USA, Global
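
To see what the column mask in the cleanup step does, here is a minimal sketch on toy data using only Base Julia. The names cols, keep, mask and kept are made up for illustration and are not part of the original code; the predicate is the same one applied to each column of data above.

```julia
# Toy "columns": a is clean, b has a missing value, c has a zero.
cols = (
    a = [1.0, 2.0, 3.0],
    b = [1.0, missing, 3.0],
    c = [0.0, 2.0, 3.0],
)

# A column is kept only if every entry is non-missing and non-zero.
# The && short-circuits, so `x != 0` is never evaluated on `missing`.
keep(col) = all(x -> !ismissing(x) && x != 0, col)

mask = [keep(col) for col in cols]                      # iterates over the column vectors
kept = [name for (name, m) in zip(keys(cols), mask) if m]

println(mask)
println(kept)
```

Only column a survives: a single missing value or a single zero is enough to drop a column.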
plot(
    data[!, :Year], 
    Array(log.(data[!, Not(:Year)])), 
    label=reshape(string.(propertynames(data)[2:end]), 1, :),
    legend= :outerright,
    size=(900, 400)
)

Wilcoxon rank-sum test (Mann-Whitney U test)

mwut_results = MannWhitneyUTest(data[!, :USA], data[!, :Canada])
Approximate Mann-Whitney U test
-------------------------------
Population details:
    parameter of interest:   Location parameter (pseudomedian)
    value under h_0:         0
    point estimate:          28503.5

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-30

Details:
    number of observations in each group: [94, 94]
    Mann-Whitney-U statistic:             8763.0
    rank sums:                            [13228.0, 4538.0]
    adjustment for ties:                  24.0
    normal approximation (μ, σ):          (4345.0, 373.05)
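
To help read the Details block, the following is a rough sketch in Base Julia of how the rank sum, the U statistic and the normal approximation can be computed by hand. The helper names average_ranks and mann_whitney_sketch are hypothetical, and this is not the HypothesisTests implementation: the sketch leaves out the tie correction in σ, and it assumes, based on the numbers printed above, that the reported μ is the deviation of U from its null mean.

```julia
# Average ranks: tied values share the mean of the positions they occupy.
function average_ranks(v)
    order = sortperm(v)
    ranks = zeros(Float64, length(v))
    i = 1
    while i <= length(v)
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j < length(v) && v[order[j + 1]] == v[order[i]]
            j += 1
        end
        ranks[order[i:j]] .= (i + j) / 2
        i = j + 1
    end
    return ranks
end

function mann_whitney_sketch(x, y)
    nx, ny = length(x), length(y)
    ranks = average_ranks(vcat(x, y))            # rank the pooled sample
    rank_sum_x = sum(ranks[1:nx])                # rank sum of the first group
    U = rank_sum_x - nx * (nx + 1) / 2           # U statistic of the first group
    mu = U - nx * ny / 2                         # deviation of U from its null mean
    sigma = sqrt(nx * ny * (nx + ny + 1) / 12)   # null std. dev., no tie correction
    return (; U, rank_sum_x, mu, sigma)
end

res = mann_whitney_sketch([10.0, 12.0, 15.0], [1.0, 2.0, 11.0])
println(res)
```

The same arithmetic reproduces the output above: with 94 observations per group and a first rank sum of 13228, U = 13228 − 94·95/2 = 8763, μ = 8763 − 94·94/2 = 4345, and σ ≈ sqrt(94·94·189/12) ≈ 373.05.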

Right tailed test

pvalue(mwut_results, tail=:right)
1.2040605143479147e-31

Wilcoxon signed-rank test

# Paired samples: every column's value in 2000 (x) and in 2020 (y)
dt = select(filter(:Year => y -> y == 2000 || y == 2020, data), Not(:Year))
x = collect(dt[1, :])
y = collect(dt[2, :])
SignedRankTest(x, y)
Exact Wilcoxon signed rank test
-------------------------------
Population details:
    parameter of interest:   Location parameter (pseudomedian)
    value under h_0:         0
    point estimate:          16.0
    95% confidence interval: (-3848.0, 621.5)

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.5803

Details:
    number of observations:      23
    Wilcoxon rank-sum statistic: 119.0
    rank sums:                   [119.0, 157.0]
    adjustment for ties:         0.0
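
The two rank sums in the Details block can be illustrated with a short sketch in Base Julia. The function name signed_rank_sketch is hypothetical, and the sketch assumes that no two differences have the same magnitude (consistent with the tie adjustment of 0.0 reported above), so it is not a substitute for SignedRankTest.

```julia
function signed_rank_sketch(x, y)
    d = x .- y
    d = d[d .!= 0]                        # zero differences carry no sign information
    ranks = invperm(sortperm(abs.(d)))    # ranks of |d|, assuming no tied magnitudes
    w_plus = sum(ranks[d .> 0])           # rank sum of the positive differences
    w_minus = sum(ranks[d .< 0])          # rank sum of the negative differences
    return (; w_plus, w_minus, n = length(d))
end

res = signed_rank_sketch([5.0, 9.0, 2.0, 7.0], [4.0, 6.0, 4.5, 11.0])
println(res)
```

The two sums always add up to n(n + 1)/2. In the output above, 119 + 157 = 276 = 23·24/2, which agrees with the 23 paired observations.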

Kruskal-Wallis test

# One group per column other than Year
KruskalWallisTest(collect(eachcol(data[:, Not(:Year)]))...)
Kruskal-Wallis rank sum test (chi-square approximation)
-------------------------------------------------------
Population details:
    parameter of interest:   Location parameters
    value under h_0:         "all equal"
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           <1e-99

Details:
    number of observation in each group: [94, 94, 94, 94, 94, 94, 94, 94, 94, 94  …  94, 94, 94, 94, 94, 94, 94, 94, 94, 94]
    χ²-statistic:                        1253.58
    rank sums:                           [100736.0, 98787.5, 1.11532e5, 1.21338e5, 1.21336e5, 55862.0, 138744.0, 20383.0, 71558.5, 1.03662e5  …  19940.0, 59493.0, 68485.5, 84067.0, 1.0239e5, 132068.0, 83910.0, 1.10562e5, 1.7974e5, 195500.0]
    degrees of freedom:                  22
    adjustment for ties:                 0.999999
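
As a rough guide to the statistic and degrees of freedom reported above, here is a minimal Base Julia sketch of the textbook Kruskal-Wallis H statistic. The name kruskal_wallis_sketch is hypothetical; the sketch assumes no tied values and applies no tie correction, so it may differ slightly from what KruskalWallisTest computes.

```julia
function kruskal_wallis_sketch(groups...)
    sizes = [length(g) for g in groups]
    N = sum(sizes)
    ranks = invperm(sortperm(vcat(groups...)))   # pooled ranks, assuming no ties
    stops = cumsum(sizes)
    starts = stops .- sizes .+ 1
    # Rank sum of each group, read off its slice of the pooled ranks.
    rank_sums = [sum(ranks[a:b]) for (a, b) in zip(starts, stops)]
    H = 12 / (N * (N + 1)) * sum(rank_sums .^ 2 ./ sizes) - 3 * (N + 1)
    return (; H, rank_sums, df = length(groups) - 1)
end

res = kruskal_wallis_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0])
println(res)
```

With k groups the statistic is compared against a chi-square distribution with k − 1 degrees of freedom, which is why the 23 columns tested above give 22 degrees of freedom.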

Citation

BibTeX citation:
@online{sambrani2022,
  author = {Dhruva Sambrani},
  title = {Non-Parametric Hypothesis Tests with Examples in {Julia}},
  date = {2022-11-30},
  url = {https://www.dataalltheway.com/posts/011-01-non-parametric-hypothesis-tests-julia},
  langid = {en}
}
For attribution, please cite this work as:
Dhruva Sambrani. 2022. “Non-Parametric Hypothesis Tests with Examples in Julia.” November 30, 2022. https://www.dataalltheway.com/posts/011-01-non-parametric-hypothesis-tests-julia.