```julia
import Pkg
Pkg.activate(".")
using CSV
using Plots
using HypothesisTests
using DataFrames
```

```
  Activating project at `~/sandbox/dataalltheway/posts/011-01-non-parametric-hypothesis-tests-julia`
```
2022-11-30 First draft
This article is an extension of Farmer (2022), “Non-Parametric Hypothesis Tests with Examples in R” (November 18, 2022). Please check out that parent article for the theoretical background.
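I have subset the data to 1928 onward (after dropping rows with a missing `Year`) and dropped every column that contains any missing values or zeros. To do so, for each column of `data` (`eachcol(data)`), we first check whether all of its elements satisfy `!ismissing(x) && x != 0` (`!` means "not"). We then pick all rows for the columns that pass and call `disallowmissing`, so that their element types no longer allow `missing`.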
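```julia
# Download the cement emissions CSV from Zenodo and load it into a DataFrame
temp_file = download("https://zenodo.org/record/7081360/files/1.%20Cement_emissions_data.csv")
data = CSV.read(temp_file, DataFrame)
dropmissing!(data, :Year)             # drop rows without a year
filter!(:Year => >=(1928), data)      # keep 1928 onward
# true for each column whose values are all non-missing and non-zero
picked_cols_mask = eachcol(data) .|>
    col -> all(x -> (!ismissing(x) && x != 0), col)
# keep those columns and narrow their eltypes, e.g. Union{Missing, Float64} to Float64
data = disallowmissing(data[!, picked_cols_mask])
```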
Row | Year | Argentina | Australia | Belgium | Brazil | Canada | Chile | China | Democratic Republic of the Congo | Denmark | Egypt | Finland | Italy | Japan | Mozambique | Norway | Peru | Portugal | Romania | Spain | Sweden | Turkey | USA | Global |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Int64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | 1928 | 116.3 | 378.0 | 1505.0 | 43.61 | 868.6 | 54.57 | 39.78 | 21.81 | 385.2 | 43.8 | 138.1 | 1519.0 | 1897.0 | 7.27 | 158.3 | 25.42 | 36.34 | 163.6 | 763.2 | 233.2 | 29.07 | 15420.0 | 35616.0 |
2 | 1929 | 174.4 | 356.2 | 1606.0 | 47.25 | 963.0 | 72.62 | 76.51 | 29.07 | 396.1 | 87.22 | 138.1 | 1730.0 | 2112.0 | 10.9 | 159.1 | 25.43 | 43.61 | 156.3 | 901.3 | 284.0 | 32.71 | 15000.0 | 36873.0 |
3 | 1930 | 189.0 | 348.9 | 1508.0 | 43.58 | 926.7 | 79.88 | 73.45 | 32.71 | 385.2 | 148.5 | 101.6 | 1723.0 | 1853.0 | 10.9 | 160.2 | 10.9 | 47.29 | 196.3 | 908.4 | 304.6 | 29.07 | 14290.0 | 35561.0 |
4 | 1931 | 265.3 | 196.3 | 1218.0 | 83.59 | 799.5 | 50.88 | 97.93 | 21.81 | 250.8 | 119.9 | 79.95 | 1519.0 | 1788.0 | 10.9 | 109.7 | 14.54 | 47.25 | 98.14 | 806.8 | 257.9 | 50.83 | 10810.0 | 30931.0 |
5 | 1932 | 247.1 | 123.6 | 1039.0 | 72.69 | 363.4 | 54.51 | 79.57 | 7.27 | 203.6 | 119.9 | 76.23 | 1545.0 | 1843.0 | 10.9 | 117.3 | 10.9 | 58.15 | 105.4 | 705.1 | 241.0 | 54.46 | 6656.0 | 24721.0 |
6 | 1933 | 254.5 | 159.9 | 963.1 | 109.0 | 189.0 | 69.05 | 113.2 | 3.63 | 272.6 | 141.7 | 80.05 | 1756.0 | 2366.0 | 10.18 | 110.8 | 14.53 | 79.95 | 109.0 | 694.1 | 200.7 | 58.15 | 5587.0 | 23866.0 |
7 | 1934 | 279.8 | 207.2 | 937.6 | 159.9 | 272.6 | 101.7 | 97.93 | 3.63 | 381.6 | 145.4 | 112.7 | 2013.0 | 2304.0 | 7.27 | 123.6 | 21.82 | 90.94 | 156.3 | 672.3 | 287.1 | 83.59 | 6900.0 | 28542.0 |
8 | 1935 | 356.3 | 276.2 | 1087.0 | 181.6 | 272.6 | 141.6 | 156.1 | 3.63 | 374.3 | 189.0 | 134.3 | 2086.0 | 2904.0 | 7.27 | 130.8 | 29.07 | 105.4 | 189.0 | 668.7 | 367.1 | 65.42 | 6684.0 | 32090.0 |
9 | 1936 | 410.7 | 323.5 | 1163.0 | 239.7 | 388.9 | 123.5 | 428.5 | 3.63 | 392.4 | 167.2 | 163.5 | 1890.0 | 3082.0 | 7.27 | 149.0 | 36.34 | 119.9 | 185.3 | 297.9 | 392.5 | 69.05 | 9950.0 | 38763.0 |
10 | 1937 | 512.3 | 363.4 | 1486.0 | 283.5 | 483.4 | 156.3 | 437.7 | 3.77 | 334.4 | 163.5 | 203.4 | 2155.0 | 2984.0 | 7.27 | 159.9 | 39.96 | 127.1 | 225.3 | 189.0 | 432.5 | 105.4 | 10370.0 | 40829.0 |
11 | 1938 | 614.3 | 178.1 | 1508.0 | 305.5 | 432.5 | 181.7 | 9.18 | 7.5 | 316.2 | 185.3 | 236.2 | 2279.0 | 2729.0 | 11.99 | 163.6 | 50.86 | 130.8 | 221.7 | 294.4 | 490.6 | 130.9 | 9239.0 | 38551.0 |
12 | 1939 | 556.0 | 338.0 | 1261.0 | 345.5 | 450.7 | 167.2 | 223.4 | 17.44 | 345.3 | 185.0 | 279.6 | 2526.0 | 2508.0 | 14.54 | 192.6 | 58.18 | 145.4 | 261.7 | 588.8 | 585.1 | 141.7 | 10790.0 | 35687.0 |
13 | 1940 | 534.4 | 348.8 | 105.4 | 367.1 | 592.4 | 189.0 | 272.4 | 11.45 | 218.1 | 178.1 | 149.0 | 2373.0 | 2101.0 | 14.54 | 167.2 | 61.78 | 134.5 | 196.3 | 770.5 | 345.3 | 130.9 | 11550.0 | 31431.0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
83 | 2010 | 4178.0 | 3549.0 | 2582.0 | 21290.0 | 6005.0 | 1046.0 | 639600.0 | 189.0 | 672.2 | 20510.0 | 525.7 | 13280.0 | 24320.0 | 340.9 | 754.0 | 3338.0 | 3376.0 | 2778.0 | 11200.0 | 1324.0 | 29980.0 | 31450.0 | 1.2549e6 |
84 | 2011 | 4586.0 | 3496.0 | 2762.0 | 22840.0 | 6020.0 | 1080.0 | 708600.0 | 175.2 | 861.8 | 20100.0 | 557.8 | 12580.0 | 24980.0 | 373.3 | 749.0 | 3300.0 | 2813.0 | 3089.0 | 9523.0 | 1361.0 | 31450.0 | 32210.0 | 1.3498e6 |
85 | 2012 | 4184.0 | 3518.0 | 2643.0 | 25000.0 | 6532.0 | 1128.0 | 714800.0 | 157.9 | 871.1 | 20970.0 | 497.2 | 10070.0 | 25620.0 | 452.7 | 725.0 | 3731.0 | 2550.0 | 3150.0 | 8754.0 | 1479.0 | 31370.0 | 35270.0 | 1.3846e6 |
86 | 2013 | 4581.0 | 3294.0 | 2541.0 | 26650.0 | 5973.0 | 1086.0 | 748300.0 | 174.0 | 867.1 | 20210.0 | 481.8 | 8877.0 | 26810.0 | 505.6 | 731.0 | 4257.0 | 2814.0 | 2695.0 | 7642.0 | 1402.0 | 33910.0 | 36370.0 | 1.4441e6 |
87 | 2014 | 4336.0 | 3138.0 | 2643.0 | 26910.0 | 5912.0 | 1022.0 | 778600.0 | 127.9 | 887.3 | 20760.0 | 468.8 | 8339.0 | 26560.0 | 585.9 | 727.0 | 4590.0 | 3096.0 | 2944.0 | 8897.0 | 1399.0 | 34500.0 | 39440.0 | 1.4999e6 |
88 | 2015 | 4571.0 | 3076.0 | 2348.0 | 25080.0 | 6185.0 | 1033.0 | 722000.0 | 154.6 | 931.5 | 21650.0 | 462.1 | 8196.0 | 25940.0 | 614.2 | 672.0 | 4476.0 | 2921.0 | 3337.0 | 9216.0 | 1537.0 | 34440.0 | 39910.0 | 1.4444e6 |
89 | 2016 | 4029.0 | 2931.0 | 2436.0 | 22420.0 | 6114.0 | 1120.0 | 743000.0 | 98.04 | 1095.0 | 22820.0 | 553.2 | 7680.0 | 25970.0 | 947.8 | 684.0 | 4340.0 | 2297.0 | 3181.0 | 9414.0 | 1554.0 | 37530.0 | 39440.0 | 1.4876e6 |
90 | 2017 | 4362.0 | 3019.0 | 2291.0 | 19080.0 | 6827.0 | 865.9 | 758200.0 | 348.7 | 1194.0 | 21770.0 | 603.7 | 7711.0 | 26430.0 | 910.6 | 766.0 | 4291.0 | 2531.0 | 3310.0 | 9449.0 | 1484.0 | 39470.0 | 40320.0 | 1.5079e6 |
91 | 2018 | 4369.0 | 2942.0 | 2534.0 | 19340.0 | 6915.0 | 782.2 | 786700.0 | 406.1 | 1160.0 | 21000.0 | 601.7 | 7757.0 | 26180.0 | 930.0 | 730.0 | 4320.0 | 2251.0 | 3505.0 | 9667.0 | 1607.0 | 39410.0 | 38970.0 | 1.5692e6 |
92 | 2019 | 4141.0 | 3040.0 | 2819.0 | 19860.0 | 7125.0 | 825.3 | 826900.0 | 451.0 | 1129.0 | 19670.0 | 583.5 | 7912.0 | 25330.0 | 1011.0 | 722.0 | 4546.0 | 2225.0 | 3828.0 | 9064.0 | 1349.0 | 32350.0 | 40900.0 | 1.6175e6 |
93 | 2020 | 3508.0 | 2820.0 | 2634.0 | 22050.0 | 6625.0 | 825.3 | 858200.0 | 451.0 | 1227.0 | 18130.0 | 569.7 | 7059.0 | 24490.0 | 1011.0 | 725.0 | 4546.0 | 2310.0 | 3901.0 | 8192.0 | 1272.0 | 40810.0 | 40690.0 | 1.6375e6 |
94 | 2021 | 4671.0 | 2820.0 | 2634.0 | 23790.0 | 6625.0 | 825.3 | 853000.0 | 451.0 | 1227.0 | 16160.0 | 569.7 | 7059.0 | 23790.0 | 1011.0 | 701.3 | 4546.0 | 2310.0 | 3901.0 | 8609.0 | 1272.0 | 44390.0 | 41200.0 | 1.6729e6 |
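To see the column filter in isolation, here is a minimal sketch in Base Julia only. The toy columns in `cols` and the helper `keep` are hypothetical stand-ins for the real CSV and are not part of the original code; the sketch applies the same `!ismissing(x) && x != 0` predicate and mimics what `disallowmissing` does by narrowing each surviving column's element type with `nonmissingtype`.

```julia
# Hypothetical toy columns standing in for the real CSV (not from the post):
# one clean column, one containing `missing`, one containing a zero.
cols = Dict(
    "Year"    => Union{Missing,Int}[1928, 1929, 1930],
    "Clean"   => Union{Missing,Float64}[1.5, 2.0, 3.0],
    "HasMiss" => Union{Missing,Float64}[1.0, missing, 2.0],
    "HasZero" => Union{Missing,Float64}[0.0, 4.0, 5.0],
)

# Same predicate as in the post: keep a column only if every value is
# non-missing and non-zero (`&&` short-circuits, so `missing != 0` is never evaluated)
keep(col) = all(x -> !ismissing(x) && x != 0, col)

# Keep the passing columns and narrow their element type, as disallowmissing does
clean = Dict(name => convert(Vector{nonmissingtype(eltype(col))}, col)
             for (name, col) in cols if keep(col))
```

Here `keep` returns `false` for `HasMiss` and `HasZero`, so only `Year` and `Clean` survive, now typed as `Vector{Int}` and `Vector{Float64}`.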
```
Approximate Mann-Whitney U test
-------------------------------
Population details:
    parameter of interest:   Location parameter (pseudomedian)
    value under h_0:         0
    point estimate:          28503.5

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-30

Details:
    number of observations in each group: [94, 94]
    Mann-Whitney-U statistic:             8763.0
    rank sums:                            [13228.0, 4538.0]
    adjustment for ties:                  24.0
    normal approximation (μ, σ):          (4345.0, 373.05)
```
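The numbers in this summary can be reproduced by hand. The sketch below is a plain-Julia illustration, not the HypothesisTests.jl implementation, and the helpers `tiedranks` and `mannwhitney_z` are hypothetical names. It ranks the pooled sample with mid-ranks for ties, converts the first group's rank sum to U = R₁ − n₁(n₁+1)/2, centres U by n₁n₂/2 and uses the tie-corrected variance n₁n₂(N + 1 − Σ(t³ − t)/(N(N − 1)))/12. The final lines plug in the values printed above (94 per group, rank sum 13228, tie adjustment 24) and recover U = 8763, μ = 4345 and σ ≈ 373.05. The 0.5 continuity correction in `z` is an assumption, and turning `z` into a p-value needs a normal CDF, e.g. from Distributions.jl.

```julia
# Mid-ranks of v (ties share the average of their positions) and the
# tie adjustment sum(t^3 - t) over groups of t tied values.
function tiedranks(v::AbstractVector{<:Real})
    n = length(v)
    p = sortperm(v)
    r = zeros(Float64, n)
    tieadj = 0
    i = 1
    while i <= n
        j = i
        while j < n && v[p[j + 1]] == v[p[i]]
            j += 1                        # extend the run of tied values
        end
        for k in i:j
            r[p[k]] = (i + j) / 2         # average of positions i..j
        end
        t = j - i + 1
        tieadj += t^3 - t
        i = j + 1
    end
    return r, tieadj
end

# U statistic and tie-corrected normal approximation for two independent samples
function mannwhitney_z(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    nx, ny = length(x), length(y)
    N = nx + ny
    r, tieadj = tiedranks(vcat(x, y))
    U = sum(r[1:nx]) - nx * (nx + 1) / 2          # U for the first sample
    mu = U - nx * ny / 2                          # centred U (0 under H0)
    sigma = sqrt(nx * ny * (N + 1 - tieadj / (N * (N - 1))) / 12)
    z = (abs(mu) - 0.5) / sigma                   # 0.5 = continuity correction (assumed)
    return (U = U, mu = mu, sigma = sigma, z = z)
end

# Cross-check with the summary printed above
nx = ny = 94
U_rep = 13228 - nx * (nx + 1) / 2                 # 8763.0
mu_rep = U_rep - nx * ny / 2                      # 4345.0
sigma_rep = sqrt(nx * ny * (nx + ny + 1 - 24 / ((nx + ny) * (nx + ny - 1))) / 12)  # ≈ 373.05
```

For example, `mannwhitney_z([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` gives `U = 0.0` and `mu = -4.5`, since every value of the first sample is below every value of the second.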
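```julia
# Keep only the 2000 and 2020 rows and drop the Year column
dt = select(filter(:Year => yr -> yr == 2000 || yr == 2020, data), Not(:Year))
x = collect(dt[1, :])   # 2000 values, one per column (each country plus Global)
y = collect(dt[2, :])   # 2020 values, same column order
SignedRankTest(x, y)    # paired test on the differences x - y
```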
```
Exact Wilcoxon signed rank test
-------------------------------
Population details:
    parameter of interest:   Location parameter (pseudomedian)
    value under h_0:         0
    point estimate:          16.0
    95% confidence interval: (-3848.0, 621.5)

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.5803

Details:
    number of observations:      23
    Wilcoxon rank-sum statistic: 119.0
    rank sums:                   [119.0, 157.0]
    adjustment for ties:         0.0
```
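The signed-rank logic is also easy to trace by hand. This is a plain-Julia sketch with hypothetical helper names, not the HypothesisTests.jl code: `signed_rank_sums` drops zero differences, gives the absolute differences mid-ranks and sums the ranks of the positive and of the negative differences. The two sums always add up to n(n+1)/2, which matches the output above: 119 + 157 = 276 = 23·24/2. `hodges_lehmann` computes the median of the Walsh averages (dᵢ + dⱼ)/2 for i ≤ j, the standard pseudomedian estimator; whether the package applies it to exactly the same set of differences is not checked here.

```julia
using Statistics  # for median

# Rank sums (W+, W-) of the paired differences x .- y.
# Zero differences are dropped; ties in |d| get mid-ranks.
function signed_rank_sums(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    d = [a - b for (a, b) in zip(x, y) if a != b]
    absd = abs.(d)
    # mid-rank = (# strictly smaller) + (t + 1) / 2, where t counts values equal to v (itself included)
    ranks = [count(<(v), absd) + (count(==(v), absd) + 1) / 2 for v in absd]
    wplus = sum(ranks[d .> 0]; init = 0.0)
    wminus = sum(ranks[d .< 0]; init = 0.0)
    return wplus, wminus
end

# Hodges-Lehmann pseudomedian: median of all Walsh averages (d[i] + d[j]) / 2 with i <= j
hodges_lehmann(d::AbstractVector{<:Real}) =
    median([(d[i] + d[j]) / 2 for i in eachindex(d) for j in i:lastindex(d)])
```

For instance, `signed_rank_sums([5.0, 3.0, 8.0, 1.0], [2.0, 4.0, 4.0, 1.0])` drops the zero difference, ranks |3|, |−1|, |4| as 2, 1, 3 and returns `(5.0, 1.0)`.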
```
Kruskal-Wallis rank sum test (chi-square approximation)
-------------------------------------------------------
Population details:
    parameter of interest:   Location parameters
    value under h_0:         "all equal"
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           <1e-99

Details:
    number of observation in each group: [94, 94, 94, 94, 94, 94, 94, 94, 94, 94 … 94, 94, 94, 94, 94, 94, 94, 94, 94, 94]
    χ²-statistic:                        1253.58
    rank sums:                           [100736.0, 98787.5, 1.11532e5, 1.21338e5, 1.21336e5, 55862.0, 138744.0, 20383.0, 71558.5, 1.03662e5 … 19940.0, 59493.0, 68485.5, 84067.0, 1.0239e5, 132068.0, 83910.0, 1.10562e5, 1.7974e5, 195500.0]
    degrees of freedom:                  22
    adjustment for ties:                 0.999999
```
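The Kruskal-Wallis χ² statistic is built from the group rank sums shown above. The sketch below is plain Julia with a hypothetical `kruskal_wallis` helper, not the package's code: it pools all observations, assigns mid-ranks, computes H = 12/(N(N+1)) Σ Rᵢ²/nᵢ − 3(N+1) and divides by the tie factor 1 − Σ(t³ − t)/(N³ − N). The degrees of freedom are k − 1 for k groups (22 in the output above, i.e. 23 groups). The printed "adjustment for ties" of 0.999999 looks like this tie factor, but that is an inference, not something checked against the package source. The mid-rank step is O(N²), which is fine for the 23 × 94 = 2162 values here but not for large samples.

```julia
# Kruskal-Wallis H statistic with tie correction for k independent groups
function kruskal_wallis(groups::AbstractVector{<:AbstractVector{<:Real}})
    pooled = reduce(vcat, groups)
    N = length(pooled)
    # mid-ranks of the pooled sample (O(N^2), fine for small N)
    ranks = [count(<(v), pooled) + (count(==(v), pooled) + 1) / 2 for v in pooled]
    s = 0.0
    offset = 0
    for g in groups
        n = length(g)
        R = sum(ranks[offset+1:offset+n])   # rank sum of this group
        s += R^2 / n
        offset += n
    end
    H = 12 / (N * (N + 1)) * s - 3 * (N + 1)
    # tie factor: 1 - sum(t^3 - t) / (N^3 - N), over distinct values
    tiesum = sum(t^3 - t for t in (count(==(v), pooled) for v in unique(pooled)))
    factor = 1 - tiesum / (N^3 - N)
    return (H = H / factor, df = length(groups) - 1, tie_factor = factor)
end
```

Applied to the post's data, a call along the lines of `kruskal_wallis(collect.(eachcol(select(data, Not(:Year)))))` would treat each column as a group; this grouping is untested here and may differ from the one used for the output above.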
BibTeX citation:

```bibtex
@online{sambrani2022,
  author = {Sambrani, Dhruva},
  title = {Non-Parametric Hypothesis Tests with Examples in {Julia}},
  date = {2022-11-30},
  url = {https://dataalltheway.com/posts/011-01-non-parametric-hypothesis-tests-julia},
  langid = {en}
}
```