Kendall Tau Correlation Test

From GM-RKB

A Kendall Tau Correlation Test is a non-parametric correlational hypothesis test that is based on a Kendall's Tau rank correlation statistic.

The test statistic is the Kendall's Tau rank correlation statistic, [math]\displaystyle{ -1 \le \tau \le 1 }[/math], which is calculated over all pairs of observations in the sample.


References

2017a

For each pair of observations [math]\displaystyle{ (X_i, Y_i) }[/math] and [math]\displaystyle{ (X_j, Y_j) }[/math] in the bivariate sample:
[math]\displaystyle{ \frac{Y_j-Y_i}{X_j-X_i} \gt 0 }[/math] - pair is concordant
[math]\displaystyle{ \frac{Y_j-Y_i}{X_j-X_i} \lt 0 }[/math] - pair is discordant
[math]\displaystyle{ \frac{Y_j-Y_i}{X_j-X_i} = 0 }[/math] - pair is considered a tie
[math]\displaystyle{ X_i = X_j }[/math] - pair is not compared
Kendall's tau is computed as
[math]\displaystyle{ \tau=\frac{N_c-N_d}{N_c+N_d} }[/math]
with [math]\displaystyle{ N_c }[/math] and [math]\displaystyle{ N_d }[/math] denoting the number of concordant pairs and the number of discordant pairs, respectively, in the sample. Ties add 0.5 to both the concordant and discordant counts. There are [math]\displaystyle{ \binom n 2 }[/math] possible pairs in the bivariate sample.
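The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not any particular package's implementation: the function name `kendall_tau_half_ties` is hypothetical, and it follows the quoted description literally, skipping pairs with X_i = X_j and crediting a tie in Y as 0.5 to both counts.

```python
from itertools import combinations


def kendall_tau_half_ties(x, y):
    """Kendall's tau as described above (hypothetical helper):
    pairs with X_i == X_j are not compared, and a pair whose slope is 0
    (tie in Y) adds 0.5 to both the concordant and discordant counts."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_c = n_d = 0.0
    for i, j in combinations(range(len(x)), 2):
        dx = x[j] - x[i]
        if dx == 0:
            continue  # X_i = X_j: pair is not compared
        slope = (y[j] - y[i]) / dx
        if slope > 0:
            n_c += 1  # concordant
        elif slope < 0:
            n_d += 1  # discordant
        else:
            n_c += 0.5  # tie: half to each count
            n_d += 0.5
    if n_c + n_d == 0:
        raise ValueError("no comparable pairs")
    return (n_c - n_d) / (n_c + n_d)
```

For x = [1, 2, 3, 4, 5] and y = [2, 1, 4, 3, 5] there are 8 concordant and 2 discordant pairs, so the function returns 0.6.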
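A value of +1 indicates that all pairs are concordant, a value of -1 indicates that all pairs are discordant, and a value of 0 indicates no overall association (as expected under independence, although [math]\displaystyle{ \tau = 0 }[/math] does not by itself imply independence).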
The Kendall tau independence test is a test of whether the Kendall tau coefficient is equal to zero.
For larger n (e.g., n > 60), or when there are many ties, the p-th upper quantile of the Kendall tau statistic can be approximated by
[math]\displaystyle{ w_p = z_p \frac{\sqrt{2(2n+5)}}{3\sqrt{n(n-1)}} }[/math]
with [math]\displaystyle{ z_p }[/math] and [math]\displaystyle{ n }[/math] denoting the [math]\displaystyle{ p }[/math]-th quantile of the standard normal distribution and the sample size, respectively. The lower quantile is the negative of the upper quantile.
For a two-sided test, the p-value is computed as twice the smaller of the lower-tailed and upper-tailed p-values.
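A minimal sketch of this large-sample approximation, using the standard-library `statistics.NormalDist` for the normal quantile and CDF. The function names are hypothetical, and the p-value below uses the same normal approximation as the quantile formula, not the tabulated small-sample quantiles.

```python
import math
from statistics import NormalDist


def tau_sd(n):
    """Approximate null standard deviation of Kendall's tau:
    sqrt(2(2n+5)) / (3 sqrt(n(n-1)))."""
    return math.sqrt(2 * (2 * n + 5)) / (3 * math.sqrt(n * (n - 1)))


def tau_upper_quantile(p, n):
    """w_p = z_p * sqrt(2(2n+5)) / (3 sqrt(n(n-1))), z_p the p-th normal quantile."""
    return NormalDist().inv_cdf(p) * tau_sd(n)


def tau_two_sided_p(tau, n):
    """Two-sided p-value: twice the smaller of the lower- and upper-tailed
    p-values under the normal approximation (capped at 1)."""
    z = tau / tau_sd(n)
    lower = NormalDist().cdf(z)  # approx. P(T <= tau)
    upper = 1.0 - lower          # approx. P(T >= tau)
    return min(1.0, 2.0 * min(lower, upper))
```

For n = 100 the 0.975 quantile comes out at roughly 0.133, so an observed tau of that size gives a two-sided p-value of about 0.05. By symmetry, `tau_upper_quantile(1 - p, n)` is the negative of `tau_upper_quantile(p, n)`, matching the statement that the lower quantile is the negative of the upper one.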
For [math]\displaystyle{ n \leq 60 }[/math], tabulated quantiles (from Table A11 on pp. 543-544 of Conover) are used. These quantiles are exact when there are no ties in the data.
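Conover's table is not reproduced here. As a hedged sketch of where such exact values come from: with no ties and under independence, each of the n! orderings of Y relative to X is equally likely, and N_d equals the number of inversions of that ordering, so tau = 1 - 4 N_d / (n(n-1)) has a distribution that can be counted exactly. The helper names below are hypothetical, and the code returns exact upper-tail probabilities rather than the tabulated quantiles themselves.

```python
from fractions import Fraction
from math import factorial


def inversion_counts(n):
    """counts[k] = number of permutations of n items with exactly k
    inversions (i.e. k discordant pairs)."""
    counts = [1]  # a single item has exactly one ordering, with 0 inversions
    for m in range(2, n + 1):
        new = [0] * (len(counts) + m - 1)
        for k, c in enumerate(counts):
            # inserting the m-th item adds between 0 and m-1 new inversions
            for extra in range(m):
                new[k + extra] += c
        counts = new
    return counts


def exact_upper_p(tau_obs, n):
    """Exact P(tau >= tau_obs) under independence, no ties, n >= 2."""
    if n < 2:
        raise ValueError("need at least two observations")
    pairs = n * (n - 1) // 2
    hits = sum(
        c
        for k, c in enumerate(inversion_counts(n))
        if Fraction(pairs - 2 * k, pairs) >= tau_obs  # tau for k discordant pairs
    )
    return Fraction(hits, factorial(n))
```

For example, `exact_upper_p(Fraction(3, 5), 5)` returns `Fraction(7, 60)`. Passing `tau_obs` as a `Fraction` avoids floating-point edge cases at the boundary; the counting is cheap enough for the n ≤ 60 range.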

2017b

Kendall's tau is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. It is named after Maurice Kendall, who developed it in 1938,[1] though Gustav Fechner had proposed a similar measure in the context of time series in 1897.[2]
Intuitively, the Kendall correlation between two variables will be high when observations have a similar (or identical for a correlation of 1) rank (i.e. relative position label of the observations within the variable: 1st, 2nd, 3rd, etc.) between the two variables, and low when observations have a dissimilar (or fully different for a correlation of -1) rank between the two variables.
Both Kendall's [math]\displaystyle{ \tau }[/math] and Spearman's [math]\displaystyle{ \rho }[/math] can be formulated as special cases of a more general correlation coefficient.
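The general coefficient is not spelled out in the quote. One standard form, assumed here, is Γ = Σ a_ij b_ij / sqrt(Σ a_ij² · Σ b_ij²), summed over all ordered pairs (i, j): choosing a_ij = sgn(x_j - x_i) and b_ij = sgn(y_j - y_i) gives Kendall's tau, while choosing rank differences a_ij = r_j - r_i and b_ij = s_j - s_i gives Spearman's rho. The sketch below (hypothetical helper names, untied data only) checks both special cases on a small sample.

```python
import math
from itertools import product


def general_gamma(a, b, n):
    """Gamma = sum a(i,j) b(i,j) / sqrt(sum a(i,j)^2 * sum b(i,j)^2),
    summed over all ordered pairs (i, j) of the n observations."""
    pairs = list(product(range(n), repeat=2))
    num = sum(a(i, j) * b(i, j) for i, j in pairs)
    den_a = sum(a(i, j) ** 2 for i, j in pairs)
    den_b = sum(b(i, j) ** 2 for i, j in pairs)
    return num / math.sqrt(den_a * den_b)


def sign(v):
    return (v > 0) - (v < 0)


def ranks(values):
    """Ranks 1..n (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


x = [1.2, 3.4, 2.2, 5.0, 4.1]
y = [2.0, 2.9, 3.5, 4.8, 4.0]

# sign scores -> Kendall's tau
tau = general_gamma(lambda i, j: sign(x[j] - x[i]),
                    lambda i, j: sign(y[j] - y[i]), len(x))

# rank-difference scores -> Spearman's rho
rx, ry = ranks(x), ranks(y)
rho = general_gamma(lambda i, j: rx[j] - rx[i],
                    lambda i, j: ry[j] - ry[i], len(x))
```

On this sample tau is 0.8 (9 concordant, 1 discordant pair) and rho is 0.9, the same values the usual direct formulas give.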

2017c

The Kendall coefficient is denoted with the Greek letter tau (τ).
[math]\displaystyle{ \tau = \frac{4P}{n(n-1)} - 1 }[/math]
where P is the number of concordant pairs, calculated by summing, over all items, the number of items ranked after the given item in both rankings (this form assumes no ties).
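The definition of P translates directly into code. In this minimal sketch (hypothetical function names, rankings given as lists of untied ranks), each concordant pair is counted once, from the item that comes first in both rankings.

```python
def concordant_count(rank_a, rank_b):
    """P: for each item, count the items ranked after it in BOTH rankings,
    then sum over all items (assumes no ties)."""
    n = len(rank_a)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if rank_a[j] > rank_a[i] and rank_b[j] > rank_b[i]
    )


def kendall_tau_from_p(rank_a, rank_b):
    """tau = 4P / (n(n-1)) - 1, valid when neither ranking has ties."""
    n = len(rank_a)
    return 4 * concordant_count(rank_a, rank_b) / (n * (n - 1)) - 1
```

For the rankings [1, 2, 3, 4, 5] and [2, 1, 4, 3, 5], P = 8 and tau is 0.6 (up to floating-point rounding). Without ties every pair is either concordant or discordant, so N_d = n(n-1)/2 - P, which is why this form agrees with (N_c - N_d)/(N_c + N_d).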
(...) Kendall's tau is used with two ordinal variables, or with one ordinal and one interval variable.
Before computers were commonly available, Spearman correlation was often used as a substitute because it was easier to calculate. Kendall's tau is now often viewed as the superior metric.
The measure is sometimes just referred to as 'Kendall's tau'.

2017d

  • (Quest Software Inc., 2017) ⇒ Statistics – Textbook, Nonparametric Statistics https://documents.software.dell.com/statistics/textbook/nonparametric-statistics#correlations
    • Kendall tau is equivalent to Spearman R with regard to the underlying assumptions. It is also comparable in terms of its statistical power. However, Spearman R and Kendall tau are usually not identical in magnitude because their underlying logic as well as their computational formulas are very different. Siegel and Castellan (1988) express the relationship of the two measures in terms of the inequality [math]\displaystyle{ -1 \le 3 \cdot \text{Kendall tau} - 2 \cdot \text{Spearman R} \le 1 }[/math]. More importantly, Kendall tau and Spearman R imply different interpretations: Spearman R can be thought of as the regular Pearson product moment correlation coefficient, that is, in terms of proportion of variability accounted for, except that Spearman R is computed from ranks. Kendall tau, on the other hand, represents a probability, that is, it is the difference between the probability that in the observed data the two variables are in the same order versus the probability that the two variables are in different orders.
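The probability interpretation of Kendall tau, and its relation to Spearman R, can be checked numerically. This sketch assumes untied data and uses hypothetical helper names: it computes tau as P(same order) - P(different order) for a pair drawn uniformly from all n(n-1)/2 pairs, computes Spearman R with the rank-difference formula 1 - 6Σd²/(n(n²-1)), and evaluates 3·tau - 2·R, which for untied data always lies in [-1, 1].

```python
from itertools import combinations


def tau_as_probability_difference(x, y):
    """P(pair in same order) - P(pair in different order), no ties assumed."""
    pairs = list(combinations(range(len(x)), 2))
    same = sum((x[j] - x[i]) * (y[j] - y[i]) > 0 for i, j in pairs)
    diff = sum((x[j] - x[i]) * (y[j] - y[i]) < 0 for i, j in pairs)
    return same / len(pairs) - diff / len(pairs)


def spearman_r(x, y):
    """Spearman R for untied data: 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 2, 6, 4, 5]
tau = tau_as_probability_difference(x, y)  # 11 same-order, 4 different-order pairs
r = spearman_r(x, y)
bound = 3 * tau - 2 * r                    # expected to lie in [-1, 1]
```

Here tau = 7/15 (about 0.47) and R = 23/35 (about 0.66): the two coefficients differ in magnitude, as the passage notes, while 3·tau - 2·R is about 0.09.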

2015

(...) The definition of Kendall’s tau that is used is:
[math]\displaystyle{ \tau = \frac{P - Q}{\sqrt{(P + Q + T)(P + Q + U)}} }[/math]
where P is the number of concordant pairs, Q the number of discordant pairs, T the number of ties only in x, and U the number of ties only in y. If a tie occurs for the same pair in both x and y, it is not added to either T or U.
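This tie-corrected definition (commonly called tau-b) can be sketched as follows. The function name is hypothetical, and the sketch simply counts P, Q, T and U over all pairs exactly as defined above, ignoring pairs tied in both x and y.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """tau = (P - Q) / sqrt((P + Q + T) * (P + Q + U)), where P = concordant,
    Q = discordant, T = ties only in x, U = ties only in y."""
    p = q = t = u = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[j] - x[i], y[j] - y[i]
        if dx == 0 and dy == 0:
            continue       # tied in both: counted in neither T nor U
        if dx == 0:
            t += 1         # tie only in x
        elif dy == 0:
            u += 1         # tie only in y
        elif (dx > 0) == (dy > 0):
            p += 1         # concordant
        else:
            q += 1         # discordant
    denom = math.sqrt((p + q + t) * (p + q + u))
    if denom == 0:
        raise ValueError("tau is undefined when a variable is constant")
    return (p - q) / denom
```

For x = [1, 2, 2, 3] and y = [1, 1, 2, 3], the counts are P = 4, Q = 0, T = 1, U = 1, giving tau = 0.8. If SciPy is available, `scipy.stats.kendalltau` (whose default variant is tau-b in recent releases) should agree on data like this; without ties the formula reduces to (P - Q) / (n(n-1)/2).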

  1. Kendall, M. (1938). "A New Measure of Rank Correlation". Biometrika 30 (1–2): 81–89. doi:10.1093/biomet/30.1-2.81. JSTOR 2332226. 
  2. Kruskal, W.H. (1958). "Ordinal Measures of Association". Journal of the American Statistical Association 53 (284): 814–861. doi:10.2307/2281954. JSTOR 2281954. MR100941.