Kendall Tau Correlation Test
A Kendall Tau Correlation Test is a non-parametric correlational hypothesis test that is based on a Kendall's Tau rank correlation statistic.
- AKA: Kendall Correlation Test, Kendall tau Independence Test.
- Context:
- It can be defined as a Statistical Independence Test based on the Kendall's Tau Rank Correlation Coefficient.
- It tests the following hypotheses:
- Null hypothesis: There is no correlation in the population, or the data pairs are independent, i.e., [math]\displaystyle{ H_0:\;\;\tau_s= 0 }[/math]
- Alternative hypothesis: There is correlation in the population, or the data pairs are dependent, i.e., [math]\displaystyle{ H_A: \;\;\tau_s \ne 0 }[/math]
- where [math]\displaystyle{ -1 \leq \tau_s \leq 1 }[/math] is the Kendall's Tau rank correlation statistic calculated from the sample of paired observations.
- …
- Counter-Example(s):
- See: Correlational Hypothesis Test, Correlation, Autocorrelation, Cointegration.
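For small samples without ties, the two hypotheses above can be checked with an exact permutation test. The sketch below is illustrative only: the function names `kendall_tau` and `exact_two_sided_p` and the sample data are assumptions, not part of any cited source. It relies on the fact that, under independence and without ties, every ordering of the second variable is equally likely.

```python
from itertools import combinations, permutations


def kendall_tau(x, y):
    """Kendall's tau for paired data without ties: (Nc - Nd) / (Nc + Nd)."""
    nc = nd = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xj - xi) * (yj - yi)
        if s > 0:
            nc += 1  # concordant pair
        elif s < 0:
            nd += 1  # discordant pair
    return (nc - nd) / (nc + nd)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value under H0 by enumerating all orderings of y.

    Only feasible for small n, because there are n! orderings.
    """
    observed = abs(kendall_tau(x, y))
    extreme = total = 0
    for perm in permutations(y):
        total += 1
        # Small tolerance guards against floating-point round-off.
        if abs(kendall_tau(x, perm)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical sample of five paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau(x, y)            # 8 concordant, 2 discordant pairs -> 0.6
p_value = exact_two_sided_p(x, y)  # share of orderings with |tau| >= 0.6
print(tau, p_value)
```

With only five observations the null hypothesis is not rejected at the usual 5% level, even though the sample tau is fairly large. This is why larger samples rely on tabulated or approximate quantiles.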
References
2017a
- (ITL-SED, 2017) ⇒ Retrieved 2017-01-08 from NIST (National Institute of Standards and Technology, US) website. http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/kend_tau.htm
- Kendall's tau coefficient is a measure of concordance between two paired variables. Given the pairs [math]\displaystyle{ (X_i,Y_i) }[/math] and [math]\displaystyle{ (X_j,Y_j) }[/math], then
- [math]\displaystyle{ \frac{Y_j-Y_i}{X_j-X_i} \gt 0 }[/math] - pair is concordant
- [math]\displaystyle{ \frac{Y_j-Y_i}{X_j-X_i} \lt 0 }[/math] - pair is discordant
- [math]\displaystyle{ \frac{Y_j-Y_i}{X_j-X_i} = 0 }[/math] - pair is considered a tie
- [math]\displaystyle{ X_i = X_j }[/math] - pair is not compared
- Kendall's tau is computed as
- [math]\displaystyle{ \tau=\frac{N_c-N_d}{N_c+N_d} }[/math]
- with [math]\displaystyle{ N_c }[/math] and [math]\displaystyle{ N_d }[/math] denoting the number of concordant pairs and the number of discordant pairs, respectively, in the sample. Ties add 0.5 to both the concordant and discordant counts. There are [math]\displaystyle{ \binom n 2 }[/math] possible pairs in the bivariate sample.
- A value of +1 indicates that all pairs are concordant, a value of -1 indicates that all pairs are discordant, and a value of 0 indicates no relation (i.e., independence).
- The Kendall tau independence test is a test of whether the Kendall tau coefficient is equal to zero.
- For larger n (e.g., n > 60) or the case where there are many ties, the p-th upper quantile of the Kendall tau statistic can be approximated by
- [math]\displaystyle{ w_p=z_p\frac{\sqrt{2(2n+5)}}{3\sqrt{n(n-1)}} }[/math]
- with [math]\displaystyle{ z_p }[/math] and [math]\displaystyle{ n }[/math] denoting the [math]\displaystyle{ p }[/math]-th quantile of the standard normal distribution and the sample size, respectively. The lower quantile is the negative of the upper quantile.
- For a two-sided test, the p-value is computed as twice the minimum of the lower tailed and upper tailed quantiles.
- For [math]\displaystyle{ n \leq 60 }[/math], tabulated quantiles (from Table A11 on pp. 543-544 of Conover) are used. These quantiles are exact when there are no ties in the data.
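The pair-counting rules and the large-sample quantile quoted above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Dataplot implementation: the function names `nist_style_tau` and `upper_quantile` and the sample data are hypothetical, and the standard normal quantile is taken from Python's `statistics.NormalDist`.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def nist_style_tau(x, y):
    """tau = (Nc - Nd) / (Nc + Nd), following the quoted counting rules."""
    nc = nd = 0.0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # X_i = X_j: pair is not compared
        slope = (yj - yi) / (xj - xi)
        if slope > 0:
            nc += 1.0  # concordant
        elif slope < 0:
            nd += 1.0  # discordant
        else:
            nc += 0.5  # a tie adds 0.5 to both counts
            nd += 0.5
    # Raises ZeroDivisionError if no pair can be compared (all X equal).
    return (nc - nd) / (nc + nd)


def upper_quantile(p, n):
    """Approximate p-th quantile w_p of tau for sample size n."""
    z_p = NormalDist().inv_cdf(p)  # p-th quantile of the standard normal
    return z_p * sqrt(2 * (2 * n + 5)) / (3 * sqrt(n * (n - 1)))


# Hypothetical data with one tie in y (the last two observations).
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 4, 4]
tau = nist_style_tau(x, y)     # Nc = 8.5, Nd = 1.5 -> 0.7
w = upper_quantile(0.975, 100)
print(tau, w)
```

For a two-sided test at the 5% level with n = 100, an observed tau would be compared against `w` and its negative, since the lower quantile is the negative of the upper one.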
2017b
- (Wikipedia, 2017) ⇒ http://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient
- In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's tau coefficient (after the Greek letter τ), is a statistic used to measure the ordinal association between two measured quantities. A tau test is a non-parametric hypothesis test for statistical dependence based on the tau coefficient.
- It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. It is named after Maurice Kendall, who developed it in 1938,[1] though Gustav Fechner had proposed a similar measure in the context of time series in 1897.[2]
- Intuitively, the Kendall correlation between two variables will be high when observations have a similar (or identical for a correlation of 1) rank (i.e. relative position label of the observations within the variable: 1st, 2nd, 3rd, etc.) between the two variables, and low when observations have a dissimilar (or fully different for a correlation of -1) rank between the two variables.
- Both Kendall's [math]\displaystyle{ \tau }[/math] and Spearman's [math]\displaystyle{ \rho }[/math] can be formulated as special cases of a more general correlation coefficient.
2017c
- (CM, 2017) ⇒ http://changingminds.org/explanations/research/analysis/kendall.htm
- The Kendall Tau Rank Correlation Coefficient is used to measure the degree of correspondence between sets of rankings where the measures are not equidistant. It is used with non-parametric data.
- The Kendall coefficient is denoted with the Greek letter tau (τ).
- [math]\displaystyle{ \tau = (4P / (n * (n - 1))) - 1 }[/math]
- Where P is the number of concordant pairs and is calculated as the sum over all the items, of items ranked after the given item by both rankings.
- (...) Kendall is used with two ordinal variables or an ordinal and an interval.
- Before computers were commonly available, Spearman correlation was often used as a substitute as it was easier to calculate. Kendall is now often viewed as being a superior metric.
- The measure is sometimes just referred to as 'Kendall's tau'.
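The formula [math]\displaystyle{ \tau = 4P/(n(n-1)) - 1 }[/math] follows from the pair-count form when there are no ties. In that case the concordant and discordant pairs together make up all [math]\displaystyle{ n(n-1)/2 }[/math] pairs, so [math]\displaystyle{ (N_c - N_d)/(N_c + N_d) = (2P - n(n-1)/2)/(n(n-1)/2) }[/math]. A minimal sketch, assuming two untied rankings of the same items (the function names and data are hypothetical):

```python
def concordant_pairs(rank_a, rank_b):
    """P: for each item, count the items ranked after it by both rankings."""
    n = len(rank_a)
    p = 0
    for i in range(n):
        for j in range(n):
            if rank_a[j] > rank_a[i] and rank_b[j] > rank_b[i]:
                p += 1
    return p


def tau_from_p(rank_a, rank_b):
    """tau = 4P / (n(n - 1)) - 1, valid only when neither ranking has ties."""
    n = len(rank_a)
    return 4 * concordant_pairs(rank_a, rank_b) / (n * (n - 1)) - 1


# Hypothetical rankings of five items by two judges.
rank_a = [1, 2, 3, 4, 5]
rank_b = [2, 1, 4, 3, 5]
print(concordant_pairs(rank_a, rank_b))  # 8 of the 10 pairs agree
print(tau_from_p(rank_a, rank_b))        # 4 * 8 / 20 - 1 = 0.6
```

Identical rankings give P = n(n - 1)/2 and tau = 1, while fully reversed rankings give P = 0 and tau = -1.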
2017d
- (Quest Software Inc., 2017) ⇒ Statistics – Textbook, Nonparametric Statistics https://documents.software.dell.com/statistics/textbook/nonparametric-statistics#correlations
- Kendall tau is equivalent to Spearman R with regard to the underlying assumptions. It is also comparable in terms of its statistical power. However, Spearman R and Kendall tau are usually not identical in magnitude because their underlying logic as well as their computational formulas are very different. Siegel and Castellan (1988) express the relationship of the two measures in terms of the inequality: [math]\displaystyle{ -1 \leq 3 \times \text{Kendall tau} - 2 \times \text{Spearman R} \leq 1 }[/math]. More importantly, Kendall tau and Spearman R imply different interpretations: Spearman R can be thought of as the regular Pearson product moment correlation coefficient, that is, in terms of proportion of variability accounted for, except that Spearman R is computed from ranks. Kendall tau, on the other hand, represents a probability, that is, it is the difference between the probability that in the observed data the two variables are in the same order versus the probability that the two variables are in different orders.
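The difference in interpretation can be made concrete on a small untied sample. The sketch below is an illustration under stated assumptions: the function names and data are hypothetical, and Spearman R is computed with the standard shortcut [math]\displaystyle{ 1 - 6\sum d^2 / (n(n^2-1)) }[/math] for untied ranks, which is not taken from the quoted text.

```python
from itertools import combinations


def ranks(values):
    """Ranks 1..n for a list of values without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_r(x, y):
    """Spearman R for untied data, computed from rank differences d."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau_as_probability(x, y):
    """Share of pairs in the same order minus share in different orders."""
    pairs = list(combinations(zip(x, y), 2))
    same = sum((xj - xi) * (yj - yi) > 0 for (xi, yi), (xj, yj) in pairs)
    different = sum((xj - xi) * (yj - yi) < 0 for (xi, yi), (xj, yj) in pairs)
    return same / len(pairs) - different / len(pairs)


# Hypothetical paired measurements.
x = [10, 20, 30, 40, 50]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman_r(x, y)                  # 1 - 6 * 4 / 120 = 0.8
tau = kendall_tau_as_probability(x, y)  # 8/10 - 2/10 = 0.6
print(rho, tau)
```

Both coefficients report a positive association on the same data, but their magnitudes differ: here tau says that a randomly chosen pair of observations is 60 percentage points more likely to be in the same order than in different orders.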
2015
- (Scipy.org, 2015) ⇒ The Scipy community, Reference Guide https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.kendalltau.html
- Kendall’s tau is a measure of the correspondence between two rankings. Values close to 1 indicate strong agreement, values close to -1 indicate strong disagreement. This is the tau-b version of Kendall’s tau which accounts for ties.
- (...)The definition of Kendall’s tau that is used is:
- [math]\displaystyle{ \tau = (P - Q) / \sqrt{((P + Q + T) * (P + Q + U))} }[/math]
- where P is the number of concordant pairs, Q the number of discordant pairs, T the number of ties only in x, and U the number of ties only in y. If a tie occurs for the same pair in both x and y, it is not added to either T or U.
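The tau-b definition quoted above can be written out directly from the four counts P, Q, T and U. This is a minimal quadratic-time sketch for illustration, not the SciPy implementation; the function name `tau_b` and the sample data are assumptions.

```python
from itertools import combinations
from math import sqrt


def tau_b(x, y):
    """tau-b = (P - Q) / sqrt((P + Q + T) * (P + Q + U))."""
    p = q = t = u = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xj - xi, yj - yi
        if dx == 0 and dy == 0:
            continue  # tied in both x and y: not added to T or U
        elif dx == 0:
            t += 1    # tie only in x
        elif dy == 0:
            u += 1    # tie only in y
        elif dx * dy > 0:
            p += 1    # concordant
        else:
            q += 1    # discordant
    # Raises ZeroDivisionError if every pair is tied in x or in y.
    return (p - q) / sqrt((p + q + t) * (p + q + u))


# Hypothetical data with one tie in x and one tie in y.
x = [1, 2, 2, 3, 4]
y = [1, 2, 3, 3, 5]
result = tau_b(x, y)  # P = 8, Q = 0, T = 1, U = 1 -> 8 / 9
print(result)
```

If SciPy is installed, the value can be compared with the statistic returned by `scipy.stats.kendalltau(x, y)`, which the quoted documentation describes as the tau-b version.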
- [1] Kendall, M. (1938). "A New Measure of Rank Correlation". Biometrika 30 (1–2): 81–89. doi:10.1093/biomet/30.1-2.81. JSTOR 2332226.
- [2] Kruskal, W.H. (1958). "Ordinal Measures of Association". Journal of the American Statistical Association 53 (284): 814–861. doi:10.2307/2281954. JSTOR 2281954. MR100941.