Student's t-Test for Correlation
A T-test for Correlation is a parametric correlational hypothesis test of continuous variables that is based on a t-test statistic.
- Context:
- It can be described and solved by the following procedure:
- Test Requirements:
- Input Data: pairs of continuous random variables [math]\displaystyle{ (X_i, Y_i) }[/math], [math]\displaystyle{ i=1,\ldots,n }[/math], i.e. the values of a bivariate random sample of size [math]\displaystyle{ n }[/math]. The pairs must be independent and identically distributed and follow a bivariate normal distribution.
- Input Parameters: a sample correlation coefficient, [math]\displaystyle{ r= \frac{S_{XY}}{S_X\; S_Y} }[/math], where [math]\displaystyle{ S_X }[/math] and [math]\displaystyle{ S_Y }[/math] are the sample standard deviations of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] and [math]\displaystyle{ S_{XY} }[/math] is the sample covariance between [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math]; and a significance level ([math]\displaystyle{ \alpha_0 }[/math]) to be used in the decision rule.
- Hypotheses to be tested:
- [math]\displaystyle{ H_0 :\; \rho=0 }[/math] - null hypothesis states the population correlation coefficient is 0; there is no correlation.
- [math]\displaystyle{ H_A :\; \rho \neq 0 }[/math] - alternative hypothesis states the population correlation coefficient is not 0; there is non-zero correlation.
- Test Method and Sample Data Analysis:
- Test Statistic: the t-test statistic for correlation is given by
- [math]\displaystyle{ t_{r,n-2} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} }[/math], with [math]\displaystyle{ r }[/math] being the sample correlation coefficient; under [math]\displaystyle{ H_0 }[/math] it follows a t-distribution with [math]\displaystyle{ n-2 }[/math] degrees of freedom.
- Decision Rule: the null hypothesis is rejected if the P-value is less than [math]\displaystyle{ \alpha_0 }[/math] or, equivalently, if the t-test statistic value falls outside the region of acceptance.
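The computation of [math]\displaystyle{ r }[/math], the t-statistic, and the decision rule can be sketched in a few lines of Python. The paired data below are hypothetical illustrative values, and the critical value 3.182 is the tabulated two-sided [math]\displaystyle{ t_{0.975} }[/math] quantile for [math]\displaystyle{ n-2=3 }[/math] degrees of freedom:

```python
from math import sqrt

# Hypothetical paired sample (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Sample covariance S_XY and sample standard deviations S_X, S_Y
# (all with n-1 denominators, as in the definition above).
s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
s_x = sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
s_y = sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
r = s_xy / (s_x * s_y)  # sample correlation coefficient

# t-statistic with n-2 degrees of freedom.
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Decision rule: reject H0 if |t| exceeds the two-sided critical value
# t_{0.975, n-2} = 3.182 for n = 5 (from a t-table).
t_crit = 3.182
reject = abs(t) > t_crit
print(round(r, 3), round(t, 1), reject)
```

In practice the critical value (or the P-value) would come from statistical software rather than a hard-coded table entry.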
- Results and Interpretation:
- Example(s):
- Consider a sample correlation coefficient between two random variables of [math]\displaystyle{ r=0.84 }[/math], a sample size [math]\displaystyle{ n=10 }[/math], and a significance level [math]\displaystyle{ \alpha_0=0.05 }[/math]. Then,
- the t-test statistic is [math]\displaystyle{ t_{0.84,8}=\frac{0.84\sqrt{8}}{\sqrt{1-0.84^2}}\approx 4.379 }[/math],
- the region of acceptance is bounded by the t-distribution values (for [math]\displaystyle{ n-2=8 }[/math] degrees of freedom) at which [math]\displaystyle{ P(T \leq t)=\alpha_0/2 }[/math] and [math]\displaystyle{ P(T \leq t)=1-(\alpha_0/2) }[/math], that is, [math]\displaystyle{ -2.306 \lt t \lt +2.306 }[/math],
- the P-value is [math]\displaystyle{ 2 \times P(t_{0.84,8} \gt 4.379)\approx 2\times 0.0012=0.0024 }[/math].
- Thus, the null hypothesis is rejected: [math]\displaystyle{ t_{0.84,8} }[/math] falls outside the region of acceptance and the P-value is less than [math]\displaystyle{ \alpha_0=0.05 }[/math].
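The arithmetic of this example can be checked directly; the critical value 2.306 is the tabulated two-sided quantile quoted above:

```python
from math import sqrt

r, n = 0.84, 10
df = n - 2  # 8 degrees of freedom

# t-statistic for testing rho = 0.
t = r * sqrt(df) / sqrt(1 - r ** 2)

# Region of acceptance from the two-sided critical value t_{0.975, 8}.
t_crit = 2.306
print(round(t, 3), abs(t) > t_crit)
```

The statistic evaluates to about 4.379, well outside [math]\displaystyle{ (-2.306,\, +2.306) }[/math], so the null hypothesis is rejected.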
- Counter-Example(s):
- See: Correlational Hypothesis Test, Correlation Coefficient, Autocorrelation, Cointegration.
References
2017
- (Stat 415, 2017) ⇒ Intro Mathematical Statistics: Three Tests for Rho https://onlinecourses.science.psu.edu/stat414/node/254
- (...) if (Xi, Yi) follows a bivariate normal distribution, and the conditional mean is a linear function:
- [math]\displaystyle{ E(Y|X=x)=\alpha+\beta\; x }[/math] then: [math]\displaystyle{ \beta =\rho \frac{\sigma_Y}{\sigma_X} }[/math]
- That suggests, therefore, that testing for [math]\displaystyle{ H_0:\;\rho=0 }[/math] against any of the alternative hypotheses [math]\displaystyle{ H_A:\;\rho \neq 0,\; H_A:\;\rho \gt 0 }[/math] and [math]\displaystyle{ H_A:\; \rho \lt 0 }[/math] is equivalent to testing [math]\displaystyle{ H_0:\;\beta=0 }[/math] against the corresponding alternative hypothesis [math]\displaystyle{ H_A:\; \beta \neq 0,\; H_A:\; \beta \gt 0 }[/math] and [math]\displaystyle{ H_A:\;\beta \lt 0 }[/math]. That is, we can simply compare the test statistic:
- [math]\displaystyle{ t = \frac{\hat{\beta} - 0}{\sqrt{MSE/\sum_{i=1}^n(x_i-\bar{x})^2}} }[/math]
- to a t-distribution with [math]\displaystyle{ n-2 }[/math] degrees of freedom. It should be noted, though, that the test statistic can instead be written as a function of the sample correlation coefficient:
- [math]\displaystyle{ R = \frac{\frac{1}{n-1}\sum_{i=1}^n[X_i-\bar{X}][Y_i-\bar{Y}]}{\sqrt{\frac{1}{n-1}\sum_{i=1}^n[X_i-\bar{X}]^2}\sqrt{\frac{1}{n-1}\sum_{i=1}^n[Y_i-\bar{Y}]^2}} =\frac{S_{XY}}{S_X\;S_Y} }[/math]
- That is, the test statistic can be alternatively written as:
- [math]\displaystyle{ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} }[/math]
- and because of its algebraic equivalence to the first test statistic, it too follows a t-distribution with [math]\displaystyle{ n-2 }[/math] degrees of freedom (...)
- (1) [math]\displaystyle{ \hat{\beta}= \frac{\frac{1}{n-1}\sum_{i=1}^n[X_i-\bar{X}][Y_i-\bar{Y}]}{\frac{1}{n-1}\sum_{i=1}^n[X_i-\bar{X}]^2} =R\frac{S_Y}{S_X} }[/math]
- (2) [math]\displaystyle{ MSE=\frac{\sum_{i=1}^n[Y_i-\hat{Y}_i]^2}{n-2}=\frac{(n-1)S_Y^2(1-R^2)}{n-2} }[/math]
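The algebraic equivalence described above can also be checked numerically: on any bivariate sample, the regression-slope form of the statistic and the correlation form coincide. A minimal sketch with hypothetical data:

```python
from math import sqrt

# Hypothetical paired sample (illustrative; any non-degenerate data works).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.6, 5.1, 5.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))

# Regression-slope form: t = beta_hat / sqrt(MSE / sum (x_i - xbar)^2).
beta_hat = sxy / sxx
sse = syy - beta_hat * sxy  # residual sum of squares
mse = sse / (n - 2)
t_slope = beta_hat / sqrt(mse / sxx)

# Correlation form: t = r * sqrt(n - 2) / sqrt(1 - r^2).
r = sxy / sqrt(sxx * syy)
t_corr = r * sqrt(n - 2) / sqrt(1 - r ** 2)

print(abs(t_slope - t_corr) < 1e-9)  # the two forms agree
```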