Pearson Correlation Test Task
A Pearson Correlation Test Task is a Correlational Hypothesis Testing Task that performs a Pearson Correlation Test using a Pearson Correlation Test System.
- Context
- Input:
- Input Data: Pairs of continuous random variables [math]\displaystyle{ (X_i, Y_i) }[/math] corresponding to the values of a bivariate random sample of size [math]\displaystyle{ n }[/math] (i.e., [math]\displaystyle{ i=1,...,n }[/math]). Each pair must be bivariate normally distributed, and the pairs must be independent and identically distributed.
- Input Parameters:
- [math]\displaystyle{ \mu_X }[/math] and [math]\displaystyle{ \mu_Y }[/math] population means for variable [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math];
- [math]\displaystyle{ \sigma_X^2 }[/math], [math]\displaystyle{ \sigma_Y^2 }[/math] population variances for [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math];
- [math]\displaystyle{ \sigma_{XY} }[/math], the population covariance between [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math];
- Task Output:
- [math]\displaystyle{ S_X }[/math] and [math]\displaystyle{ S_Y }[/math] sample standard deviations for [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].
- [math]\displaystyle{ r }[/math], the Pearson product-moment correlation coefficient;
- Classification of the correlation's strength, which can be:
- [math]\displaystyle{ 0.1 \lt | r | \leq 0.3 }[/math] - small or weak correlation;
- [math]\displaystyle{ 0.3 \lt | r | \leq 0.5 }[/math] - medium or moderate correlation;
- [math]\displaystyle{ 0.5 \lt | r | \leq 1 }[/math] - large or strong correlation.
- Task Requirements:
- Hypotheses Statement:
- [math]\displaystyle{ H_0 :\; \rho=0 }[/math] - null hypothesis states the population correlation coefficient is 0; there is no correlation.
- [math]\displaystyle{ H_A :\; \rho \neq 0 }[/math] - alternative hypothesis states the population correlation coefficient is not 0; there is non-zero correlation.
- Test Statistic: Pearson product-moment correlation coefficient ([math]\displaystyle{ r }[/math]) between the pair of continuous random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].
- Decision Rule: The sign and magnitude of the Pearson product-moment correlation coefficient ([math]\displaystyle{ r }[/math]) indicate the direction and strength of the correlation between [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math]. The correlation coefficient [math]\displaystyle{ r }[/math] varies between -1 and 1: (a) if [math]\displaystyle{ r=0 }[/math], the random variables are uncorrelated; (b) if [math]\displaystyle{ r=\pm 1 }[/math], the random variables are perfectly correlated.
- Counter-Example(s):
- See: Correlational Hypothesis Test, Correlation Coefficient, Autocorrelation, Cointegration.
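The task described above can be sketched in plain Python. This is an illustrative example, not part of the source: the data, the helper names, and the use of the t statistic [math]\displaystyle{ t = r\sqrt{n-2}/\sqrt{1-r^2} }[/math] (a standard transformation of [math]\displaystyle{ r }[/math] for testing [math]\displaystyle{ H_0: \rho = 0 }[/math], not stated in the text above) are all choices made for the sketch.

```python
import math

def pearson_r(x, y):
    """Sample Pearson product-moment correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def classify_strength(r):
    """Map |r| onto the small/medium/large bands listed above."""
    a = abs(r)
    if a > 0.5:
        return "large/strong"
    if a > 0.3:
        return "medium/moderate"
    if a > 0.1:
        return "small/weak"
    return "negligible"

x = [1, 2, 3, 4, 5]  # hypothetical bivariate sample
y = [2, 1, 4, 3, 5]
n = len(x)

r = pearson_r(x, y)  # the test statistic
# t statistic with n - 2 degrees of freedom for H0: rho = 0;
# compare against the two-sided critical value t_{0.975, n-2}.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(r, classify_strength(r), t)
```

For this toy sample r = 0.8 (a "large" correlation by the bands above), but with only n = 5 observations the t statistic does not exceed the 5% critical value, so H0 would not be rejected; strength of correlation and statistical significance are separate questions.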
References
2017a
- (SPSS Tutorials, 2017) ⇒ http://libguides.library.kent.edu/SPSS/PearsonCorr
- The bivariate Pearson Correlation produces a sample correlation coefficient, r, which measures the strength and direction of linear relationships between pairs of continuous variables. By extension, the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ (“rho”). The Pearson Correlation is a parametric measure.
2017b
- (CM, 2017) ⇒ http://changingminds.org/explanations/research/analysis/pearson.htm
- Pearson devised a very common way of measuring correlation, often called the Pearson Product-Moment Correlation. It is used when both variables are at least at interval level and data is parametric.
- It is calculated by dividing the covariance of the two variables by the product of their standard deviations.
- [math]\displaystyle{ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} }[/math]
- Where [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] are the variables, [math]\displaystyle{ x_i }[/math] is a single value of [math]\displaystyle{ x }[/math], [math]\displaystyle{ \bar{x} }[/math] is the mean of all [math]\displaystyle{ x }[/math]'s, [math]\displaystyle{ n }[/math] is the number of observations, and [math]\displaystyle{ s_x }[/math] is the standard deviation of all [math]\displaystyle{ x }[/math]'s.
- r may also be considered as being:
- [math]\displaystyle{ r^2 = explained\; variation / total \; variation }[/math]
- where variation is calculated as the Sum of the Squares, SS
- In other words, it is the proportion of variation that can be explained. A high explained proportion is good, and a value of one is perfect correlation. For example an r of 0.8 explains 64% of the variance.
- When calculated from a population, Pearson's coefficient is denoted with the Greek letter 'rho' (ρ). When calculated from a sample, it is denoted with 'r'.
- The Coefficient of Determination is calculated as [math]\displaystyle{ r^2 }[/math].
- (...) Pearson is a parametric statistic and assumes:
- A normal distribution.
- Interval or ratio data.
- A linear relationship between X and Y
- The coefficient of determination, [math]\displaystyle{ r^2 }[/math], represents the percent of the variance in the dependent variable explained by the independent variable.
- Correlation explains a certain amount of variance, but not all. This works on a square law, so a correlation of 0.5 indicates that the independent variable explains 25% of the variance of the dependent variable, and a correlation of 0.9 accounts for 81% of the variance.
- This means that the unexplained variance is indicated by [math]\displaystyle{ (1 - r^2) }[/math]. This is typically due to random factors.
- Pearson's Correlation is also known as the Pearson Product-Moment Correlation or Sample Correlation Coefficient. 'r' is also known as 'Pearson's r'.
- It is calculated by dividing the covariance of the two variables by the product of their standard deviations.
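The covariance-over-standard-deviations formula and the explained/unexplained variance split quoted above can be checked numerically. The data below are made up for the illustration; everything else follows the definitions in the text.

```python
import math

# Hypothetical paired measurements.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance and sample standard deviations (n - 1 denominators).
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))

r = cov / (s_x * s_y)      # covariance over product of standard deviations
explained = r ** 2         # coefficient of determination
unexplained = 1 - r ** 2   # variance attributed to other (random) factors
```

For this sample the coefficient of determination comes out to exactly 0.7, i.e. 70% of the variance is explained and 30% is not; the two shares always sum to 1.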
2017c
- (Stattrek, 2017) ⇒ http://stattrek.com/statistics/correlation.aspx
- The most common formula for computing a product-moment correlation coefficient (r) is given below.
- Product-moment correlation coefficient. The correlation r between two variables is:
- [math]\displaystyle{ r = \frac{\sum xy}{\sqrt{ (\sum x^2)(\sum y^2) }} }[/math]
- where Σ is the summation symbol, [math]\displaystyle{ x = x_i - \bar{x} }[/math], [math]\displaystyle{ x_i }[/math] is the x value for observation i, [math]\displaystyle{ \bar{x} }[/math] is the mean x value, [math]\displaystyle{ y = y_i - \bar{y} }[/math], [math]\displaystyle{ y_i }[/math] is the y value for observation i, and [math]\displaystyle{ \bar{y} }[/math] is the mean y value.
- The formula below uses population means and population standard deviations to compute a population correlation coefficient (ρ) from population data.
- Population correlation coefficient. The correlation ρ between two variables is:
- [math]\displaystyle{ \rho = \frac{1}{N} \sum \left[ \frac{X_i - \mu_X}{\sigma_X} \right] \left[ \frac{Y_i - \mu_Y}{\sigma_Y} \right] }[/math]
- where N is the number of observations in the population, Σ is the summation symbol, [math]\displaystyle{ X_i }[/math] is the X value for observation i, [math]\displaystyle{ \mu_X }[/math] is the population mean for variable X, [math]\displaystyle{ Y_i }[/math] is the Y value for observation i, [math]\displaystyle{ \mu_Y }[/math] is the population mean for variable Y, [math]\displaystyle{ \sigma_X }[/math] is the population standard deviation of X, and [math]\displaystyle{ \sigma_Y }[/math] is the population standard deviation of Y.
- The formula below uses sample means and sample standard deviations to compute a correlation coefficient (r) from sample data.
- Sample correlation coefficient. The correlation r between two variables is:
- [math]\displaystyle{ r = \frac{1}{n - 1} \sum \left[ \frac{x_i - \bar{x}}{s_x} \right] \left[ \frac{y_i - \bar{y}}{s_y} \right] }[/math]
- where n is the number of observations in the sample, Σ is the summation symbol, [math]\displaystyle{ x_i }[/math] is the x value for observation i, [math]\displaystyle{ \bar{x} }[/math] is the sample mean of x, [math]\displaystyle{ y_i }[/math] is the y value for observation i, [math]\displaystyle{ \bar{y} }[/math] is the sample mean of y, [math]\displaystyle{ s_x }[/math] is the sample standard deviation of x, and [math]\displaystyle{ s_y }[/math] is the sample standard deviation of y.
- Each of the latter two formulas can be derived from the first formula. Use the first or second formula when you have data from the entire population. Use the third formula when you only have sample data, but want to estimate the correlation in the population. When in doubt, use the first formula.
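The claim that the formulas agree can be verified on a small hypothetical data set. The sketch below applies all three to the same sample: the deviation-score form, the population-standardized form (treating the data as the whole population), and the sample-standardized form.

```python
import math

# Hypothetical bivariate data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
dx = [a - mx for a in x]  # deviation scores x = x_i - x-bar
dy = [b - my for b in y]  # deviation scores y = y_i - y-bar

# Formula 1: deviation scores.
r1 = sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
    sum(a * a for a in dx) * sum(b * b for b in dy))

# Formula 2: population standardization (divide by N, population sigmas).
sig_x = math.sqrt(sum(a * a for a in dx) / n)
sig_y = math.sqrt(sum(b * b for b in dy) / n)
rho = sum((a / sig_x) * (b / sig_y) for a, b in zip(dx, dy)) / n

# Formula 3: sample standardization (divide by n - 1, sample s_x, s_y).
s_x = math.sqrt(sum(a * a for a in dx) / (n - 1))
s_y = math.sqrt(sum(b * b for b in dy) / (n - 1))
r3 = sum((a / s_x) * (b / s_y) for a, b in zip(dx, dy)) / (n - 1)
```

All three produce the same value on the same data; they differ only in which quantities (population vs. sample) they are expressed in.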
2017d
- (Stat 509, 2017) ⇒ Design and Analysis of Clinical Trials, The Pennsylvania State University 18.1 - Pearson Correlation Coefficient https://onlinecourses.science.psu.edu/stat509/node/156
- The sample Pearson correlation coefficient (also called the sample product-moment correlation coefficient) for measuring the association between variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] is given by the following formula:
- [math]\displaystyle{ r_p=\frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}} }[/math]
- The sample Pearson correlation coefficient, [math]\displaystyle{ r_p }[/math], is the point estimate of the population Pearson correlation coefficient:
- [math]\displaystyle{ \rho_p=\frac{\sigma_{XY}}{\sqrt{\sigma_{XX}\sigma_{YY}}} }[/math]
- The Pearson correlation coefficient measures the degree of linear relationship between [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] and [math]\displaystyle{ -1 \leq r_p \leq +1 }[/math], so that [math]\displaystyle{ r_p }[/math] is a "unitless" quantity, i.e., when you construct the correlation coefficient the units of measurement that are used cancel out. A value of +1 reflects perfect positive correlation and a value of -1 reflects perfect negative correlation.
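The sums-of-squares form of [math]\displaystyle{ r_p }[/math] above, and the claim that it is unitless, can be sketched as follows. The data and the function name are hypothetical; the computation follows [math]\displaystyle{ r_p = S_{XY}/\sqrt{S_{XX}S_{YY}} }[/math].

```python
import math

def pearson_ss(x, y):
    """r_p = S_XY / sqrt(S_XX * S_YY), sums of squares about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [3.0, 5.0, 7.0, 9.0]  # hypothetical measurements
y = [4.0, 7.0, 6.0, 9.0]
r_p = pearson_ss(x, y)

# "Unitless": rescaling a variable (a change of units, e.g. m -> cm)
# multiplies S_XY and sqrt(S_XX) by the same factor, so r_p is unchanged.
r_scaled = pearson_ss([100.0 * a for a in x], y)
```

Both calls return the same coefficient, which is the sense in which the units of measurement cancel out.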
2016
- (Wikipedia, 2016) ⇒ http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
- In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the PPMCC or PCC or Pearson's r) is a measure of the [[linear correlation]] (dependence) between two variables X and Y, giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation. It is widely used in the sciences as a measure of the degree of linear dependence between two variables. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. [1] [2]
- ↑ See: * As early as 1877, Galton was using the term "reversion" and the symbol “r” for what would become "regression". F. Galton (5, 12, 19 April 1877) "Typical laws of heredity," Nature, 15 (388, 389, 390) : 492–495 ; 512–514 ; 532–533. In the "Appendix" on page 532, Galton uses the term "reversion" and the symbol r. * (F. Galton) (September 24, 1885), "The British Association: Section II, Anthropology: Opening address by Francis Galton, F.R.S., etc., President of the Anthropological Institute, President of the Section," Nature, 32 (830) : 507–510. * Galton, F. (1886) "Regression towards mediocrity in hereditary stature," Journal of the Anthropological Institute of Great Britain and Ireland, 15 : 246–263.
- ↑ Karl Pearson (June 20, 1895) "Notes on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, 58 : 240–242.