Pearson's (r) Product-Moment Correlation Coefficient
A Pearson's (r) Product-Moment Correlation Coefficient is a correlation coefficient defined as [math]\displaystyle{ r =\frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} }[/math], where [math]\displaystyle{ x=x_i-\bar{x} }[/math] and [math]\displaystyle{ y=y_i-\bar{y} }[/math] are the deviations of the observed values [math]\displaystyle{ x_i }[/math] and [math]\displaystyle{ y_i }[/math] for observation i from their respective means [math]\displaystyle{ \bar{x} }[/math] and [math]\displaystyle{ \bar{y} }[/math].
- AKA: PPMCC, PCC.
- Context:
- output: Pearson's r Value which ranges from [math]\displaystyle{ -1 }[/math] to [math]\displaystyle{ 1 }[/math], such that:
- [math]\displaystyle{ 1 }[/math] if there is a perfect Linear Relationship between the two variables with Positive Slope (positive correlation).
- [math]\displaystyle{ -1 }[/math] if there is a perfect Linear Relationship between the two variables with Negative Slope (negative correlation).
- [math]\displaystyle{ 0 }[/math] if there is no Linear Relationship between the variables.
- It can range from being a Sample Correlation Coefficient to being a Population Correlation Coefficient.
- …
- Example(s):
- Counter-Example(s):
- See: Pearson Correlation Test, Correlation Coefficient, Linear Regression, Co-Occurrence Matrix.
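A minimal Python sketch of the defining formula, assuming plain lists of paired numeric observations. The function name `pearson_r` and the sample values are hypothetical illustrations, not part of any library. The three calls mirror the Context items: an exact increasing line gives 1, an exact decreasing line gives -1, and a symmetric quadratic gives 0 even though y is fully determined by x, because r measures only linear association.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Hypothetical helper: r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x = x_i - mean(x) and y = y_i - mean(y) are deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]  # deviations of x from its mean
    dy = [y - y_bar for y in ys]  # deviations of y from its mean
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        # r is undefined when either variable is constant (zero denominator)
        raise ValueError("r is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # 1.0: perfect positive linear relationship
print(pearson_r(xs, [-3 * x + 7 for x in xs]))  # -1.0: perfect negative linear relationship
sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(pearson_r(sym, [x * x for x in sym]))     # 0.0: exact nonlinear dependence, no linear trend
```

The last case is why a value of 0 should be read as "no linear relationship" rather than "no relationship".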
References
2017a
- (CM, 2017) ⇒ http://changingminds.org/explanations/research/analysis/pearson.htm
- QUOTE: Pearson devised a very common way of measuring correlation, often called the Pearson Product-Moment Correlation. It is used when both variables are at least at interval level and the data are parametric.
- It is calculated by dividing the covariance of the two variables by the product of their standard deviations.
- [math]\displaystyle{ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x\, s_y} }[/math]
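- Where x and y are the variables, [math]\displaystyle{ x_i }[/math] is a single value of x, [math]\displaystyle{ \bar{x} }[/math] is the mean of all x's, n is the number of paired observations, and [math]\displaystyle{ s_x }[/math] and [math]\displaystyle{ s_y }[/math] are the standard deviations of all x's and all y's, respectively.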
- (...) Pearson is a parametric statistic and assumes:
- A normal distribution.
- Interval or ratio data.
- A linear relationship between X and Y
- The coefficient of determination, [math]\displaystyle{ r^2 }[/math], represents the proportion of the variance in the dependent variable that is explained by the independent variable.
- Correlation explains a certain amount of variance, but not all. This works on a square law, so a correlation of 0.5 indicates that the independent variable explains 25% of the variance of the dependent variable, and a correlation of 0.9 accounts for 81% of the variance.
- This means that the unexplained variance is indicated by [math]\displaystyle{ 1 - r^2 }[/math]. This is typically due to random factors.
- Pearson's Correlation is also known as the Pearson Product-Moment Correlation or Sample Correlation Coefficient. 'r' is also known as 'Pearson's r'.
2017b
- (Stattrek, 2017) ⇒ http://stattrek.com/statistics/correlation.aspx
- QUOTE: The most common formula for computing a product-moment correlation coefficient (r) is given below.
- Product-moment correlation coefficient. The correlation r between two variables is:
- [math]\displaystyle{ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} }[/math]
- where Σ is the summation symbol, [math]\displaystyle{ x = x_i - \bar{x} }[/math], [math]\displaystyle{ x_i }[/math] is the x value for observation i, [math]\displaystyle{ \bar{x} }[/math] is the mean x value, [math]\displaystyle{ y = y_i - \bar{y} }[/math], [math]\displaystyle{ y_i }[/math] is the y value for observation i, and [math]\displaystyle{ \bar{y} }[/math] is the mean y value.
- The formula below uses population means and population standard deviations to compute a population correlation coefficient (ρ) from population data.
- Population correlation coefficient. The correlation ρ between two variables is:
- [math]\displaystyle{ \rho = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - \mu_X}{\sigma_x} \right) \left( \frac{Y_i - \mu_Y}{\sigma_y} \right) }[/math]
- where N is the number of observations in the population, Σ is the summation symbol, [math]\displaystyle{ X_i }[/math] is the X value for observation i, [math]\displaystyle{ \mu_X }[/math] is the population mean for variable X, [math]\displaystyle{ Y_i }[/math] is the Y value for observation i, [math]\displaystyle{ \mu_Y }[/math] is the population mean for variable Y, [math]\displaystyle{ \sigma_x }[/math] is the population standard deviation of X, and [math]\displaystyle{ \sigma_y }[/math] is the population standard deviation of Y.
- The formula below uses sample means and sample standard deviations to compute a correlation coefficient (r) from sample data.
- Sample correlation coefficient. The correlation r between two variables is:
- [math]\displaystyle{ r = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) }[/math]
- where n is the number of observations in the sample, Σ is the summation symbol, [math]\displaystyle{ x_i }[/math] is the x value for observation i, [math]\displaystyle{ \bar{x} }[/math] is the sample mean of x, [math]\displaystyle{ y_i }[/math] is the y value for observation i, [math]\displaystyle{ \bar{y} }[/math] is the sample mean of y, [math]\displaystyle{ s_x }[/math] is the sample standard deviation of x, and [math]\displaystyle{ s_y }[/math] is the sample standard deviation of y.
- Each of the latter two formulas can be derived from the first formula. Use the first or second formula when you have data from the entire population. Use the third formula when you only have sample data, but want to estimate the correlation in the population. When in doubt, use the first formula.
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient Retrieved:2015-2-8.
- In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the PPMCC or PCC or Pearson's r) is a measure of the linear correlation (dependence) between two variables X and Y, giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation. It is widely used in the sciences as a measure of the degree of linear dependence between two variables. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. [1] [2]
- ↑ See: * As early as 1877, Galton was using the term "reversion" and the symbol “r” for what would become "regression". F. Galton (5, 12, 19 April 1877) "Typical laws of heredity," Nature, 15 (388, 389, 390) : 492–495 ; 512–514 ; 532–533. In the "Appendix" on page 532, Galton uses the term "reversion" and the symbol r. * (F. Galton) (September 24, 1885), "The British Association: Section II, Anthropology: Opening address by Francis Galton, F.R.S., etc., President of the Anthropological Institute, President of the Section," Nature, 32 (830) : 507–510. * Galton, F. (1886) "Regression towards mediocrity in hereditary stature," Journal of the Anthropological Institute of Great Britain and Ireland, 15 : 246–263.
- ↑ Karl Pearson (June 20, 1895) "Notes on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, 58 : 240–242.