Pearson Correlation Test System
A Pearson Correlation Test System is a statistical hypothesis testing system that solves a Pearson correlation test task.
- Context:
- It can be based on an implementation of a Pearson Correlation Algorithm that calculates the Pearson product-moment correlation coefficient and its respective p-value.
- Example(s):
- A Python subroutine implementation, such as scipy.stats.mstats.pearsonr(x, y) or scipy.stats.pearsonr(x, y)
- Example 1: Testing whether the female and male VIQ datasets in http://www.scipy-lectures.org/_downloads/brain_size.csv are linearly correlated
- #importing python libraries
import pandas
from scipy.stats import pearsonr
- #reading data file
data = pandas.read_csv('brain_size.csv', sep=';', na_values=".")
- #female dataset
female_viq = data[data['Gender'] == 'Female']['VIQ']
- #male dataset
male_viq = data[data['Gender'] == 'Male']['VIQ']
- #calling pearsonr function
pearsonr(female_viq,male_viq)
- # output : Pearson’s correlation coefficient, 2-tailed p-value
(0.0082168169434572707, 0.97257333753162245)
- The coefficient is close to 0 and the p-value is large, so there is no evidence of a linear relationship between the two datasets.
- Example 2:
from scipy.stats import pearsonr
pearsonr([1,2,3,4,5,6],[2,3,4,5,6,7])
- # output : Pearson’s correlation coefficient, 2-tailed p-value
(1.0, 0.0)
- The coefficient is exactly 1 and the p-value is 0, so the two datasets are perfectly positively linearly correlated.
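The examples above call scipy.stats.pearsonr, while the masked-array variant scipy.stats.mstats.pearsonr is the one cited in the References. The following is a minimal sketch of how the masked variant might be used when one dataset has a missing value; the sample values are made up for illustration and are not taken from brain_size.csv.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

# Illustrative (made-up) paired data; y has one missing observation.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.0, np.nan, 3.0, 6.0, 5.0])

# Mask the invalid entry; mstats.pearsonr ignores any pair
# in which either value is masked.
y_masked = np.ma.masked_invalid(y)
r_masked, p_masked = mstats.pearsonr(x, y_masked)

# For comparison: drop the incomplete pair by hand and use stats.pearsonr.
keep = ~np.isnan(y)
r_plain, p_plain = stats.pearsonr(x[keep], y[keep])

print(r_masked, p_masked)
print(r_plain, p_plain)
```

Under these assumptions both calls should report the same coefficient and p-value, since the masked variant effectively performs the pairwise deletion internally.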
- Counter-Example(s):
- See: Parametric Statistical Test, Computing System, Parameter Optimization System.
References
2017
- (SciPy docs, 2017) ⇒ https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.pearsonr.html scipy.stats.mstats.pearsonr(x, y): Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
- The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each dataset be normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
- The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.
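To make the coefficient and p-value described above concrete, the sketch below computes both by hand and compares them with scipy.stats.pearsonr. The helper name pearson_test and the sample data are hypothetical, and the p-value is obtained from the standard t-statistic with n - 2 degrees of freedom, which is an assumption of this sketch and not necessarily how SciPy computes it internally.

```python
import numpy as np
from scipy import stats

def pearson_test(x, y):
    """Hypothetical helper: return (r, two-sided p-value) for paired samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Center both samples, then take the normalized sum of cross-products.
    xc = x - x.mean()
    yc = y - y.mean()
    r = np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    r = float(np.clip(r, -1.0, 1.0))  # guard against rounding past +/-1

    # An exact linear relationship gives an infinite t-statistic, so p = 0.
    if abs(r) == 1.0:
        return r, 0.0

    # Under the null hypothesis of no correlation (and normal data),
    # t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2 df.
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    p = 2.0 * stats.t.sf(abs(t), df=n - 2)
    return r, float(p)

# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r, p = pearson_test(x, y)
r_ref, p_ref = stats.pearsonr(x, y)
print(r, p)
print(r_ref, p_ref)
```

The two printed pairs should agree up to floating-point rounding, which gives a quick sanity check on the formula.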