Pearson Correlation Test System

A Pearson Correlation Test System is a statistical hypothesis testing system that solves a Pearson correlation test task.

Context
- It can be based on an implementation of a Pearson Correlation Algorithm that calculates the Pearson product-moment correlation coefficient and its respective p-value.
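A minimal sketch of such an algorithm is given below. It assumes the textbook definition of the coefficient and a two-sided p-value taken from Student's t distribution with n - 2 degrees of freedom. The function name pearson_test is hypothetical and chosen for illustration; it is not claimed to match the internals of any particular library.

```python
import math

import numpy as np
from scipy import stats


def pearson_test(x, y):
    """Return (r, two-sided p-value) for two equal-length 1-D samples.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 observations")

    # Deviations of each sample from its own mean.
    dx = x - x.mean()
    dy = y - y.mean()
    denom = math.sqrt(float(np.sum(dx ** 2)) * float(np.sum(dy ** 2)))
    if denom == 0.0:
        raise ValueError("r is undefined when an input is constant")

    # Pearson product-moment coefficient, clamped against rounding error.
    r = float(np.sum(dx * dy)) / denom
    r = max(-1.0, min(1.0, r))
    if abs(r) == 1.0:
        return r, 0.0

    # Under the null hypothesis of no correlation (and normal data),
    # t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution
    # with n - 2 degrees of freedom.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    p = 2.0 * float(stats.t.sf(abs(t), df=n - 2))
    return r, p
```

Calling pearson_test(x, y) on two equal-length numeric sequences returns the coefficient and p-value as a pair, in the same order as scipy.stats.pearsonr.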
Example(s):
- A Python subroutine implementation, such as scipy.stats.mstats.pearsonr(x, y) or scipy.stats.pearsonr(x, y)
  - Example 1: Testing whether the female and male VIQ datasets in http://www.scipy-lectures.org/_downloads/brain_size.csv are linearly correlated.

#importing python libraries

import pandas

from scipy.stats import pearsonr

#reading data file

data = pandas.read_csv('brain_size.csv', sep=';', na_values=".")

#female dataset

female_viq = data[data['Gender'] == 'Female']['VIQ']

#male dataset

male_viq = data[data['Gender'] == 'Male']['VIQ']

#calling pearsonr function

pearsonr(female_viq,male_viq)

# output : Pearson’s correlation coefficient, 2-tailed p-value

(0.0082168169434572707, 0.97257333753162245)

With a coefficient close to 0 and a p-value of about 0.97, the test finds no evidence of a linear relationship between the two datasets.

Example 2: Testing two lists that are exactly linearly related.

from scipy.stats import pearsonr

pearsonr([1,2,3,4,5,6],[2,3,4,5,6,7])

# output : Pearson’s correlation coefficient, 2-tailed p-value

(1.0, 0.0)

With a coefficient of 1.0 and a p-value of 0.0, the two datasets are perfectly positively linearly correlated.

Counter-Example(s):
See: Parametric Statistical Test, Computing System, Parameter Optimization System.

References

2017

(Scipy docs, 2017) ⇒ https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.pearsonr.html scipy.stats.mstats.pearsonr(x, y) - Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.

The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each dataset be normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.
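The p-value interpretation above can be illustrated with a permutation test: shuffling one dataset breaks any association, so the shuffled pairs play the role of an uncorrelated system. The sketch below is an assumption-laden illustration, not the method used by scipy; the function name permutation_p_value, the default of 10,000 permutations, and the fixed seed are all hypothetical choices.

```python
import numpy as np


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for Pearson's r by shuffling y.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    # Magnitude of the correlation observed in the original pairing.
    observed = abs(np.corrcoef(x, y)[0, 1])

    # Count shuffles whose correlation is at least as extreme.
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(y)
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= observed - 1e-12:
            count += 1

    # The +1 terms count the observed pairing itself, a common
    # convention that keeps the estimate strictly above zero.
    return (count + 1) / (n_permutations + 1)
```

Because this estimate does not rely on the normality assumption, it can serve as a rough cross-check on the analytic p-value for small datasets.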