One-Sample t-Test System

A One-Sample t-Test System is a statistical hypothesis testing system that implements a one-sample t-test algorithm to solve a one-sample t-test task.

Context
- It can be based on the implementation of the following algorithms:
  - A One-Sample t-Test Algorithm to calculate the one-sample t-test statistic and the respective p-value.
  - A t-distribution calculator, or an alternative algorithm to calculate the Probability Density Function and Cumulative Distribution Function of the t-distribution, in order to evaluate the null hypothesis and the alternative hypothesis.
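The two components listed above can be sketched together in a few lines. This is a minimal illustration rather than a reference implementation: the function name one_sample_t_test and the sample values are hypothetical, and the sketch assumes a two-sided test that uses the unbiased sample standard deviation.

```python
import numpy as np
from scipy import stats


def one_sample_t_test(sample, popmean):
    """Return (t_statistic, two_sided_p_value) for H0: mean == popmean.

    Hypothetical helper written for illustration only.
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    # Standard error of the mean, using the unbiased sample
    # standard deviation (ddof=1).
    standard_error = x.std(ddof=1) / np.sqrt(n)
    t_statistic = (x.mean() - popmean) / standard_error
    # Two-sided p-value: twice the upper-tail probability of |t|
    # under a t-distribution with n - 1 degrees of freedom.
    p_value = 2.0 * stats.t.sf(abs(t_statistic), df=n - 1)
    return t_statistic, p_value


# Made-up illustrative data, not taken from any dataset in this page.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5]

t_manual, p_manual = one_sample_t_test(sample, popmean=5.0)
t_scipy, p_scipy = stats.ttest_1samp(sample, 5.0)

print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

Both print statements should show the same pair of values, since scipy.stats.ttest_1samp performs the same two-sided calculation by default.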
Example(s):
- Example based on http://www.scipy-lectures.org/packages/statistics/index.html#student-s-t-test-the-simplest-statistical-test using the dataset http://www.scipy-lectures.org/_downloads/brain_size.csv and the following IPython session:

#importing python libraries

In[1]: import pandas

In[2]: from scipy import stats

# reading sample data

In[3]: data = pandas.read_csv('http://www.scipy-lectures.org/_downloads/brain_size.csv', sep=';', na_values=".")

# Performing One-sample t-test

# Null hypothesis: the VIQ population mean is 10

In[4]: stats.ttest_1samp(data['VIQ'], 10)

#Output : t-statistic value and p-value

Out[4]: (27.4100314376, 4.29314923813e-27)

Conclusion: the p-value is far below any conventional significance level, so the null hypothesis is rejected. The VIQ population mean is not 10.

# Null hypothesis: the VIQ population mean is 115

In[5]: stats.ttest_1samp(data['VIQ'], 115)

#Output : t-statistic value and p-value

Out[5]: (-0.709688161306, 0.482119348971)

Conclusion: the p-value is greater than the significance levels 0.05, 0.025, and 0.01, so the test fails to reject the null hypothesis. The data are consistent with a VIQ population mean of 115.

# Null hypothesis: the VIQ population mean is 200

In[6]: stats.ttest_1samp(data['VIQ'], 200)

#Output : t-statistic value and p-value

Out[6]: (-23.4732706938, 1.2940564282e-24)

Conclusion: the p-value is far below any conventional significance level, so the null hypothesis is rejected. The VIQ population mean is not 200.

- Example based on http://hamelg.blogspot.ca/2015/11/python-for-data-analysis-part-24.html :

#importing python libraries

In[1]: import numpy as np

In[2]: import pandas as pd

In[3]: import scipy.stats as stats

# creating random artificial datasets

In[4]: np.random.seed(6)

In[5]: population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)

In[6]: population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)

In[7]: minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)

In[8]: minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)

#population dataset

In[9]: population_ages = np.concatenate((population_ages1, population_ages2))

#sample dataset

In[10]: minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))

#calling one-sample t-test function

In[11]: stats.ttest_1samp(a=minnesota_ages,popmean=population_ages.mean())

#Output : t-statistic value and p-value

Out[11]: (-2.5742714883655027, 0.013118685425061678)

For a significance level [math]\displaystyle{ \alpha=0.05 }[/math], the null hypothesis is rejected because the p-value (0.0131) is smaller than [math]\displaystyle{ \alpha }[/math].

# calculation of the acceptance region lower limit for significance level alpha=0.05 using the percent point function (inverse of the cumulative distribution function) of the t-distribution (stats.t.ppf). Note that q=alpha/2 and df is the degrees of freedom (sample size 50 minus 1)

In[12]: stats.t.ppf(q=0.025, df=49)

Out[12]: -2.0095752344892093

# calculation of the acceptance region upper limit, with q=1-(alpha/2)

In[13]: stats.t.ppf(q=0.975, df=49)

Out[13]: 2.0095752344892088

The null hypothesis is rejected because the t-statistic (-2.574) falls outside the acceptance region [-2.010, 2.010].
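The acceptance-region check from In[12] and In[13] can also be written as one short script. This is a sketch under the same assumptions as the example (alpha=0.05, df=49, and the t-statistic reported in Out[11]); the variable names are illustrative.

```python
from scipy import stats

alpha = 0.05
df = 49                             # sample size 50 minus 1
t_statistic = -2.5742714883655027   # value reported in Out[11]

# Critical values of a two-sided test: alpha/2 probability in each tail.
lower = stats.t.ppf(alpha / 2, df)
upper = stats.t.ppf(1 - alpha / 2, df)

# Reject H0 when the statistic lies outside [lower, upper].
reject_null = t_statistic < lower or t_statistic > upper

print(lower, upper, reject_null)
```

Because the t-distribution is symmetric around zero, the same decision can be made by comparing abs(t_statistic) with upper alone.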

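# Alternative method of calculating the p-value using the cumulative distribution function of the t-distribution (stats.t.cdf). Note that x is the t-statistic value (rounded here, so the result differs slightly from Out[11]) and that the lower-tail probability is doubled because the test is two-sided.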

In[14]: stats.t.cdf(x= -2.5742, df= 49) * 2

Out[14]: 0.013121066545690117

The null hypothesis is rejected because the p-value is less than the significance level.
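Doubling stats.t.cdf(x) as in In[14] only gives the two-sided p-value when the t-statistic is negative. A sign-independent variant, sketched here with a hypothetical helper named two_sided_p_value, evaluates the lower tail at -|t|:

```python
from scipy import stats


def two_sided_p_value(t_statistic, df):
    """Two-sided p-value for a t-statistic (hypothetical helper)."""
    # Evaluate the lower tail at -|t| so that the tail being doubled
    # is always the one beyond the observed statistic.
    return 2.0 * stats.t.cdf(-abs(t_statistic), df)


# Unrounded t-statistic from Out[11], with both signs.
p_neg = two_sided_p_value(-2.5742714883655027, 49)
p_pos = two_sided_p_value(2.5742714883655027, 49)

print(p_neg, p_pos)
```

With the unrounded statistic, both calls should agree with the p-value reported in Out[11].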

An online one-sample t-test calculator such as:

Counter-Example(s):
See: Parametric Statistical Test, Computing System, Parameter Optimization System.

References

2017a

(Scipy docs, 2017) ⇒ https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html
- scipy.stats.ttest_1samp(a, popmean, axis=0, nan_policy='propagate')

Calculates the T-test for the mean of ONE group of scores.

This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations a is equal to the given population mean, popmean.
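The nan_policy parameter in the signature above controls how missing values are handled. The following sketch uses made-up scores (not data from this page) to contrast the default 'propagate' behaviour with 'omit':

```python
import numpy as np
from scipy import stats

# Made-up illustrative scores with one missing value.
scores = np.array([98.0, 105.0, np.nan, 110.0, 102.0, 95.0])

# Default nan_policy='propagate': a NaN in the input yields NaN results.
propagated = stats.ttest_1samp(scores, popmean=100.0)

# nan_policy='omit': NaN observations are ignored when computing the test.
omitted = stats.ttest_1samp(scores, popmean=100.0, nan_policy='omit')

# Same test on the data with the NaN removed by hand, for comparison.
cleaned = stats.ttest_1samp(scores[~np.isnan(scores)], popmean=100.0)

print(propagated.statistic, propagated.pvalue)
print(float(omitted.statistic), float(omitted.pvalue))
print(cleaned.statistic, cleaned.pvalue)
```

The second and third lines of output should match, since omitting the NaN is equivalent to testing the five remaining observations.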

2017b

(Varoquaux, 2017) ⇒ Retrieved on 2017-02-16 from "Statistics in Python" http://www.scipy-lectures.org/packages/statistics/index.html#student-s-t-test-the-simplest-statistical-test
- QUOTE: scipy.stats.ttest_1samp() tests if the population mean of data is likely to be equal to a given value (technically if observations are drawn from a Gaussian distributions of given population mean). It returns the T statistic, and the p-value (see the function’s help):

>>> stats.ttest_1samp(data['VIQ'], 0)

(...30.088099970..., 1.32891964...e-28)

With a p-value of [math]\displaystyle{ 10^{-28} }[/math] we can claim that the population mean for the IQ (VIQ measure) is not 0.

2015

(Hamelg, 2015) ⇒ Retrieved on 2017-02-26 from "Python for Data Analysis Part 24: Hypothesis Testing and the T-Test", http://hamelg.blogspot.ca/2015/11/python-for-data-analysis-part-24.html
- A one-sample t-test checks whether a sample mean differs from the population mean. Let's create some dummy age data for the population of voters in the entire country and a sample of voters in Minnesota and test whether the average age of voters in Minnesota differs from the population:

  import numpy as np
  import pandas as pd
  import scipy.stats as stats
  import matplotlib.pyplot as plt
  import math
  np.random.seed(6)
  population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
  population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
  population_ages = np.concatenate((population_ages1, population_ages2))
  minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
  minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
  minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))
  print( population_ages.mean() )
  print( minnesota_ages.mean() )

Notice that we used a slightly different combination of distributions to generate the sample data for Minnesota, so we know that the two means are different. Let's conduct a t-test at a 95% confidence level and see if it correctly rejects the null hypothesis that the sample comes from the same distribution as the population. To conduct a one-sample t-test, we can use the stats.ttest_1samp() function:

 stats.ttest_1samp(a= minnesota_ages, popmean= population_ages.mean())
 # (Sample data, Pop mean)