One-Sample t-Test System
A One-Sample t-Test System is a statistical hypothesis testing system that implements a one-sample t-test algorithm to solve a one-sample t-test task.
- Context
- It can be based on the implementation of the following algorithms:
- A One-Sample t-Test Algorithm to calculate the one-sample t-test statistic and the respective p-value.
- A t-distribution calculator, or an alternative algorithm to calculate the Probability Density Function and Cumulative Distribution Function of the t-distribution, in order to evaluate the null hypothesis against the alternative hypothesis.
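The two algorithmic components listed above can be sketched together as follows. This is a minimal illustration, not a reference implementation: the helper name `one_sample_t_test` and the sample values are hypothetical, and the statistic is assumed to be the usual t = (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n)) with n - 1 degrees of freedom. The result is cross-checked against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats


def one_sample_t_test(sample, popmean):
    """Return (t_statistic, two_sided_p_value, degrees_of_freedom).

    Hypothetical helper written for illustration only.
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    df = n - 1
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    standard_error = x.std(ddof=1) / np.sqrt(n)
    t_statistic = (x.mean() - popmean) / standard_error
    # Two-sided p-value: t-distribution probability mass in both tails beyond |t|.
    p_value = 2.0 * stats.t.sf(abs(t_statistic), df)
    return t_statistic, p_value, df


# Hypothetical sample, used only to exercise the function.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 4.7, 5.5]
t_manual, p_manual, df = one_sample_t_test(sample, popmean=5.0)

# Cross-check against SciPy's built-in one-sample t-test.
t_scipy, p_scipy = stats.ttest_1samp(sample, 5.0)
print(t_manual, p_manual, df)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

The survival function `stats.t.sf` is used for the tail probability rather than `1 - stats.t.cdf`, which tends to be more accurate for very small p-values.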
- Example(s):
- Example based on http://www.scipy-lectures.org/packages/statistics/index.html#student-s-t-test-the-simplest-statistical-test using the dataset http://www.scipy-lectures.org/_downloads/brain_size.csv and the following IPython session:
- #importing python libraries
- In[1]:
import pandas
- In[2]:
from scipy import stats
- # reading sample data
- In[3]:
data = pandas.read_csv('http://www.scipy-lectures.org/_downloads/brain_size.csv', sep=';', na_values=".")
- # performing a one-sample t-test
- # Null hypothesis: the VIQ population mean is 10
- In[4]:
stats.ttest_1samp(data['VIQ'], 10)
- #Output : t-statistic value and p-value
- Out[4]:
(27.4100314376, 4.29314923813e-27)
- Conclusion: the p-value is far below any conventional significance level, so the null hypothesis is rejected. The VIQ population mean is not 10.
- # Null hypothesis: the VIQ population mean is 115
- In[5]:
stats.ttest_1samp(data['VIQ'], 115)
- #Output : t-statistic value and p-value
- Out[5]:
(-0.709688161306, 0.482119348971)
- Conclusion: the p-value (about 0.48) is greater than the usual significance levels (0.05, 0.025, 0.01), so the test fails to reject the null hypothesis. The data are consistent with a VIQ population mean of 115.
- # Null hypothesis: the VIQ population mean is 200
- In[6]:
stats.ttest_1samp(data['VIQ'], 200)
- #Output : t-statistic value and p-value
- Out[6]:
(-23.4732706938, 1.2940564282e-24)
- Conclusion: the p-value is far below any conventional significance level, so the null hypothesis is rejected. The VIQ population mean is not 200.
- Example based on http://hamelg.blogspot.ca/2015/11/python-for-data-analysis-part-24.html :
- #importing python libraries
- In[1]:
import numpy as np
- In[2]:
import pandas as pd
- In[3]:
import scipy.stats as stats
- # creating random artificial datasets
- In[4]:
np.random.seed(6)
- In[5]:
population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
- In[6]:
population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
- In[7]:
minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
- In[8]:
minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
- #population dataset
- In[9]:
population_ages = np.concatenate((population_ages1, population_ages2))
- #sample dataset
- In[10]:
minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))
- #calling one-sample t-test function
- In[11]:
stats.ttest_1samp(a=minnesota_ages,popmean=population_ages.mean())
- #Output : t-statistic value and p-value
- Out[11]:
(-2.5742714883655027, 0.013118685425061678)
- For a significance level [math]\displaystyle{ \alpha=0.05 }[/math] the null hypothesis is rejected, because the p-value (about 0.013) is smaller than [math]\displaystyle{ \alpha }[/math].
- # calculation of the acceptance region lower limit for significance level alpha=0.05 using the percent point function (inverse CDF) of the t-distribution (stats.t.ppf). Note that q=alpha/2 and df is the degrees of freedom (sample size minus 1, here 50-1=49)
- In[12]:
stats.t.ppf(q=0.025, df=49)
- Out[12]:
-2.0095752344892093
- # calculation of the acceptance region upper limit, with q=1-(alpha/2)
- In[13]:
stats.t.ppf(q=0.975, df=49)
- Out[13]:
2.0095752344892088
- The null hypothesis is rejected because the t-statistic (about -2.574) falls outside the acceptance region (-2.0096, 2.0096).
- # Alternative method of calculating the two-sided p-value using the cumulative distribution function of the t-distribution (stats.t.cdf). Note that x is the t-statistic value and the lower-tail probability is doubled.
- In[14]:
stats.t.cdf(x= -2.5742, df= 49) * 2
- Out[14]:
0.013121066545690117
- The null hypothesis is rejected because the p-value (about 0.013) is less than the significance level of 0.05.
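The acceptance-region check and the CDF-based p-value from the session above can be combined into one small function. This is a sketch only: the name `two_sided_decision` is hypothetical, and it assumes a two-sided test, for which the acceptance region is symmetric around zero.

```python
from scipy import stats


def two_sided_decision(t_statistic, df, alpha=0.05):
    """Return (critical_value, p_value, reject) for a two-sided t-test.

    Hypothetical helper written for illustration only.
    """
    # Upper limit of the acceptance region; the lower limit is its negative.
    critical_value = stats.t.ppf(1.0 - alpha / 2.0, df)
    # Two-sided p-value from the upper-tail probability beyond |t|.
    p_value = 2.0 * stats.t.sf(abs(t_statistic), df)
    # Reject when the statistic falls outside (-critical_value, critical_value).
    reject = bool(abs(t_statistic) > critical_value)
    return critical_value, p_value, reject


# t-statistic and degrees of freedom taken from the Minnesota example above.
critical_value, p_value, reject = two_sided_decision(-2.5742, df=49)
print(critical_value, p_value, reject)
```

Both criteria always agree: the statistic lies outside the acceptance region exactly when the two-sided p-value is below alpha.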
- An online one-sample t-test calculator such as:
- Counter-Example(s):
- See: Parametric Statistical Test, Computing System, Parameter Optimization System.
References
2017a
- (Scipy docs, 2017) ⇒ https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html
- scipy.stats.ttest_1samp(a, popmean, axis=0, nan_policy='propagate')
- Calculates the T-test for the mean of ONE group of scores.
- This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations a is equal to the given population mean, popmean.
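The `nan_policy` parameter in the signature above controls how missing values are handled. The following sketch, which uses hypothetical scores made up for illustration, contrasts the default `'propagate'` behaviour with `'omit'`, and checks that `'omit'` matches running the test on the observed values only.

```python
import numpy as np
from scipy import stats

# Hypothetical scores with one missing value, for illustration only.
scores = np.array([101.0, 98.0, np.nan, 110.0, 95.0, 104.0])

# Default nan_policy='propagate': a NaN in the input yields NaN results.
t_prop, p_prop = stats.ttest_1samp(scores, popmean=100.0)

# nan_policy='omit': missing values are dropped before the test is computed.
t_omit, p_omit = stats.ttest_1samp(scores, popmean=100.0, nan_policy='omit')

# Equivalent to filtering out the NaN values by hand.
observed = scores[~np.isnan(scores)]
t_clean, p_clean = stats.ttest_1samp(observed, popmean=100.0)

print(t_prop, p_prop)
print(t_omit, p_omit)
print(t_clean, p_clean)
```

This matters for data loaded with `na_values`, as in the `pandas.read_csv` call in the first example, where missing entries become NaN.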
2017b
- (Varoquaux, 2017) ⇒ Retrieved on 2017-02-16 from "Statistics in Python" http://www.scipy-lectures.org/packages/statistics/index.html#student-s-t-test-the-simplest-statistical-test
- QUOTE: scipy.stats.ttest_1samp() tests if the population mean of data is likely to be equal to a given value (technically if observations are drawn from a Gaussian distributions of given population mean). It returns the T statistic, and the p-value (see the function’s help):
>>> stats.ttest_1samp(data['VIQ'], 0)
(...30.088099970..., 1.32891964...e-28)
- With a p-value of [math]\displaystyle{ 10^{-28} }[/math] we can claim that the population mean for the IQ (VIQ measure) is not 0.
2015
- (Hamelg, 2015) ⇒ Retrieved on 2017-02-26 from "Python for Data Analysis Part 24: Hypothesis Testing and the T-Test", http://hamelg.blogspot.ca/2015/11/python-for-data-analysis-part-24.html
- A one-sample t-test checks whether a sample mean differs from the population mean. Let's create some dummy age data for the population of voters in the entire country and a sample of voters in Minnesota and test whether the average age of voters in Minnesota differs from the population:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import math
np.random.seed(6)
population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
population_ages = np.concatenate((population_ages1, population_ages2))
minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))
print( population_ages.mean() )
print( minnesota_ages.mean() )
- Notice that we used a slightly different combination of distributions to generate the sample data for Minnesota, so we know that the two means are different. Let's conduct a t-test at a 95% confidence level and see if it correctly rejects the null hypothesis that the sample comes from the same distribution as the population. To conduct a one sample t-test, we can use the stats.ttest_1samp() function:
stats.ttest_1samp(a= minnesota_ages, popmean= population_ages.mean()) # (Sample data, Pop mean)