Welch's t-Test System

A Welch's t-Test System is a statistical hypothesis testing system that implements a Welch's t-test algorithm to solve a Welch's t-test task.

Context
- It can be based on the implementation of the following algorithms:
  - A Welch's t-test algorithm to calculate the respective Welch's t-test statistic and the respective p-value.
  - A t-distribution calculator, or an alternative algorithm that calculates the Probability Density Function and Cumulative Distribution Function of the t-distribution, in order to evaluate the null hypothesis and alternative hypothesis.
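The two components listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `welch_t_test` and the sample values are hypothetical, the degrees of freedom use the Welch-Satterthwaite approximation, and the t-distribution survival function from `scipy.stats.t` plays the role of the t-distribution calculator. The result is checked against `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
from scipy import stats

def welch_t_test(x, y):
    """Two-sided Welch's t-test; returns (t statistic, degrees of freedom, p-value)."""
    n1, n2 = len(x), len(y)
    m1 = sum(x) / n1
    m2 = sum(y) / n2
    # Unbiased sample variances (divide by n - 1)
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    # Squared standard errors of each sample mean
    se1, se2 = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    # Two-sided p-value from the t-distribution survival function
    p = 2.0 * stats.t.sf(abs(t), df)
    return t, df, p

# Hypothetical samples, chosen only for illustration
x = [6.1, 6.8, 7.0, 6.5, 6.9, 7.2]
y = [8.9, 9.5, 7.8, 10.4, 9.1, 8.2, 11.0, 9.7]

t, df, p = welch_t_test(x, y)
ref = stats.ttest_ind(x, y, equal_var=False)
print(t, df, p)
print(ref.statistic, ref.pvalue)
```

The hand-written statistic and p-value should agree with the SciPy result up to floating-point error.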
Example(s):
- Example based on the http://libguides.library.kent.edu/SPSS/IndependentTTest examples, using the sample dataset 'http://libguides.library.kent.edu/ld.php?content_id=11205378'. The following system tests whether the average time to run a mile differs between student athletes (group 1) and non-athletes (group 2).
  - IPython session source code:

#importing python libraries

In[1]: import pandas

In[2]: from scipy import stats

In[3]: import numpy

#Reading online dataset

In[4]: data = pandas.read_csv('http://libguides.library.kent.edu/ld.php?content_id=11205378', sep=',', na_values=".")

#Selecting the mile run times of the two groups: athletes and non-athletes

In[5]: athelete = data[data['Athlete'] == 1]['MileMinDur']

In[6]: nonathelete = data[data['Athlete'] == 0]['MileMinDur']

# Converting the run times from hh:mm:ss strings to numeric values: running time in minutes
# (.values is needed because recent pandas versions no longer provide Series.reshape)

In[7]: athelete=athelete.astype(str).values.reshape(athelete.size,1)

In[8]: nonathelete=nonathelete.astype(str).values.reshape(nonathelete.size,1)

In[9]: athelete=athelete[numpy.where(athelete!=[' '])]

In[10]: nonathelete=nonathelete[numpy.where(nonathelete!=[' '])]

In[11]: for i in range(numpy.shape(athelete)[0]):

...:     h,m,s=athelete[i].split(':')

...:     athelete[i]=int(h)*60+int(m)+(int(s)/60.)

In[12]: for j in range(numpy.shape(nonathelete)[0]):

...:     h,m,s=nonathelete[j].split(':')

...:     nonathelete[j]=int(h)*60+int(m)+(int(s)/60.)

#Defining significance level

In[13]: alpha=0.05

#Performing Levene's Test. This tests whether the population variances are equal

In[14]: stats.levene(athelete,nonathelete)

#Output : Levene's test statistic and p-value

Out[14]: (102.563129443,1.4800514645e-21)

Conclusion: The p-value, [math]\displaystyle{ p=1.480\times10^{-21} }[/math], is far below the significance level alpha = 0.05, so Levene's test rejects the null hypothesis of equal variances. The population variances are not equal.

#Performing Welch's Test.

In[15]: stats.ttest_ind(athelete,nonathelete, equal_var = False)

#Output : t-statistic value and p-value

Out[15]: (-15.0486789157, 5.82457889026e-39)

Conclusion: The p-value, [math]\displaystyle{ p=5.82\times10^{-39} }[/math], is far below the significance level alpha = 0.05, so the null hypothesis of equal means is rejected. The mean running times of the two groups differ significantly. Indeed, the difference between the sample means of athletes and non-athletes is 2 minutes and 14 seconds.
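The same workflow can also be written as a short standalone script. The sketch below is illustrative only: the helper `to_minutes` and the run times are hypothetical and are not taken from the Kent State dataset, so the printed statistics will differ from the outputs shown above.

```python
import numpy as np
from scipy import stats

def to_minutes(hms):
    """Convert an 'hh:mm:ss' string to a running time in minutes."""
    h, m, s = hms.split(':')
    return int(h) * 60 + int(m) + int(s) / 60.0

# Hypothetical run times, not values from the original dataset
athlete_raw = ['0:06:30', '0:06:51', '0:07:05', '0:06:12', '0:07:20', '0:06:45']
nonathlete_raw = ['0:08:10', '0:09:42', '0:07:55', '0:10:30',
                  '0:09:05', '0:11:15', '0:08:48']

athlete = np.array([to_minutes(v) for v in athlete_raw])
nonathlete = np.array([to_minutes(v) for v in nonathlete_raw])

alpha = 0.05

# Levene's test: null hypothesis of equal population variances
levene_stat, levene_p = stats.levene(athlete, nonathlete)
print('Levene:', levene_stat, levene_p)

# Welch's t-test: null hypothesis of equal population means
t_stat, p_value = stats.ttest_ind(athlete, nonathlete, equal_var=False)
print('Welch:', t_stat, p_value)
print('Reject equal means at alpha = 0.05:', p_value < alpha)
```

Converting each string with a small function yields a float array directly, which avoids storing numbers back into an array of strings.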

An online two-samples t-test calculator such as:

Counter-Example(s):
See: Parametric Statistical Test, Computing System, Parameter Optimization System.

References

2017a

(Scipy.org, 2017) ⇒ Retrieved from http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
- scipy.stats.ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate')

Calculates the T-test for the means of two independent samples of scores.

This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.

Parameters:
- a, b : array_like. The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
- axis : int or None, optional. Axis along which to compute test. If None, compute over the whole arrays, a, and b.
- equal_var : bool, optional. If True (default), perform a standard independent 2 sample test that assumes equal population variances. If False, perform Welch’s t-test, which does not assume equal population variance. New in version 0.11.0.
- nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional. Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.

Returns:
- statistic : float or array. The calculated t-statistic.
- pvalue : float or array. The two-tailed p-value.

Notes

We can use this test, if we observe two independent samples from the same or different population, e.g. exam scores of boys and girls or of two ethnic groups. The test measures whether the average (expected) value differs significantly across samples. If we observe a large p-value, for example larger than 0.05 or 0.1, then we cannot reject the null hypothesis of identical average scores. If the p-value is smaller than the threshold, e.g. 1%, 5% or 10%, then we reject the null hypothesis of equal averages.
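As a brief illustration of the `equal_var` and `nan_policy` parameters described above, the following sketch runs the same two samples through the pooled-variance test and through Welch's test. The sample values are hypothetical; one of them is deliberately set to NaN to show the effect of `nan_policy`.

```python
import numpy as np
from scipy import stats

# Hypothetical samples with unequal sizes and clearly unequal spread
a = np.array([5.1, 4.9, 5.0, 5.2, 4.8, np.nan])
b = np.array([6.0, 7.5, 4.0, 8.2, 5.5, 9.1, 3.9])

# Pooled-variance (Student) test, ignoring the NaN entry
student = stats.ttest_ind(a, b, equal_var=True, nan_policy='omit')

# Welch's test, ignoring the NaN entry
welch = stats.ttest_ind(a, b, equal_var=False, nan_policy='omit')

# Default nan_policy='propagate': the NaN carries through to the result
propagated = stats.ttest_ind(a, b, equal_var=False)

print('Student:', float(student.statistic), float(student.pvalue))
print('Welch:  ', float(welch.statistic), float(welch.pvalue))
print('Propagated p-value:', propagated.pvalue)
```

When the sample sizes and variances differ, as they do here, the two tests return different statistics and p-values, which is why `equal_var=False` matters in the unequal-variance case.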

2017b

(Lowry, 2017) ⇒ Retrieved from http://vassarstats.net/tu.html Copyright: Richard Lowry 2001-2017
- t-Test for Independent or Correlated Samples

The logic and computational details of two-sample t-tests are described in Chapters 9-12 of the online text Concepts & Applications of Inferential Statistics. For the independent-samples t-test, this unit will perform both the "usual" t-test, which assumes that the two samples have equal variances, and the alternative t-test, which assumes that the two samples have unequal variances. (A good formulaic summary of the unequal-variances t-test can be found on the StatsDirect web site. A more thorough account appears in the online journal Behavioral Ecology.)

2017c

A t test compares the means of two groups. For example, compare whether systolic blood pressure differs between a control and treated group, between men and women, or any other two groups.

Don't confuse t tests with correlation and regression. The t test compares one variable (perhaps blood pressure) between two groups. Use correlation and regression to see how two variables (perhaps blood pressure and heart rate) vary together. Also don't confuse t tests with ANOVA. The t tests (and related nonparametric tests) compare exactly two groups. ANOVA (and related nonparametric tests) compare three or more groups. Finally, don't confuse a t test with analyses of a contingency table (Fishers or chi-square test). Use a t test to compare a continuous variable (e.g., blood pressure, weight or enzyme activity). Use a contingency table to compare a categorical variable (e.g., pass vs. fail, viable vs. not viable).

2017d

(STHDA, 2017) ⇒ Retrieved from http://www.sthda.com/english/rsthda/unpaired-t-test.php
- Statistical tools for high-throughput data analysis: Student t-test for unpaired samples

2015

(Mangiafico, 2015) ⇒ Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological Statistics, version 1.3.0. Content retrieved from http://rcompanion.org/rcompanion/d_02.html
- (...) Welch’s t-test is shown above in the “Example” section (“Two sample unpaired t-test”). It is invoked with the var.equal=FALSE option in the t.test function.