Pooled Standard Deviation

From GM-RKB

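A Pooled Standard Deviation is a combined estimate of a common standard deviation, computed as the square root of the degrees-of-freedom-weighted average of the sample variances of independent samples drawn from populations with unknown but equal variances.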

[math]\displaystyle{ s_p =\sqrt{ \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{n_1 + n_2 + \cdots + n_k - k}} }[/math]
where [math]\displaystyle{ n_1,n_2,\cdots,n_k }[/math] are the respective sample sizes and [math]\displaystyle{ s_1,s_2,\cdots,s_k }[/math] are the corresponding sample standard deviations.
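
The formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `pooled_std` and the summary statistics in the usage line are hypothetical.

```python
from math import sqrt

def pooled_std(sizes, stds):
    """Pooled standard deviation from sample sizes n_i and sample standard deviations s_i."""
    if len(sizes) != len(stds):
        raise ValueError("sizes and stds must have the same length")
    # Denominator of the formula: n_1 + n_2 + ... + n_k - k
    dof = sum(n - 1 for n in sizes)
    if dof <= 0:
        raise ValueError("at least one sample must have more than one observation")
    # Numerator: each sample variance weighted by (n_i - 1)
    weighted_sum = sum((n - 1) * s ** 2 for n, s in zip(sizes, stds))
    return sqrt(weighted_sum / dof)

# Hypothetical summary statistics for three independent samples
print(pooled_std([10, 12, 8], [2.0, 2.5, 1.8]))
```

When all samples share the same standard deviation, the function returns that value regardless of the sample sizes, which is a quick sanity check on the weighting.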


References

2017

Under the assumption of equal population variances, the pooled sample variance provides a higher precision estimate of variance than the individual sample variances. This higher precision can lead to increased statistical power when used in statistical tests that compare the populations, such as the t-test.
The square-root of a pooled variance estimator is known as a pooled standard deviation (also known as combined, composite, or overall standard deviation).
(...)If the populations are indexed [math]\displaystyle{ i = 1, \ldots, k }[/math], then the pooled variance [math]\displaystyle{ s^2_p }[/math] (or [math]\displaystyle{ s^2_c }[/math] ) can be estimated by the weighted average:
[math]\displaystyle{ s_p^2=\frac{\sum_{i=1}^k (n_i - 1)s_i^2}{\sum_{i=1}^k(n_i - 1)} = \frac{(n_1 - 1)s_1^2+(n_2 - 1)s_2^2+\cdots+(n_k - 1)s_k^2}{n_1+n_2+\cdots+n_k - k} }[/math],
where [math]\displaystyle{ n_i }[/math] is the sample size of population [math]\displaystyle{ i }[/math] and the sample variances are
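[math]\displaystyle{ s^2_i }[/math] = [math]\displaystyle{ \frac{1}{n_i-1} \sum_{j=1}^{n_i} \left(y_{ij} - \overline{y}_i \right)^2 }[/math], with [math]\displaystyle{ y_{ij} }[/math] denoting the [math]\displaystyle{ j }[/math]-th observation in sample [math]\displaystyle{ i }[/math] and [math]\displaystyle{ \overline{y}_i }[/math] the mean of that sample.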
Use of [math]\displaystyle{ (n_i-1) }[/math] weighting factors instead of [math]\displaystyle{ n_i }[/math] comes from Bessel's correction.
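
As an illustration of the weighted average above, the following sketch computes the pooled variance directly from raw observations. It assumes Python's standard `statistics.variance`, which already applies Bessel's correction; the function name `pooled_variance` and the data are made up for the example.

```python
from math import sqrt
from statistics import variance  # sample variance with the (n - 1) divisor

def pooled_variance(samples):
    """Weighted average of sample variances, with weights (n_i - 1)."""
    dof = sum(len(s) - 1 for s in samples)
    return sum((len(s) - 1) * variance(s) for s in samples) / dof

# Hypothetical raw data: three samples with different means
samples = [
    [4.1, 3.9, 4.3, 4.0],
    [5.2, 5.0, 5.5],
    [2.9, 3.1, 3.0, 3.2, 2.8],
]

s2_p = pooled_variance(samples)  # pooled variance
s_p = sqrt(s2_p)                 # pooled standard deviation
print(s2_p, s_p)
```

Because (n_i - 1) times the sample variance equals the sum of squared deviations about that sample's own mean, the result is the total within-sample sum of squares divided by the total degrees of freedom.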

2014

  • (IUPAC, 2014) ⇒ Retrieved from http://goldbook.iupac.org/html/P/P04758.html published in IUPAC. Compendium of Chemical Terminology, 2nd ed. (the "Gold Book"). Compiled by A. D. McNaught and A. Wilkinson. Blackwell Scientific Publications, Oxford (1997). XML on-line corrected version: http://goldbook.iupac.org (2006-) created by M. Nic, J. Jirat, B. Kosata; updates compiled by A. Jenkins.
    • A problem often arises when the combination of several series of measurements performed under similar conditions is desired to achieve an improved estimate of the imprecision of the process. If it can be assumed that all the series are of the same precision although their means may differ, the pooled standard deviation [math]\displaystyle{ s_p }[/math] from [math]\displaystyle{ k }[/math] series of measurements can be calculated as
[math]\displaystyle{ s_p =\sqrt{ \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{n_1 + n_2 + \cdots + n_k - k}} }[/math]
The suffices [math]\displaystyle{ 1 , 2 , \cdots , k }[/math] refer to the different series of measurements. In this case it is assumed that there exists a single underlying standard deviation [math]\displaystyle{ \sigma }[/math] of which the pooled standard deviation [math]\displaystyle{ s_p }[/math] is a better estimate than the individual calculated standard deviations [math]\displaystyle{ s_1, s_2, \cdots, s_k }[/math]. For the special case where [math]\displaystyle{ k }[/math] sets of duplicate measurements are available, the above equation reduces to
[math]\displaystyle{ s_p = \sqrt{ \frac{\sum (x_{i1} - x_{i2})^2}{2k} } }[/math]
Results from various series of measurements can be combined in the following way to give a pooled relative standard deviation [math]\displaystyle{ s_{r,p} }[/math]:
[math]\displaystyle{ s_{r,p} = \sqrt{\frac{\sum (n_i - 1)s_{r,i}^2}{\sum (n_i - 1)}} = \sqrt{\frac{\sum (n_i - 1)s_i^2 \overline{x}_i^{-2}}{\sum (n_i - 1)}} }[/math]
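
The duplicate-measurement special case quoted above can be checked numerically. The sketch below uses hypothetical function names and made-up duplicate pairs. It compares the shortcut (sum of squared differences divided by 2k) with the general pooled formula applied to the same pairs treated as k series of size 2.

```python
from math import sqrt
from statistics import variance

def pooled_std_duplicates(pairs):
    """Shortcut for k sets of duplicates: sqrt(sum((x_i1 - x_i2)^2) / (2k))."""
    k = len(pairs)
    return sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * k))

def pooled_std_general(series):
    """General pooled standard deviation for series of any size >= 2."""
    dof = sum(len(s) - 1 for s in series)
    return sqrt(sum((len(s) - 1) * variance(s) for s in series) / dof)

# Hypothetical duplicate measurements of three different specimens
pairs = [(10.1, 10.3), (9.8, 9.7), (10.0, 10.4)]

shortcut = pooled_std_duplicates(pairs)
general = pooled_std_general(pairs)
print(shortcut, general)  # the two values agree up to floating-point rounding
```

The agreement follows because a series of two values has sample variance equal to half the squared difference and contributes exactly one degree of freedom.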

2007

Let [math]\displaystyle{ n_1 }[/math] be the sample size from population 1, [math]\displaystyle{ s_1 }[/math] be the sample standard deviation of population 1.
Let [math]\displaystyle{ n_2 }[/math] be the sample size from population 2, [math]\displaystyle{ s_2 }[/math] be the sample standard deviation of population 2.
Then the common standard deviation can be estimated by the pooled standard deviation:
[math]\displaystyle{ s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}} }[/math]
The test statistic is:
[math]\displaystyle{ t=\frac{\overline{y}_1-\overline{y}_2}{s_p\sqrt{1/n_1+1/n_2}} }[/math]
with degrees of freedom equal to [math]\displaystyle{ df = n_1 + n_2 - 2 }[/math].
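
A minimal sketch of the two-sample computation, working from summary statistics only. The function name `pooled_t_statistic` and the input values are hypothetical. The pooled standard deviation is computed with n_1 + n_2 - 2 in the denominator, matching the stated degrees of freedom and the general formula with k = 2.

```python
from math import sqrt

def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance two-sample t statistic from summary statistics.

    Returns (t, degrees of freedom, pooled standard deviation).
    """
    df = n1 + n2 - 2
    s_p = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    t = (mean1 - mean2) / (s_p * sqrt(1 / n1 + 1 / n2))
    return t, df, s_p

# Hypothetical summary statistics for two samples
t, df, s_p = pooled_t_statistic(5.0, 2.0, 10, 3.0, 2.0, 10)
print(t, df, s_p)
```

Obtaining a p-value additionally requires the t distribution with `df` degrees of freedom, which is outside the scope of this sketch.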