Sample Variance Function
A Sample Variance Function is a Variance Function that is also a Sample Statistic Function (it describes the dispersion of a Numeric Sample Space around a Sample Mean, based on a Random Sample).
- AKA: Empirical Variance Statistic.
- Context:
- Input: a Numeric Multiset/Numeric Sample Space.
- Output: a Real Number.
- It can range from being a Biased Sample Variance Function to being an Unbiased Sample Variance Function.
- Example(s):
- Var({3, 7, 7, 19}) ⇒ 36 (the biased, 1/n form; the unbiased, 1/(n−1) form gives 48; see the sketch after this list).
- Income Variance Measure.
- Counter-Example(s):
- a Population Variance Function, which is computed over an entire population rather than a Random Sample.
- See: Probability Theory.
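The following is a minimal Python sketch (not part of the source page) that computes both forms of the sample variance for the example sample above; the standard-library statistics calls are used only as a cross-check.

```python
import statistics

sample = [3, 7, 7, 19]
n = len(sample)
mean = sum(sample) / n                      # sample mean: 36 / 4 = 9.0
ss = sum((y - mean) ** 2 for y in sample)   # squared deviations sum to 144.0

biased = ss / n          # 36.0, the Var({3, 7, 7, 19}) value above
unbiased = ss / (n - 1)  # 48.0, after Bessel's correction

assert biased == statistics.pvariance(sample)   # population (1/n) form
assert unbiased == statistics.variance(sample)  # unbiased (1/(n-1)) form
print(biased, unbiased)  # 36.0 48.0
```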
References
2013
- (Wikipedia, 2013) ⇒ http://en.wikipedia.org/wiki/Variance#Sample_variance
- In many practical situations, the true variance of a population is not known a priori and must be computed somehow. When dealing with extremely large populations, it is not possible to count every object in the population, so the computation must be performed on a sample of the population.[1] Sample variance can also be applied to the estimation of the variance of a continuous distribution from a sample of that distribution.
We take a sample with replacement of n values [math]\displaystyle{ y_1, \dots, y_n }[/math] from the population, where n < N, and estimate the variance on the basis of this sample.[2] Directly taking the variance of the sample gives: [math]\displaystyle{ \sigma_y^2 = \frac 1n \sum_{i=1}^n \left(y_i - \overline{y} \right)^2 }[/math] Here, [math]\displaystyle{ \overline{y} }[/math] denotes the sample mean:
- [math]\displaystyle{ \overline{y}=\frac 1n \sum_{i=1}^n y_i . }[/math] Since the [math]\displaystyle{ y_i }[/math] are selected randomly, both [math]\displaystyle{ \scriptstyle\overline{y} }[/math] and [math]\displaystyle{ \scriptstyle\sigma_y^2 }[/math] are random variables. Their expected values can be evaluated by summing over the ensemble of all possible samples [math]\displaystyle{ \{y_i\} }[/math] from the population. For [math]\displaystyle{ \scriptstyle\sigma_y^2 }[/math] this gives: [math]\displaystyle{
\begin{align}
E[\sigma_y^2]
& = E\left[ \frac 1n \sum_{i=1}^n \left(y_i - \frac 1n \sum_{j=1}^n y_j \right)^2 \right] \\
& = \frac 1n \sum_{i=1}^n E\left[ y_i^2 - \frac 2n y_i \sum_{j=1}^n y_j + \frac{1}{n^2} \sum_{j=1}^n y_j \sum_{k=1}^n y_k \right] \\
& = \frac 1n \sum_{i=1}^n \left[ \frac{n-2}{n} E[y_i^2] - \frac 2n \sum_{j \neq i} E[y_i y_j] + \frac{1}{n^2} \sum_{j=1}^n \sum_{k \neq j} E[y_j y_k] +\frac{1}{n^2} \sum_{j=1}^n E[y_j^2] \right] \\
& = \frac 1n \sum_{i=1}^n \left[ \frac{n-2}{n} (\sigma^2+\mu^2) - \frac 2n (n-1) \mu^2 + \frac{1}{n^2} n (n-1) \mu^2 + \frac 1n (\sigma^2+\mu^2) \right] \\
& = \frac{n-1}{n} \sigma^2.
\end{align}
}[/math] Hence [math]\displaystyle{ \scriptstyle\sigma_y^2 }[/math] gives an estimate of the population variance that is biased by a factor of (n-1)/n. For this reason, [math]\displaystyle{ \scriptstyle\sigma_y^2 }[/math] is referred to as the biased sample variance. Correcting for this bias yields the unbiased sample variance: [math]\displaystyle{ s^2 = \frac{1}{n-1} \sum_{i=1}^n \left(y_i - \overline{y} \right)^2 }[/math] Either estimator may simply be referred to as the sample variance when the version can be determined by context. The same proof is also applicable for samples taken from a continuous probability distribution.
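As a hedged illustration of the (n-1)/n bias factor derived above, the Monte Carlo sketch below averages the biased sample variance over many small samples; the normal population, sample size n = 5, and trial count are arbitrary assumptions chosen for the demonstration, not values from the article.

```python
import random

random.seed(0)
n, trials, sigma = 5, 200_000, 2.0

def biased_var(ys):
    # The 1/n (biased) sample variance from the formula above.
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

avg = sum(biased_var([random.gauss(0.0, sigma) for _ in range(n)])
          for _ in range(trials)) / trials

print(avg)                     # close to 3.2, i.e. (n-1)/n * sigma^2
print((n - 1) / n * sigma**2)  # 3.2 exactly
```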
The use of the term n − 1 is called Bessel's correction, and it is also used in the sample covariance and the sample standard deviation (the square root of the variance). The square root is a concave function and thus introduces a negative bias (by Jensen's inequality) that depends on the distribution, so the corrected sample standard deviation (using Bessel's correction) is still biased. The unbiased estimation of standard deviation is a technically involved problem, though for the normal distribution using the term n − 1.5 yields an almost unbiased estimator.
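The sketch below checks this numerically for normal data: the Bessel-corrected standard deviation still underestimates σ, while the n − 1.5 denominator is nearly unbiased. The sample size and trial count are assumptions made for the demonstration.

```python
import math
import random

random.seed(0)
n, trials, sigma = 10, 200_000, 1.0

def corrected_std(ys, denom):
    # Standard deviation with an adjustable denominator.
    m = sum(ys) / len(ys)
    return math.sqrt(sum((y - m) ** 2 for y in ys) / denom)

total_bessel = total_15 = 0.0
for _ in range(trials):
    ys = [random.gauss(0.0, sigma) for _ in range(n)]
    total_bessel += corrected_std(ys, n - 1)    # Bessel's correction
    total_15 += corrected_std(ys, n - 1.5)      # n - 1.5 denominator

print(total_bessel / trials)  # ~0.973: biased low despite Bessel's correction
print(total_15 / trials)      # ~1.001: almost unbiased for normal data
```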
The unbiased sample variance is a U-statistic for the function [math]\displaystyle{ f(y_1, y_2) = (y_1 - y_2)^2/2 }[/math], meaning that it is obtained by averaging a 2-sample statistic over 2-element subsets of the population.
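A minimal sketch of this pairwise-average identity, reusing the example sample from above (the choice of sample is an assumption; the identity holds for any sample):

```python
import statistics
from itertools import combinations

sample = [3, 7, 7, 19]
pairs = list(combinations(sample, 2))  # all 2-element subsets
# Average f(y_i, y_j) = (y_i - y_j)^2 / 2 over the pairs.
u_stat = sum((a - b) ** 2 / 2 for a, b in pairs) / len(pairs)

print(u_stat)                       # 48.0
print(statistics.variance(sample))  # 48.0, the unbiased sample variance
```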