Jackknifing Algorithm
A Jackknifing Algorithm is a resampling algorithm used to estimate the bias and variance (standard error) of an estimator.
- …
- Counter-Example(s):
- See: Cross-Validation, Outlier Detection, Cumulative Density Function Estimation, Jackknife Regression Algorithm.
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Resampling_(statistics)#Jackknife Retrieved:2015-1-21.
- Jackknifing, which is similar to bootstrapping, is used in statistical inference to estimate the bias and standard error (variance) of a statistic, when a random sample of observations is used to calculate it. Historically this method preceded the invention of the bootstrap with Quenouille inventing this method in 1949 and Tukey extending it in 1958.[1][2] This method was foreshadowed by Mahalanobis who in 1946 suggested repeated estimates of the statistic of interest with half the sample chosen at random.[3] He coined the name 'interpenetrating samples' for this method.
Quenouille invented this method with the intention of reducing the bias of the sample estimate. Tukey extended this method by assuming that if the replicates could be considered identically and independently distributed, then an estimate of the variance of the sample parameter could be made and that it would be approximately distributed as a t variate with n - 1 degrees of freedom (n being the sample size).
The basic idea behind the jackknife variance estimator lies in systematically recomputing the statistic estimate, leaving out one or more observations at a time from the sample set. From this new set of replicates of the statistic, an estimate for the bias and an estimate for the variance of the statistic can be calculated.
Instead of using the jackknife to estimate the variance, it may instead be applied to the log of the variance. This transformation may result in better estimates, particularly when the distribution of the variance itself may be non-normal.
For many statistical parameters the jackknife estimate of variance tends asymptotically to the true value almost surely. In technical terms one says that the jackknife estimate is consistent. The jackknife is consistent for the sample means, sample variances, central and non-central t-statistics (with possibly non-normal populations), sample coefficient of variation, maximum likelihood estimators, least squares estimators, correlation coefficients and regression coefficients.
It is not consistent for the sample median. In the case of a unimodal variate, the ratio of the jackknife variance to the sample variance tends to be distributed as one half the square of a chi-square distribution with two degrees of freedom.
The jackknife, like the original bootstrap, is dependent on the independence of the data. Extensions of the jackknife to allow for dependence in the data have been proposed.
Another extension is the delete-a-group method used in association with Poisson sampling.
- ↑ Quenouille M (1949) Approximate tests of correlation in time series. J Roy Stat Soc Series B 11: 68-84
- ↑ Tukey JW (1958) Bias and confidence in not quite large samples (abstract). Ann Math Stats 29: 614
- ↑ Mahalanobis PC (1946). Recent experiments in statistical sampling in the Indian Statistical Institute. J Roy Stat Soc 109: 325-370
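The delete-1 procedure described in the Wikipedia excerpt above (recompute the statistic with each observation left out in turn, then form bias and variance estimates from the replicates) can be sketched in a few lines. The following is a minimal illustration, not drawn from any of the cited sources: it assumes NumPy, uses the standard delete-1 formulas, and all function and variable names are illustrative only.
```python
import numpy as np

def jackknife_bias_variance(data, statistic):
    """Delete-1 jackknife estimates of the bias and variance of `statistic`."""
    data = np.asarray(data)
    n = len(data)
    theta_hat = statistic(data)  # estimate on the full sample
    # Recompute the statistic with each observation left out in turn.
    theta_loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    theta_bar = theta_loo.mean()
    bias = (n - 1) * (theta_bar - theta_hat)                       # jackknife bias estimate
    variance = (n - 1) / n * np.sum((theta_loo - theta_bar) ** 2)  # jackknife variance estimate
    return bias, variance

# Example: the plug-in variance (ddof=0) is a biased estimator; the jackknife flags that.
rng = np.random.default_rng(0)
sample = rng.normal(size=30)
print(jackknife_bias_variance(sample, lambda x: x.var()))
```
Subtracting the bias estimate from the full-sample estimate gives the bias-reduced estimate that motivated Quenouille's original proposal.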
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Resampling_(statistics)#Jackknife Retrieved:2015-1-21.
- Both methods, the bootstrap and the jackknife, estimate the variability of a statistic from the variability of that statistic between subsamples, rather than from parametric assumptions. For the more general jackknife, the delete-m observations jackknife, the bootstrap can be seen as a random approximation of it. Both yield similar numerical results, which is why each can be seen as an approximation to the other. Although there are huge theoretical differences in their mathematical insights, the main practical difference for statistics users is that the bootstrap gives different results when repeated on the same data, whereas the jackknife gives exactly the same result each time. Because of this, the jackknife is popular when the estimates need to be verified several times before publishing (e.g., official statistics agencies). On the other hand, when this verification feature is not crucial and it is of interest not to have a number but just an idea of its distribution, the bootstrap is preferred (e.g., studies in physics, economics, biological sciences).
Whether to use the bootstrap or the jackknife may depend more on operational aspects than on statistical concerns of a survey. The jackknife, originally used for bias reduction, is more of a specialized method and only estimates the variance of the point estimator. This can be enough for basic statistical inference (e.g., hypothesis testing, confidence intervals). The bootstrap, on the other hand, first estimates the whole distribution (of the point estimator) and then computes the variance from that. While powerful and easy, this can become highly computer intensive.
"The bootstrap can be applied to both variance and distribution estimation problems. However, the bootstrap variance estimator is not as good as the jackknife or the balanced repeated replication (BRR) variance estimator in terms of the empirical results. Furthermore, the bootstrap variance estimator usually requires more computations than the jackknife or the BRR. Thus, the bootstrap is mainly recommended for distribution estimation." [1]
There is a special consideration with the jackknife, particularly with the delete-1 observation jackknife. It should only be used with smooth, differentiable statistics (e.g., totals, means, proportions, ratios, odds ratios, regression coefficients, etc.; not with medians or quantiles). This may become a practical disadvantage (or not, depending on the needs of the user). This disadvantage is usually the argument favoring bootstrapping over jackknifing. More general jackknives than the delete-1, such as the delete-m jackknife, overcome this problem for the medians and quantiles by relaxing the smoothness requirements for consistent variance estimation.
Usually the jackknife is easier to apply to complex sampling schemes than the bootstrap. Complex sampling schemes may involve stratification, multiple stages (clustering), varying sampling weights (non-response adjustments, calibration, post-stratification), and unequal-probability sampling designs. Theoretical aspects of both the bootstrap and the jackknife can be found in Shao and Tu (1995),[2] whereas a basic introduction is given in Wolter (2007).[3]
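The practical contrast drawn in this excerpt, a randomized bootstrap versus a deterministic jackknife, can be illustrated with a short sketch. This is only an illustration under assumptions of our own (NumPy, the sample mean as the statistic, a fixed number of bootstrap replicates); it is not code from the cited references.
```python
import numpy as np

def jackknife_se(data, statistic):
    """Delete-1 jackknife standard error; deterministic for a given sample."""
    n = len(data)
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

def bootstrap_se(data, statistic, n_boot=2000, seed=None):
    """Monte Carlo bootstrap standard error; depends on the resamples drawn."""
    rng = np.random.default_rng(seed)
    reps = np.array([statistic(rng.choice(data, size=len(data), replace=True))
                     for _ in range(n_boot)])
    return reps.std(ddof=1)

data = np.random.default_rng(1).exponential(size=50)
print(jackknife_se(data, np.mean))          # identical on every run
print(bootstrap_se(data, np.mean, seed=2))  # changes with the resampling seed
print(bootstrap_se(data, np.mean, seed=3))
```
Repeating the two calls shows the behaviour described above: the jackknife value never changes for a given sample, while the bootstrap value fluctuates around it from run to run.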
2010
- http://www.physics.utah.edu/~detar/phycs6730/handouts/jackknife/jackknife/
- QUOTE: … It provides an alternative and reasonably robust method for determining the propagation of error from the data to the parameters.
Starting from a sample of $N$ measurements, the jackknife begins by throwing out the first measurement, leaving a jackknife data set of $N-1$ “resampled” values. The statistical analysis is done on the reduced sample, giving a measured value of a parameter, say $m_{J1}$. Then a new resampling is done, this time throwing out the second measurement, and a new measured value of the parameter is obtained, say $m_{J2}$. The process is repeated for each set $i$ in the sample, resulting in a set of parameter values $\{m_{Ji},i=1,\ldots{},N\}$. The standard error is given by the formula : [math]\displaystyle{ \sigma^2_{\rm Jmean} = (N-1)\sum_{i=1}^N (m_{Ji} - m)^2/N \qquad (1) }[/math] where $m$ is the result of fitting the full sample.
The jackknife method is also capable of giving an estimate of sampling bias.
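As a rough illustration of Eq. (1) in the quoted handout, the sketch below recomputes a parameter on each leave-one-out sample and applies the quoted standard-error formula. The fitting routine is a stand-in (the sample mean) rather than the handout's fit, and the names are ours.
```python
import numpy as np

def jackknife_error(measurements, fit=np.mean):
    """Jackknife standard error of a fitted parameter, following Eq. (1)."""
    x = np.asarray(measurements)
    N = len(x)
    m = fit(x)  # result of fitting the full sample
    m_J = np.array([fit(np.delete(x, i)) for i in range(N)])  # leave-one-out fits
    sigma2 = (N - 1) * np.sum((m_J - m) ** 2) / N             # Eq. (1)
    return m, np.sqrt(sigma2)

m, err = jackknife_error(np.random.default_rng(0).normal(5.0, 2.0, size=100))
print(f"{m:.3f} +/- {err:.3f}")
```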
2008
- (Upton & Cook, 2008) ⇒ Graham Upton, and Ian Cook. (2008). “A Dictionary of Statistics, 2nd edition revised." Oxford University Press. ISBN:0199541450
- QUOTE: Jackknife: A computer-intensive resampling method for estimating some unknown parameter of a distribution while making minimal assumptions. In this respect it resembles the bootstrap. Denote the parameter by [math]\displaystyle{ \theta }[/math] and its usual estimate, based on a sample of [math]\displaystyle{ n }[/math] observations, by [math]\displaystyle{ \hat \theta }[/math]. For example, if the parameter were the mean of a distribution then the usual estimate would be the sample mean.
The jackknife procedure produces an alternative estimate [math]\displaystyle{ \tilde \theta }[/math], together with an estimate of the *bias (if any) of the usual estimate. Let [math]\displaystyle{ \hat \theta_{-j} }[/math] be the usual estimate of [math]\displaystyle{ \theta }[/math] calculated from the same sample but with the jth observation omitted. Now define the pseudovalue [math]\displaystyle{ \tilde \theta_j }[/math] by :[math]\displaystyle{ \tilde \theta_j = n \hat \theta - (n-1)\hat\theta_{-j} }[/math] The jackknife mean [math]\displaystyle{ \tilde \theta }[/math] and variance [math]\displaystyle{ s^2 }[/math] are given by :[math]\displaystyle{ \tilde \theta = \frac{1}{n} \sum_j \tilde \theta_j }[/math] :[math]\displaystyle{ s^2 = \frac{1}{n-1}\sum_j (\tilde \theta_j - \tilde \theta)^2 }[/math] The estimated bias of [math]\displaystyle{ \hat \theta }[/math] is [math]\displaystyle{ \hat \theta - \tilde \theta }[/math]. The ratio :[math]\displaystyle{ \frac{\tilde \theta - \theta}{s/\sqrt{n}} }[/math] has an approximate standard normal distribution. The method also applies to the estimation of more complex characteristics, such as the correlation in a set of bivariate observations. The term jackknife was coined by Tukey in the early 1960s.
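The pseudovalue construction in this dictionary entry can be sketched as follows. This is an illustrative implementation under assumptions of our own (NumPy, with the bivariate correlation from the entry's closing remark as the example statistic); the helper name is hypothetical.
```python
import numpy as np

def jackknife_pseudovalues(data, statistic):
    """Pseudovalues, jackknife mean, estimated bias, and jackknife variance s^2."""
    n = len(data)
    theta_hat = statistic(data)
    # theta_hat_{-j}: the usual estimate with the j-th observation omitted.
    theta_minus = np.array([statistic(np.delete(data, j, axis=0)) for j in range(n)])
    pseudo = n * theta_hat - (n - 1) * theta_minus       # pseudovalues
    theta_tilde = pseudo.mean()                          # jackknife mean
    s2 = np.sum((pseudo - theta_tilde) ** 2) / (n - 1)   # jackknife variance
    bias = theta_hat - theta_tilde                       # estimated bias of theta_hat
    return pseudo, theta_tilde, bias, s2

# Bivariate example: jackknife the sample correlation coefficient.
rng = np.random.default_rng(0)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=40)
corr = lambda d: np.corrcoef(d[:, 0], d[:, 1])[0, 1]
_, theta_tilde, bias, s2 = jackknife_pseudovalues(xy, corr)
print(theta_tilde, bias, np.sqrt(s2 / len(xy)))  # jackknife estimate, bias, standard error
```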
2006
- (Wasserman, 2006c) ⇒ Larry Wasserman. (2006). “Chapter 3 - The Bootstrap and the Jackknife.” In: (Wasserman, 2006) doi:10.1007/0-387-30623-4_3
1995
- (Shao & Tu, 1995) ⇒ Jun Shao and Dongsheng Tu. (1995). “The Jackknife and Bootstrap." Springer-Verlag. ISBN:0387945156
- BOOK OVERVIEW: The jackknife and bootstrap are the most popular data-resampling methods used in statistical analysis. The resampling methods replace theoretical derivations required in applying traditional methods (such as substitution and linearization) in statistical analysis by repeatedly resampling the original data and making inferences from the resamples. Because of the availability of inexpensive and fast computing, these computer-intensive methods have caught on very rapidly in recent years and are particularly appreciated by applied statisticians. The primary aims of this book are (1) to provide a systematic introduction to the theory of the jackknife, the bootstrap, and other resampling methods developed in the last twenty years; (2) to provide a guide for applied statisticians: practitioners often use (or misuse) the resampling methods in situations where no theoretical confirmation has been made; and (3) to stimulate the use of the jackknife and bootstrap and further developments of the resampling methods. The theoretical properties of the jackknife and bootstrap methods are studied in this book in an asymptotic framework. Theorems are illustrated by examples. Finite sample properties of the jackknife and bootstrap are mostly investigated by examples and/or empirical simulation studies. In addition to the theory for the jackknife and bootstrap methods in problems with independent and identically distributed (i.i.d.) data, we try to cover, as much as we can, the applications of the jackknife and bootstrap in various complicated non-i.i.d. data problems.
1979
- (Efron, 1979) ⇒ Bradley Efron. (1979). “Bootstrap Methods: Another Look at the Jackknife.” In: The Annals of Statistics, 7(1). http://www.jstor.org/stable/2958830
- QUOTE: The Quenouille — Tukey jackknife is an intriguing non-parametric method for estimating the bias and variance of a statistic of interest, and also for testing the null hypothesis that the distribution of a statistic is centered at some pre-specified point. …