Leave-One-Out Error
A Leave-One-Out Error is an error estimate obtained by leave-one-out cross-validation.
- AKA: Hold-one-out Error; LOO Error; Deleted Estimate; U-Method.
- Example(s):
- A leave-one-out cross-validator such as sklearn.model_selection.LeaveOneOut.
- A Leave-One-Out Prediction Algorithm such as:
- CVloo - a Leave-One-Out Cross-Validation Stability Algorithm,
- [math]\displaystyle{ Eloo_{err} }[/math] - an Expected-To-Leave-One-Out Error Stability Algorithm,
- LOOM - a Leave-One-Out Support Vector Machine.
- …
- Counter-Example(s):
- a K-fold Cross-Validation Error, or a Holdout Error.
- See: Error Rate, Cross-Validation, K-fold Cross Validation, Leave-one-out Prediction, Accuracy, Confusion Matrix, Holdout Evaluation, Estimator Bias, Machine Learning Algorithm, Statistical Estimator, Empirical Error, Jackknife Algorithm.
References
2018a
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Leave-one-out_error Retrieved:2018-9-16.
- QUOTE: Leave-one-out cross-validation (CVloo) Stability: An algorithm f has CVloo stability β with respect to the loss function V if the following holds: [math]\displaystyle{ \forall i\in\{1,...,m\}, \mathbb{P}_S\{\sup_{z\in Z}|V(f_S,z_i)-V(f_{S^{|i}},z_i)|\leq\beta_{CV}\}\geq1-\delta_{CV} }[/math]
Expected-to-leave-one-out error ([math]\displaystyle{ Eloo_{err} }[/math]) Stability: An algorithm f has [math]\displaystyle{ Eloo_{err} }[/math] stability if for each m there exists a [math]\displaystyle{ \beta_{EL}^m }[/math] and a [math]\displaystyle{ \delta_{EL}^m }[/math] such that: [math]\displaystyle{ \forall i\in\{1,...,m\}, \mathbb{P}_S\{|I[f_S]-\frac{1}{m}\sum_{i=1}^m V(f_{S^{|i}},z_i)|\leq\beta_{EL}^m\}\geq1-\delta_{EL}^m }[/math], with [math]\displaystyle{ \beta_{EL}^m }[/math] and [math]\displaystyle{ \delta_{EL}^m }[/math] going to zero for [math]\displaystyle{ m\rightarrow\infty }[/math].
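As an illustration (not from the quoted article), the following minimal Python sketch measures the per-point deviations [math]\displaystyle{ |V(f_S,z_i)-V(f_{S^{|i}},z_i)| }[/math] that [math]\displaystyle{ \beta_{CV} }[/math] is required to bound, assuming squared loss and a ridge-regression learner:
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

def squared_loss(model, x, t):
    # V(f, z) with z = (x, t), using squared loss
    return (model.predict(x.reshape(1, -1))[0] - t) ** 2

f_S = Ridge(alpha=1.0).fit(X, y)  # f_S: trained on the full sample S

deviations = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    f_Si = Ridge(alpha=1.0).fit(X[mask], y[mask])  # f_{S^{|i}}: point i removed
    deviations.append(abs(squared_loss(f_S, X[i], y[i])
                          - squared_loss(f_Si, X[i], y[i])))

# beta_CV bounds these deviations with probability at least 1 - delta_CV
print(max(deviations))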
2018b
- (Scikit-learn, 2018) ⇒ http://scikit-learn.org/stable/modules/cross_validation.html#leave-one-out-loo Retrieved:2018-9-16.
- QUOTE: LeaveOneOut (or LOO) is a simple cross-validation. Each learning set is created by taking all the samples except one, the test set being the sample left out. Thus, for n samples, we have n different training sets and n different test sets. This cross-validation procedure does not waste much data as only one sample is removed from the training set:
>>> from sklearn.model_selection import LeaveOneOut
>>> X = [1, 2, 3, 4]
>>> loo = LeaveOneOut()
>>> for train, test in loo.split(X):
...     print("%s %s" % (train, test))
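Running the loop prints the train/test index pairs for each of the n = 4 splits:
[1 2 3] [0]
[0 2 3] [1]
[0 1 3] [2]
[0 1 2] [3]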
2017a
- (Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb (2017). "Leave-One-Out Error". In: Encyclopedia of Machine Learning and Data Mining. Springer.
- QUOTE: Leave-one-out error is an estimate of error obtained by leave-one-out cross-validation.
2017b
- (Erickson, 2017) ⇒ Collin Erickson (2017). Leave-one-out cross-validation and error correction
- QUOTE: Leave-one-out prediction uses an entire model fit to all the data except a single point, and then makes a prediction at that point which can be compared to the actual value. It seems like this may be very expensive to do, but it is actually an inexpensive computation for a Gaussian process model, as long as the same parameters are used from the full model. This will bias the predictions to better results than if parameters were re-estimated.
Normally each prediction point requires solving a matrix equation. To predict the output, [math]\displaystyle{ y }[/math], at point [math]\displaystyle{ \mathbf{x} }[/math], given input data in matrix [math]\displaystyle{ X_2 }[/math] and output [math]\displaystyle{ y_2 }[/math], we use the equation
[math]\displaystyle{ \hat{y}=\hat{\mu}+R(\mathbf{x}, X_2)R(X_2)^{-1}(y_2-\hat{\mu}\,1_n) }[/math]
For leave-one-out predictions, the matrix [math]\displaystyle{ X_2 }[/math] will have all the design points except for the one we are predicting at, and thus will be different for each one. However, we will have the correlation matrix [math]\displaystyle{ R }[/math] for the full data set from estimating the parameters, and there is a shortcut to find the inverse of a matrix leaving out a single row and column …
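The shortcut alluded to is the standard partitioned-inverse identity: if [math]\displaystyle{ A=R^{-1} }[/math], the inverse of [math]\displaystyle{ R }[/math] with row and column i deleted equals [math]\displaystyle{ A_{-i,-i}-A_{-i,i}A_{i,-i}/A_{i,i} }[/math]. A minimal numpy sketch of that identity (an illustration, not Erickson's code):
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(6, 6))
R = B @ B.T + 6 * np.eye(6)  # stand-in for a symmetric positive-definite correlation matrix
A = np.linalg.inv(R)         # inverse of the full matrix, computed once

i = 2                        # index of the left-out design point
keep = np.arange(6) != i

# Shortcut: inverse of R with row/column i deleted, obtained from A alone
A_sub = A[np.ix_(keep, keep)] - np.outer(A[keep, i], A[i, keep]) / A[i, i]

# Agrees with inverting the reduced matrix directly
print(np.allclose(A_sub, np.linalg.inv(R[np.ix_(keep, keep)])))  # True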
2004
- (Evgeniou, Pontil & Elisseeff, 2004) ⇒ Theodoros Evgeniou, Massimiliano Pontil, and Andre Elisseeff (2004). "Leave one out error, stability, and generalization of voting combinations of classifiers" (PDF). Machine learning, 55(1), 71-97. DOI: 10.1023/B:MACH.0000019805.88351.60
- QUOTE: If [math]\displaystyle{ \theta }[/math] is, as before, the Heaviside function, then the leave-one-out error of [math]\displaystyle{ f }[/math] on [math]\displaystyle{ D_{\ell} }[/math] is defined by
[math]\displaystyle{ Loo_{D_\ell}(f) = \frac{1}{\ell} \sum_{i=1}^\ell \theta(-y_i f_i(x_i)) \quad }[/math] (5)
Notice that for simplicity there is a small abuse of notation here, since the leave-one-out error typically refers to a learning method while here we use the solution [math]\displaystyle{ f }[/math] in the notation. The leave-one-out error provides an estimate of the average generalization performance of a machine. It is known that the expectation of the generalization error of a machine trained using [math]\displaystyle{ \ell }[/math] points is equal to the expectation of the Loo error of a machine trained on [math]\displaystyle{ \ell+1 }[/math] points.
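As a concrete reading of equation (5): [math]\displaystyle{ \theta(-y_i f_i(x_i)) }[/math] equals 1 exactly when the machine trained without point i misclassifies it, so the leave-one-out error is the fraction of such misclassifications. A small Python sketch with hypothetical values (f_loo[i] stands for [math]\displaystyle{ f_i(x_i) }[/math]):
import numpy as np

y = np.array([1, -1, 1, 1, -1])                # labels y_i
f_loo = np.array([0.8, 0.3, -0.1, 1.2, -0.7])  # hypothetical f_i(x_i), each trained without point i

# theta(-y_i f_i(x_i)) = 1 when y_i f_i(x_i) <= 0 (Heaviside with theta(0) = 1)
loo_error = np.mean(y * f_loo <= 0)
print(loo_error)  # 0.4: the points at indices 1 and 2 are misclassified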
2003
- (Elisseeff & Pontil, 2003) ⇒ Andre Elisseeff and Massimiliano Pontil (2003). "Leave-one-out error and stability of learning algorithms with applications." NATO science series sub series iii computer and systems sciences 190 (2003): 111-130.
- QUOTE: The leave-one-out error is an important statistical estimator of the performance of a learning algorithm. Unlike the empirical error, it is almost unbiased and is frequently used for model selection (...)
It seems that leave-one-out error (also called deleted estimate or U-method) has been defined in the late sixties/mid-seventies (...)
For a learning algorithm [math]\displaystyle{ A }[/math] producing an outcome [math]\displaystyle{ f_D }[/math], it is defined as
[math]\displaystyle{ R_{loo}(f_D)=\frac{1}{m}\sum^{m}_{i=1} \ell(f^i, z_i) }[/math]
and is supposed to be an “almost” unbiased estimate of the generalization error of [math]\displaystyle{ f_D }[/math]. Because it seems to share many properties with a technique called Jackknife introduced by Tukey (1958) and Quenouille (1949), it is worth pointing out that the leave-one-out error is different. The latter concerns indeed a learning algorithm whose output is computed on a point that has not been used during training. The former consists in using repeatedly the whole training set but one point, computing many estimators and combining them at the end. This combination should lead to a new estimator whose bias is supposed to be low. The Jackknife can be used to derive an estimate of the generalization error based on the empirical error but things are then more complicated than for the leave-one-out error estimate …
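Procedurally, [math]\displaystyle{ R_{loo}(f_D) }[/math] retrains the algorithm m times, each time scoring only the single held-out point. A minimal Python sketch, assuming a scikit-learn-style classifier and 0-1 loss (the learner and loss are illustrative, not prescribed by the paper):
import numpy as np
from sklearn.linear_model import LogisticRegression

def loo_error(X, y, make_model=lambda: LogisticRegression()):
    m = len(X)
    losses = []
    for i in range(m):
        mask = np.arange(m) != i
        f_i = make_model().fit(X[mask], y[mask])       # f^i: trained on D without z_i
        pred = f_i.predict(X[i].reshape(1, -1))[0]
        losses.append(pred != y[i])                    # 0-1 loss at the held-out z_i
    return float(np.mean(losses))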
1999
- (Weston, 1999) ⇒ Jason Weston (1999, July). "Leave-one-out support vector machines" (PDF). In IJCAI (pp. 727-733).
1989
- (Fukunaga & Hummels, 1989) ⇒ Keinosuke Fukunaga, and Donald M. Hummels (1989). "Leave-one-out procedures for nonparametric error estimates" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(4), 421-423. DOI: 10.1109/34.19039
1958
- (Tukey, 1958) ⇒ John W. Tukey (1958). "Bias and confidence in not-quite large samples". Annals of Mathematical Statistics, 29.
1949
- (Quenouille, 1949) ⇒ Maurice H. Quenouille (1949, July). "Approximate tests of correlation in time-series 3". In Mathematical Proceedings of the Cambridge Philosophical Society (Vol. 45, No. 3, pp. 483-484). Cambridge University Press. DOI:10.1017/S0305004100025123