Holdout Evaluation
A Holdout Evaluation is a simple predictive model evaluation task that uses a (random) holdout dataset; a minimal sketch of the procedure is shown below.
- Example(s):
- Counter-Example(s):
- See: Algorithm Evaluation, Holdout Data, Validation Dataset.
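The following is a minimal sketch of a holdout evaluation, assuming scikit-learn is available; the Iris dataset, logistic regression model, and 70/30 split ratio are illustrative choices only, not part of the definition above.

```python
# Minimal holdout-evaluation sketch (scikit-learn assumed available).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Randomly hold out 30% of the data as the test (holdout) set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit the model on the training portion only.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate on the held-out data only.
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))
```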
References
2018
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Cross-validation_(statistics)#Holdout_method Retrieved:2018-5-15.
- In the holdout method, we randomly assign data points to two sets d0 and d1, usually called the training set and the test set, respectively. The size of each of the sets is arbitrary although typically the test set is smaller than the training set. We then train on d0 and test on d1.
In typical cross-validation, multiple runs are aggregated together; in contrast, the holdout method, in isolation, involves a single run. While the holdout method can be framed as "the simplest kind of cross-validation", many sources instead classify holdout as a type of simple validation, rather than a simple or degenerate form of cross-validation. [1] [2]
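A sketch of the split described in the quote above, using plain NumPy: data points are randomly assigned to d0 (training) and d1 (test), a model is fit on d0 only, and its error is measured on d1. The synthetic data, least-squares model, and 80/20 ratio are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic regression data (illustrative only).
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Randomly permute the indices, then make a single split.
idx = rng.permutation(len(X))
n_test = len(X) // 5                 # test set smaller than training set
d1, d0 = idx[:n_test], idx[n_test:]  # d1 = test set, d0 = training set

# Train a least-squares model on d0 only.
coef, *_ = np.linalg.lstsq(X[d0], y[d0], rcond=None)

# Test on d1 only.
test_mse = np.mean((X[d1] @ coef - y[d1]) ** 2)
print("holdout MSE on d1:", test_mse)
```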
2017
- (Sammut & Webb, 2017) ⇒ (2017) "Holdout Evaluation". In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA.
- QUOTE: Holdout evaluation is an approach to out-of-sample evaluation whereby the available data are partitioned into a training set and a test set. The test set is thus out-of-sample data and is sometimes called the holdout set or holdout data. The purpose of holdout evaluation is to test a model on different data to that from which it is learned. This provides a less biased estimate of learning performance than in-sample evaluation.
In repeated holdout evaluation, multiple holdout evaluation experiments are performed, each time with a different partition of the data, to create a distribution of training and test sets with which an algorithm is assessed.
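A sketch of the repeated holdout evaluation described above: the same holdout experiment is run with a different random partition each time, yielding a distribution of scores rather than a single estimate. scikit-learn, the Iris dataset, and the choice of 30 repetitions are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

scores = []
for seed in range(30):
    # Each repetition uses a different random train/test partition.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores.append(accuracy_score(y_te, model.predict(X_te)))

# The distribution of scores across partitions characterizes the algorithm.
print("mean accuracy over repetitions:", np.mean(scores))
print("std. dev. across partitions:  ", np.std(scores))
```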
1997
- (Schneider, 1997) ⇒ Jeff Schneider (1997). "Cross Validation". In: https://www.cs.cmu.edu/~schneide/tut5/node42.html
- QUOTE: The holdout method is the simplest kind of cross validation. The data set is separated into two sets, called the training set and the testing set. The function approximator fits a function using the training set only. Then the function approximator is asked to predict the output values for the data in the testing set (it has never seen these output values before). The errors it makes are accumulated as before to give the mean absolute test set error, which is used to evaluate the model. The advantage of this method is that it is usually preferable to the residual method and takes no longer to compute. However, its evaluation can have a high variance. The evaluation may depend heavily on which data points end up in the training set and which end up in the test set, and thus the evaluation may be significantly different depending on how the division is made.
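A sketch illustrating Schneider's variance point: the mean absolute test-set error obtained from a single holdout split can differ noticeably depending on which points fall in the training and test sets. The synthetic data and the linear regression "function approximator" are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Small synthetic dataset (illustrative only).
X = rng.uniform(-1, 1, size=(60, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=60)

for seed in range(5):
    # Each seed yields a different division into training and test sets ...
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    mae = mean_absolute_error(y_te, model.predict(X_te))
    # ... and therefore a somewhat different error estimate.
    print(f"split {seed}: mean absolute test error = {mae:.3f}")
```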
- ↑ Kohavi, Ron. "A study of cross-validation and bootstrap for accuracy estimation and model selection." IJCAI, Vol. 14, No. 2, 1995.
- ↑ Arlot, Sylvain, and Alain Celisse. "A survey of cross-validation procedures for model selection." Statistics Surveys 4 (2010): 40-79. "In brief, CV consists in averaging several hold-out estimators of the risk corresponding to different data splits."