Generalization Error
A Generalization Error is an error estimate of a Supervised Learning Algorithm that measures how accurately the algorithm can predict outcome values for a previously unseen dataset.
- AKA: Out-Of-Sample Error.
- Context:
- It can also be defined as the difference between the empirical loss ([math]\displaystyle{ f_a }[/math]) on the training data and the expected loss ([math]\displaystyle{ f_b }[/math]) estimated on test data, i.e. [math]\displaystyle{ GE = | f_a - f_b| }[/math] (see the sketch after this list).
- Example(s):
- Counter-Example(s):
- See: Learning Curve, Supervised Learning, Machine Learning, Statistical Learning Theory, Sampling Error, Overfitting, Algorithm, Generalization, Error, Machine Learning Theory, VC-Dimension, Fat-Shattering Dimension, Rademacher Complexity, Gaussian Complexity.
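The following is a minimal sketch (not drawn from any of the cited sources) of how the quantity [math]\displaystyle{ GE = | f_a - f_b| }[/math] is typically estimated in practice: fit a model, compute its empirical loss on the training data, and compare it with the loss on held-out test data. The synthetic dataset, the logistic-regression model, and the log-loss are illustrative assumptions.

```python
# Minimal sketch: estimating the generalization gap GE = |f_a - f_b| as the
# absolute difference between the empirical (training) loss f_a and the loss
# measured on held-out test data f_b. Dataset, model, and loss are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

f_a = log_loss(y_train, model.predict_proba(X_train))  # empirical (training) loss
f_b = log_loss(y_test, model.predict_proba(X_test))    # proxy for the expected loss
generalization_error = abs(f_a - f_b)
print(f"train loss={f_a:.4f}, test loss={f_b:.4f}, GE={generalization_error:.4f}")
```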
References
2018a
- (Wikipedia, 2018a) ⇒ https://en.wikipedia.org/wiki/Generalization_error Retrieved:2018-9-16.
- In supervised learning applications in machine learning and statistical learning theory, generalization error (also known as the out-of-sample error [1] ) is a measure of how accurately an algorithm is able to predict outcome values for previously unseen data. Because learning algorithms are evaluated on finite samples, the evaluation of a learning algorithm may be sensitive to sampling error. As a result, measurements of prediction error on the current data may not provide much information about predictive ability on new data. Generalization error can be minimized by avoiding overfitting in the learning algorithm. The performance of a machine learning algorithm is measured by plots of the generalization error values through the learning process, which are called learning curves.
- ↑ Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin (2012). Learning from Data, AMLBook Press.
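As a hedged illustration of the learning curves mentioned in the quote above, the sketch below tracks training and validation error as the training set grows. The decision-tree model, the synthetic data, and scikit-learn's learning_curve utility are assumptions for the example, not part of the quoted source.

```python
# Illustrative sketch of a learning curve: training vs. validation error
# as a function of training-set size. A shrinking gap suggests less overfitting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(max_depth=5), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="accuracy")

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={int(n):4d}  train_err={1 - tr:.3f}  "
          f"val_err={1 - va:.3f}  gap={(1 - va) - (1 - tr):.3f}")
```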
2018b
- (Jakubovitz, Giryes, & Rodrigues, 2018) ⇒ Daniel Jakubovitz, Raja Giryes, and Miguel R. D. Rodrigues (2018). "Generalization Error in Deep Learning" (PDF). arXiv preprint arXiv:1808.01174.
- QUOTE: The generalization error of a machine learning model is the difference between the empirical loss on the training set and the expected loss on a test set. In practice, it is therefore measured by the difference between the error on the training data and the one on the test data. This measure represents the ability of the trained model (algorithm) to generalize well from the learning data to new unseen data. It is typically understood that good generalization is obtained when a machine learning model does not memorize the training data, but rather learns some underlying rule associated with the data generation process, thereby being able to extrapolate that rule from the training data to new unseen data and generalize well.
Therefore, the generalization error of DNNs has been the focus of extensive research, mainly aimed at better understanding the source of their capabilities and deriving key rules and relations between a network’s architecture, the optimization algorithm used for training, and the network’s performance on a designated task. Bounds on the generalization error of deep learning models have also been obtained, typically under specific constraints (e.g. a bound for a two-layer neural network with ReLU activations). Recent research also focuses on new techniques for reducing a network’s generalization error, increasing its stability to input data variability or increasing its robustness. The capabilities of deep learning models are often examined under notions of expressivity and capacity: their ability to learn a function of some complexity from a given set of examples. It has been shown that deep learning models are capable of high expressivity, hence can learn any function under certain architectural constraints. However, the classical measures of machine learning model expressivity (such as VC-dimension, Rademacher complexity, etc.) do not fully explain the abilities of DNNs. Specifically, they do not explain the good generalization behavior achieved by DNNs, which are typically overparameterized models that often have substantially less training data than model parameters …
2016
- (Sokolic et al., 2016) ⇒ Jure Sokolic, Raja Giryes, Guillermo Sapiro, and Miguel R. D. Rodrigues (2016). "Generalization error of invariant classifiers" (PDF). arXiv preprint arXiv:1610.04574.
- QUOTE: One of the fundamental topics in statistical learning theory is that of the generalization error (GE). Given a training set and a hypothesis class, a learning algorithm chooses a hypothesis based on the training set in such a way that it minimizes an empirical loss. This loss, which is calculated on the training set, is also called the training loss and it often underestimates the expected loss. The GE is the difference between the empirical loss and the expected loss. There are various approaches in the literature that aim at bounding the GE via complexity measures of the hypothesis class such as the VC-dimension (Vapnik, 1999; Vapnik and Chervonenkis, 1991), the fat-shattering dimension (Alon et al., 1997), and the Rademacher and Gaussian complexities (Bartlett and Mendelson, 2002). Another line of work provides GE bounds based on the stability of the algorithms, by measuring how sensitive the output is to the removal or change of a single training sample (...)
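The stability-based line of work mentioned at the end of the quote can be illustrated with a crude empirical check: refit the learner with one training sample removed and measure how much its predictions change. The ridge-regression model, the synthetic data, and the max-prediction-change criterion below are illustrative assumptions, not the bounds derived in the cited papers.

```python
# Rough empirical check of algorithmic stability: how much does the learned
# predictor change when a single training sample is removed?
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=2)
full_model = Ridge(alpha=1.0).fit(X, y)

max_change = 0.0
for i in range(len(X)):
    X_loo = np.delete(X, i, axis=0)
    y_loo = np.delete(y, i, axis=0)
    loo_model = Ridge(alpha=1.0).fit(X_loo, y_loo)
    # Largest prediction change over the data caused by dropping sample i.
    change = np.max(np.abs(full_model.predict(X) - loo_model.predict(X)))
    max_change = max(max_change, change)

print(f"max prediction change under leave-one-out perturbation: {max_change:.4f}")
```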
2005
- (Markatou et al., 2005) ⇒ Marianthi Markatou, Hong Tian, Shameek Biswas, and George Hripcsak (2005). "Analysis of variance of cross-validation estimators of the generalization error" (PDF). Journal of Machine Learning Research, 6(Jul), 1127-1168.
- QUOTE: One important aspect of algorithmic performance is the generalization error. Informally, the generalization error is the error an algorithm makes on cases it has never seen before. Thus, the generalization performance of a learning method relates to its prediction capability on independent test data. The assessment of the performance of learning algorithms is extremely important in practice because it guides the choice of learning methods.
The generalization error of a learning method can be easily estimated via either cross-validation or bootstrap. However, providing a variance estimate of the estimator of this generalization error is a more difficult problem. This is because the generalization error depends on the loss function involved, and the mathematics needed to analyze the variance of the estimator are complicated.
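As a minimal sketch of the cross-validation estimate discussed in the quote (with an assumed scikit-learn setup rather than the authors' own experiments), the snippet below computes a K-fold estimate of the generalization error together with a naive across-fold variance estimate.

```python
# K-fold cross-validation estimate of the generalization error of a classifier,
# plus a naive variance estimate across folds. Dataset and model are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=3)
fold_accuracy = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)
fold_error = 1.0 - fold_accuracy

print(f"CV estimate of generalization error: {fold_error.mean():.4f}")
print(f"naive across-fold variance estimate: {fold_error.var(ddof=1):.6f}")
```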