Empirical Risk Minimization Principle
See: Generalization Bounds, Gradient Boosted Supervised Learning, Statistical Learning Theory, Expected Risk, Expected Risk Minimization, Empirical Risk.
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/empirical_risk_minimization Retrieved:2017-1-21.
- Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family of learning algorithms and is used to give theoretical bounds on the performance of learning algorithms.
2011
- (Zhang, 2011b) ⇒ Xinhua Zhang. (2011). “Empirical Risk Minimization.” In: (Sammut & Webb, 2011) p.312
- QUOTE: The goal of learning is usually to find a model which delivers good generalization performance over an underlying distribution of the data. Consider an input space X and output space Y. Assume the pairs [math]\displaystyle{ (X, Y ) \in \mathcal{X}\times \mathcal{Y} }[/math] are random variables whose (unknown) joint distribution is [math]\displaystyle{ P_{XY} }[/math]. It is our goal to find a predictor [math]\displaystyle{ f : \mathcal{X}\mapsto \mathcal{Y} }[/math] which minimizes the expected risk: [math]\displaystyle{ P(\,f(X)\neq Y ) = {\mathbb{E}}_{(X,Y )\sim {P}_{XY }}\left [\delta (\,f(X)\neq Y )\right ], }[/math] where δ(z) = 1 if z is true, and 0 otherwise.
However, in practice we only have n pairs of training examples (Xi, Yi) drawn identically and independently from [math]\displaystyle{ P_{XY} }[/math]. Since [math]\displaystyle{ P_{XY} }[/math] is unknown, we often use the risk on the training set (called empirical risk) as a surrogate of the expected risk on the underlying distribution: …
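The quoted definitions can be sketched in code: below is a minimal, illustrative ERM example (not from the cited sources) that minimizes the empirical 0-1 risk over a hypothetical finite hypothesis class of one-dimensional threshold classifiers, on synthetic data whose labels are flipped with small probability to stand in for the unknown [math]\displaystyle{ P_{XY} }[/math].

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training sample (X_i, Y_i) drawn i.i.d.; the "true" rule is
# Y = 1[X > 0.25], corrupted by 10% label noise. All values are illustrative.
n = 200
X = rng.uniform(-1.0, 1.0, n)
Y = (X > 0.25).astype(int)
flip = rng.random(n) < 0.10
Y[flip] = 1 - Y[flip]

def empirical_risk(f, X, Y):
    """Empirical risk: (1/n) * sum_i delta(f(X_i) != Y_i), i.e. average 0-1 loss."""
    return float(np.mean(f(X) != Y))

# Finite hypothesis class: threshold classifiers f_t(x) = 1[x > t].
thresholds = np.linspace(-1.0, 1.0, 101)
risks = [empirical_risk(lambda x, t=t: (x > t).astype(int), X, Y)
         for t in thresholds]

# ERM: pick the hypothesis with the smallest empirical risk.
best_t = thresholds[int(np.argmin(risks))]
print(f"ERM threshold: {best_t:.2f}, empirical risk: {min(risks):.3f}")
```

Because the empirical risk is only a surrogate for the expected risk, the selected threshold will sit near, but generally not exactly at, the true value used to generate the labels; the gap between the two risks is what generalization bounds quantify.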