Expected Error
An Expected Error is the expected value of a loss function under the (typically unknown) joint probability distribution of inputs and outputs.
- AKA: Expected Risk.
- Context:
- It can be defined as,
- [math]\displaystyle{ I[f_n] = \int_{X \times Y} V(f_n(x),y) \rho(x,y) dx dy, }[/math]
where [math]\displaystyle{ V }[/math] is the loss function and [math]\displaystyle{ \rho(x,y) }[/math] is an unknown joint probability distribution, for a given function [math]\displaystyle{ f_n }[/math] that predicts output values [math]\displaystyle{ y }[/math] based on input data [math]\displaystyle{ x }[/math] (a Monte Carlo sketch of this quantity appears after this list).
- Example(s):
- Generalization Error.
- …
- Counter-Example(s):
- an Empirical Error, which is computed on a finite sample rather than over the full distribution.
- See: Expected Loss Function, Expected Error Rate, Expected Error Minimization.
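Because the expected error is an integral over the unknown distribution [math]\displaystyle{ \rho(x,y) }[/math], it can only be evaluated directly in settings where [math]\displaystyle{ \rho }[/math] is known. The following minimal Python sketch approximates [math]\displaystyle{ I[f] }[/math] by Monte Carlo under an assumed synthetic joint distribution and a squared loss; the distribution, predictor, and constants are illustrative assumptions, not part of any standard definition.

```python
import numpy as np

# Minimal sketch: Monte Carlo approximation of the expected error
# I[f] = E_{(x,y)~rho}[ V(f(x), y) ] under a hypothetical (synthetic)
# joint distribution rho(x, y) and squared loss V(a, b) = (a - b)^2.

rng = np.random.default_rng(0)

def sample_rho(n):
    """Draw n samples from an assumed joint distribution rho(x, y):
    x ~ Uniform(0, 1), y = 2x + Gaussian noise."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(0.0, 0.1, size=n)
    return x, y

def V(y_pred, y_true):
    """Squared loss."""
    return (y_pred - y_true) ** 2

def f(x):
    """A candidate predictor f(x); deliberately imperfect here."""
    return 1.8 * x + 0.05

# With rho known (as it is only in this toy setting), the integral
# I[f] can be approximated by averaging the loss over many samples.
x, y = sample_rho(1_000_000)
I_f = np.mean(V(f(x), y))
print(f"Monte Carlo estimate of I[f]: {I_f:.4f}")
```

In realistic settings [math]\displaystyle{ \rho }[/math] is unknown, which is exactly why one falls back on the empirical error discussed in the references below.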
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Generalization_error#Definition Retrieved:2017-7-16.
- In a learning problem, the goal is to develop a function [math]\displaystyle{ f(x) }[/math] that predicts output values [math]\displaystyle{ y }[/math] based on some input data [math]\displaystyle{ x }[/math]. The expected error, [math]\displaystyle{ I[f_n] }[/math], of a particular function [math]\displaystyle{ f_n }[/math] over all possible values of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] is: [math]\displaystyle{ I[f_n] = \int_{X \times Y} V(f_n(x),y) \rho(x,y) dx dy, }[/math] where [math]\displaystyle{ V }[/math] denotes a loss function and [math]\displaystyle{ \rho(x,y) }[/math] is the unknown joint probability distribution for [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math].
Without knowing the joint probability distribution, it is impossible to compute [math]\displaystyle{ I[f] }[/math]. Instead, we can compute the empirical error on sample data. Given [math]\displaystyle{ n }[/math] data points, the empirical error is: [math]\displaystyle{ I_S[f_n] = \frac{1}{n} \sum_{i=1}^n V(f_n(x_i),y_i) }[/math]. The generalization error is the difference between the expected and empirical error. This is the difference between error on the training set and error on the underlying joint probability distribution. It is defined as: [math]\displaystyle{ G = I[f_n] - I_S[f_n] }[/math]. An algorithm is said to generalize if: [math]\displaystyle{ \lim_{n \rightarrow \infty} I[f_n] - I_S[f_n] = 0 }[/math]. Since [math]\displaystyle{ I[f_n] }[/math] cannot be computed for an unknown probability distribution, the generalization error cannot be computed either. Instead, the aim of many problems in statistical learning theory is to bound or characterize the generalization error in probability: [math]\displaystyle{ P_G = P(I[f_n] - I_S[f_n] \leq \epsilon) \geq 1 - \delta_n }[/math]. That is, the goal is to characterize the probability [math]\displaystyle{ 1 - \delta_n }[/math] that the generalization error is less than some error bound [math]\displaystyle{ \epsilon }[/math] (known as the learning rate and generally dependent on [math]\displaystyle{ \delta }[/math] and [math]\displaystyle{ n }[/math]).
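To make the quantities above concrete, here is a small illustrative Python sketch (all data synthetic, all choices assumptions): it fits a least-squares line [math]\displaystyle{ f_n }[/math] on [math]\displaystyle{ n }[/math] sample points, computes the empirical error [math]\displaystyle{ I_S[f_n] }[/math], and estimates the generalization gap by approximating [math]\displaystyle{ I[f_n] }[/math] with the average loss on a large fresh sample, since the true integral is unavailable in practice.

```python
import numpy as np

# Illustrative sketch: empirical error I_S[f_n] on a finite sample,
# and the generalization gap G = I[f_n] - I_S[f_n], where I[f_n] is
# itself approximated by Monte Carlo on a large fresh sample.

rng = np.random.default_rng(1)

def sample(n):
    # Hypothetical joint distribution: x ~ U(0, 1), y = 2x + noise.
    x = rng.uniform(0.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(0.0, 0.1, size=n)
    return x, y

def loss(y_pred, y_true):  # V: squared loss
    return (y_pred - y_true) ** 2

# "Train" f_n by least squares on a small sample of n points.
n = 20
x_train, y_train = sample(n)
w, b = np.polyfit(x_train, y_train, deg=1)
f_n = lambda x: w * x + b

# Empirical error on the training sample.
I_S = np.mean(loss(f_n(x_train), y_train))

# Stand-in for the expected error: average loss on a huge fresh sample.
x_big, y_big = sample(1_000_000)
I = np.mean(loss(f_n(x_big), y_big))

print(f"I_S[f_n] = {I_S:.4f}, I[f_n] ~ {I:.4f}, gap G ~ {I - I_S:.4f}")
```

As the quote indicates, increasing [math]\displaystyle{ n }[/math] in this sketch should drive the estimated gap toward zero for an algorithm that generalizes.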
2009
- (Chen et al., 2009) ⇒ Bo Chen, Wai Lam, Ivor Tsang, and Tak-Lam Wong. (2009). “Extracting Discriminative Concepts for Domain Adaptation in Text Mining.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557045
- We theoretically analyze the expected error in the target domain showing that the error bound can be controlled by the expected loss in the source domain, and the embedded distribution gap, so as to prove that what we minimize in the objective function is very reasonable for domain adaptation.
1997
- (Stark, 1997) ⇒ P.B. Stark (1997–2017). “Glossary of Statistical Terms.” Published online at: http://www.stat.berkeley.edu/~stark/SticiGui/Text/gloss.htm
- The RMSE of an estimator is a measure of the expected error of the estimator. The units of RMSE are the same as the units of the estimator.
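As a hedged illustration of Stark's point, the sketch below estimates the RMSE of the sample mean as an estimator of a hypothetical Gaussian population mean; the population parameters are assumptions chosen only so the result can be checked against the known value [math]\displaystyle{ \sigma/\sqrt{n} }[/math].

```python
import numpy as np

# Sketch of Stark's point: the RMSE of an estimator measures its
# expected error and carries the estimator's own units. Here the
# estimator is the sample mean of n draws from an assumed
# N(mu, sigma^2) population; its RMSE should approach sigma / sqrt(n).

rng = np.random.default_rng(2)
mu, sigma, n, trials = 5.0, 2.0, 25, 100_000

# One estimate per trial: the mean of n draws.
estimates = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
rmse = np.sqrt(np.mean((estimates - mu) ** 2))

print(f"Empirical RMSE: {rmse:.4f}  (theory: {sigma / np.sqrt(n):.4f})")
```

Note that the printed RMSE is in the same units as the estimator itself, as the quoted glossary entry states.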