Symmetrization Lemma
A Symmetrization Lemma is a lemma that bounds the expectation of a nondecreasing, convex function of an empirical process by the expectation of the same function applied to twice the corresponding symmetrized (Rademacher) process.
- AKA: Basic Lemma, Vapnik-Chervonenkis Symmetrization Lemma.
- Example(s):
- For every nondecreasing, convex [math]\displaystyle{ \Phi: \R \to \R }[/math] and class of measurable functions [math]\displaystyle{ \mathcal{F} }[/math],
[math]\displaystyle{ \mathbb{E} \Phi (\|\mathbb{P}_n - P\|_{\mathcal{F}}) \leq \mathbb{E} \Phi \left(2 \left \|\mathbb{P}^0_n\right \|_{\mathcal{F}} \right) }[/math]
- …
- Counter-Example(s):
- See: Vapnik-Chervonenkis Theory, Class Complexity Bound, Growth Function, Generalization Bound, VC Connection.
References
2020
- (Wikipedia) ⇒ https://en.wikipedia.org/wiki/Vapnik–Chervonenkis_theory#Symmetrization Retrieved: 2020-03-12.
- QUOTE: The majority of the arguments on how to bound the empirical process rely on symmetrization, maximal and concentration inequalities, and chaining. Symmetrization is usually the first step of the proofs, and since it is used in many machine learning proofs on bounding empirical loss functions (including the proof of the VC inequality, which is discussed in the next section), it is presented here.
Consider the empirical process:
[math]\displaystyle{ f \mapsto (\mathbb{P}_n - P)f = \dfrac{1}{n} \sum_{i = 1}^n (f(X_i) - Pf) }[/math]
It turns out that there is a connection between the empirical process and the following symmetrized process:
[math]\displaystyle{ f \mapsto \mathbb{P}^0_n f = \dfrac{1}{n} \sum_{i = 1}^n \varepsilon_i f(X_i) }[/math]
The symmetrized process is a Rademacher process, conditionally on the data [math]\displaystyle{ X_i }[/math]. Therefore, it is a sub-Gaussian process by Hoeffding's inequality.
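To make the two processes concrete, the following is a minimal numerical sketch (assuming Python with NumPy and a hypothetical class of threshold indicators [math]\displaystyle{ f_t(x) = \mathbf{1}\{x \leq t\} }[/math] with [math]\displaystyle{ X_i \sim \mathrm{Uniform}(0,1) }[/math], so that [math]\displaystyle{ Pf_t = t }[/math] exactly); with [math]\displaystyle{ \Phi }[/math] the identity, the lemma below predicts that the first printed quantity is at most the second:
```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trials = 200, 2000

# Hypothetical function class F: threshold indicators f_t(x) = 1{x <= t}
# over a grid of t, with X_i ~ Uniform(0, 1) so that P f_t = t exactly.
thresholds = np.linspace(0.05, 0.95, 19)

sup_emp, sup_sym = [], []
for _ in range(n_trials):
    x = rng.uniform(size=n)
    indicators = x[:, None] <= thresholds[None, :]      # shape (n, |F|)
    emp = indicators.mean(axis=0) - thresholds          # (P_n - P) f_t
    eps = rng.choice([-1.0, 1.0], size=n)               # Rademacher signs
    sym = (eps[:, None] * indicators).mean(axis=0)      # P_n^0 f_t
    sup_emp.append(np.abs(emp).max())
    sup_sym.append(np.abs(sym).max())

# With Phi = identity:  E ||P_n - P||_F  <=  2 E ||P_n^0||_F
print(f"E ||P_n - P||_F ~ {np.mean(sup_emp):.4f}")
print(f"2 E ||P_n^0||_F ~ {2.0 * np.mean(sup_sym):.4f}")
```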
Lemma (Symmetrization). For every nondecreasing, convex [math]\displaystyle{ \Phi: \R \to \R }[/math] and class of measurable functions [math]\displaystyle{ \mathcal{F} }[/math],
[math]\displaystyle{ \mathbb{E} \Phi (\|\mathbb{P}_n - P\|_{\mathcal{F}}) \leq \mathbb{E} \Phi \left(2 \left \|\mathbb{P}^0_n\right \|_{\mathcal{F}} \right) }[/math]
The proof of the symmetrization lemma relies on introducing independent copies of the original variables [math]\displaystyle{ X_i }[/math] (sometimes referred to as a ghost sample) and replacing the inner expectation of the LHS by these copies. After an application of Jensen's inequality, different signs can be introduced (hence the name symmetrization) without changing the expectation. The proof is sketched below because of its instructive nature.
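A condensed version of that argument (a sketch of the standard chain, writing [math]\displaystyle{ \mathbb{P}'_n }[/math] for the empirical measure of the ghost sample [math]\displaystyle{ X'_1, \ldots, X'_n }[/math]):
[math]\displaystyle{ \mathbb{E} \Phi \left(\|\mathbb{P}_n - P\|_{\mathcal{F}}\right) \leq \mathbb{E} \Phi \left(\left\|\mathbb{P}_n - \mathbb{P}'_n\right\|_{\mathcal{F}}\right) = \mathbb{E} \Phi \left( \left\| \dfrac{1}{n} \sum_{i=1}^n \varepsilon_i \left( f(X_i) - f(X'_i) \right) \right\|_{\mathcal{F}} \right) \leq \dfrac{1}{2} \mathbb{E} \Phi \left(2 \left\|\mathbb{P}^0_n\right\|_{\mathcal{F}}\right) + \dfrac{1}{2} \mathbb{E} \Phi \left(2 \left\|\mathbb{P}'^{\,0}_n\right\|_{\mathcal{F}}\right) = \mathbb{E} \Phi \left(2 \left\|\mathbb{P}^0_n\right\|_{\mathcal{F}}\right) }[/math]
The first inequality is Jensen's, using [math]\displaystyle{ Pf = \mathbb{E}' \mathbb{P}'_n f }[/math] and the fact that [math]\displaystyle{ \Phi }[/math] is convex and nondecreasing; the middle equality holds because each difference [math]\displaystyle{ f(X_i) - f(X'_i) }[/math] is symmetric about zero, hence equal in distribution to [math]\displaystyle{ \varepsilon_i \left( f(X_i) - f(X'_i) \right) }[/math]; and the last step combines the triangle inequality with convexity of [math]\displaystyle{ \Phi }[/math].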
2017
- (Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb. (2017). "Symmetrization Lemma"]. In: (Sammut & Webb, 2017). DOI:10.1007/978-1-4899-7687-1_970
- QUOTE: Given a distribution $P$ over a sample space $\mathcal{Z}$, a finite sample $\mathbf{z}=(z_1, \cdots, z_n)$ drawn i.i.d. from $P$, and a function $f: \mathcal{Z} \to \R$, we define the shorthand $E_P f = E_P [f(z)]$ and $E_{\mathbf{z}} f = \dfrac{1}{n}\displaystyle\sum_{i=1}^n f(z_i)$ to denote the true and empirical average of $f$. The symmetrization lemma is an important result in learning theory as it allows the true average $E_P f$ found in generalization bounds to be replaced by a second empirical average $E_{\mathbf{z'}} f$ taken over an independent ghost sample $\mathbf{z'}=(z'_1, \cdots, z'_n)$ drawn i.i.d. from $P$. Specifically, the symmetrization lemma states that for any $\epsilon > 0$, whenever $n\epsilon^2 \ge 2$, [math]\displaystyle{ P^n\left(\sup_{f\in F} |E_P f - E_{\mathbf{z}} f| \gt \epsilon\right) \le 2 P^{2n} \left(\sup_{f\in F} |E_{\mathbf{z'}} f - E_{\mathbf{z}} f| \gt \dfrac{\epsilon}{2}\right) }[/math] (1) This means the typically difficult-to-analyze behavior of $E_P f$ – which involves the entire sample space $\mathcal{Z}$ – can be replaced by the evaluation of functions from $F$ over the points in $\mathbf{z}$ and $\mathbf{z'}$.
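A minimal Monte Carlo check of inequality (1), under assumed illustrative choices (Python with NumPy; the same hypothetical threshold class [math]\displaystyle{ f_t(z) = \mathbf{1}\{z \leq t\} }[/math] with [math]\displaystyle{ z \sim \mathrm{Uniform}(0,1) }[/math], so that [math]\displaystyle{ E_P f_t = t }[/math]); the first printed frequency should not exceed the second:
```python
import numpy as np

rng = np.random.default_rng(1)
n, n_trials, eps = 100, 5000, 0.15       # note: n * eps**2 = 2.25 >= 2

# Hypothetical class F: f_t(z) = 1{z <= t} with z ~ Uniform(0, 1), so E_P f_t = t.
thresholds = np.linspace(0.05, 0.95, 19)

def emp_means(z):
    """Empirical averages E_z f_t over the threshold class."""
    return (z[:, None] <= thresholds[None, :]).mean(axis=0)

lhs_hits = rhs_hits = 0
for _ in range(n_trials):
    z, z_ghost = rng.uniform(size=n), rng.uniform(size=n)   # sample and ghost sample
    lhs_hits += np.abs(emp_means(z) - thresholds).max() > eps
    rhs_hits += np.abs(emp_means(z_ghost) - emp_means(z)).max() > eps / 2

print(f"P^n( sup_f |E_P f - E_z f| > eps )        ~ {lhs_hits / n_trials:.4f}")
print(f"2 P^2n( sup_f |E_z' f - E_z f| > eps/2 )  ~ {2 * rhs_hits / n_trials:.4f}")
```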
2010
- (Banerjee, 2010) ⇒ Moulinath Banerjee (2010). "Empirical Processes: Symmetrization". Department of Statistics, University of Michigan.
- QUOTE: The main results are on inequalities connecting sums of independent processes to symmetrized versions using independent Rademacher variables. Symmetrized versions of empirical processes are sub-Gaussian (in an appropriate sense), whence bounds on the modulus of continuity can be obtained in terms of covering numbers/packing numbers. Sub-Gaussianity of symmetrized empirical processes falls out of Hoeffding's inequality.