McDiarmid's Inequality
A McDiarmid's Inequality is a Concentration Inequality which states that the value of a bounded-differences function of independent random variables is concentrated around its mean value.
- AKA: Bounded Differences Inequality.
- Context:
- It is a generalization of Hoeffding's Inequality.
- Example(s):
- for all [math]\displaystyle{ \epsilon\gt 0 }[/math]: [math]\displaystyle{ Pr[f-E[f]\geq \epsilon]\leq \exp \left(-\dfrac{2\epsilon^2}{\sum_{i=1}^m c_i^2}\right) }[/math] (see e.g. Rastogi, 2007; a numerical sketch appears below, after the See list),
- …
- Counter-Example(s):
See: Doob Martingale, Independent Random Variable, Probability Theory, Law of Iterated Expectations.
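The bound can be checked numerically. Below is a minimal sketch (a hypothetical illustration, not drawn from the cited sources) that takes [math]\displaystyle{ f }[/math] to be the empirical mean of [math]\displaystyle{ m }[/math] fair coin flips, for which each bounded-difference constant is [math]\displaystyle{ c_i = 1/m }[/math], and compares a Monte Carlo estimate of the upper-tail probability with the McDiarmid bound:
```python
import numpy as np

# Hypothetical illustration: f is the empirical mean of m fair coin flips,
# so changing one coordinate moves f by at most c_i = 1/m (bounded differences).
rng = np.random.default_rng(0)
m, eps, trials = 100, 0.1, 100_000

samples = rng.integers(0, 2, size=(trials, m))    # X_i uniform on {0, 1}
f_vals = samples.mean(axis=1)                     # f(X_1, ..., X_m)
empirical = np.mean(f_vals - 0.5 >= eps)          # E[f] = 1/2 for fair coins

c = np.full(m, 1.0 / m)                           # bounded-difference constants
bound = np.exp(-2 * eps**2 / np.sum(c**2))        # exp(-2 eps^2 / sum_i c_i^2)

print(f"Monte Carlo tail estimate: {empirical:.4f}")
print(f"McDiarmid upper bound:     {bound:.4f}")
```
With these numbers the bound evaluates to [math]\displaystyle{ e^{-2} \approx 0.135 }[/math], comfortably above the simulated tail probability.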
References
2018a
- (Wikipedia, 2018a) ⇒ https://en.wikipedia.org/wiki/Doob_martingale#McDiarmid Retrieved:2018-12-8.
- One common way of bounding the differences and applying Azuma's inequality to a Doob martingale is called McDiarmid's inequality.[1]
Suppose [math]\displaystyle{ X_1, X_2, \dots, X_n }[/math] are independent and assume that [math]\displaystyle{ f }[/math] satisfies:
[math]\displaystyle{ \sup_{x_1,x_2,\dots,x_n, \hat x_i} |f(x_1,x_2,\dots,x_n) - f(x_1,x_2,\dots,x_{i-1},\hat x_i, x_{i+1}, \dots, x_n)| \le c_i \qquad \text{for} \quad 1 \le i \le n \; . }[/math] (In other words, replacing the [math]\displaystyle{ i }[/math] -th coordinate [math]\displaystyle{ x_i }[/math] by some other value changes the value of [math]\displaystyle{ f }[/math] by at most [math]\displaystyle{ c_i }[/math] .)
It follows that [math]\displaystyle{ |B_{i+1}-B_i| \le c_i }[/math], and therefore Azuma's inequality yields the following McDiarmid inequalities for any [math]\displaystyle{ \varepsilon \gt 0 }[/math]:
[math]\displaystyle{ \Pr \left\{ f(X_1, X_2, \dots, X_n) - E[f(X_1, X_2, \dots, X_n)] \ge \varepsilon \right\} \le \exp \left( - \frac{2 \varepsilon^2}{\sum_{i=1}^n c_i^2} \right), }[/math]
[math]\displaystyle{ \Pr \left\{ E[f(X_1, X_2, \dots, X_n)] - f(X_1, X_2, \dots, X_n) \ge \varepsilon \right\} \le \exp \left( - \frac{2 \varepsilon^2}{\sum_{i=1}^n c_i^2} \right), }[/math]
and
[math]\displaystyle{ \Pr \left\{ |E[f(X_1, X_2, \dots, X_n)] - f(X_1, X_2, \dots, X_n)| \ge \varepsilon \right\} \le 2 \exp \left( - \frac{2 \varepsilon^2}{\sum_{i=1}^n c_i^2} \right). }[/math]
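As a worked instance of the first inequality (an illustrative specialization, not part of the quoted text): for [math]\displaystyle{ f(X_1, \dots, X_n) = \frac{1}{n}\sum_{i=1}^n X_i }[/math] with each [math]\displaystyle{ X_i \in [0,1] }[/math], one may take [math]\displaystyle{ c_i = 1/n }[/math], so [math]\displaystyle{ \sum_{i=1}^n c_i^2 = 1/n }[/math] and the bound becomes
[math]\displaystyle{ \Pr \left\{ f(X_1, \dots, X_n) - E[f(X_1, \dots, X_n)] \ge \varepsilon \right\} \le \exp \left( - 2 n \varepsilon^2 \right). }[/math]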
2018b
- (Wikipedia, 2018b) ⇒ https://en.wikipedia.org/wiki/Concentration_inequality#Bounds_on_sums_of_independent_variables Retrieved:2018-12-9.
- Let [math]\displaystyle{ X_1, X_2,\dots,X_n }[/math] be independent random variables such that, for all i:
[math]\displaystyle{ a_i\leq X_i\leq b_i }[/math] almost surely.
[math]\displaystyle{ c_i := b_i-a_i }[/math]
[math]\displaystyle{ \forall i: c_i \leq C }[/math]
Let [math]\displaystyle{ S_n }[/math] be their sum, [math]\displaystyle{ E_n }[/math] its expected value and [math]\displaystyle{ V_n }[/math] its variance:
[math]\displaystyle{ S_n := \sum_{i=1}^n X_i }[/math]
[math]\displaystyle{ E_n := \operatorname{E}[S_n] = \sum_{i=1}^n \operatorname{E}[X_i] }[/math]
[math]\displaystyle{ V_n := \operatorname{Var}[S_n] = \sum_{i=1}^n \operatorname{Var}[X_i] }[/math]
It is often interesting to bound the difference between the sum and its expected value. Several inequalities can be used.
1. Hoeffding's inequality says that:
[math]\displaystyle{ \Pr\left[|S_n-E_n|\gt t\right] \lt 2 \exp \left(-\frac{2t^2}{\sum_{i=1}^n c_i^2} \right)\lt 2 \exp \left(-\frac{2t^2}{n C^2} \right) }[/math]
2. The random variable [math]\displaystyle{ S_n-E_n }[/math] is a special case of a martingale, and [math]\displaystyle{ S_0-E_0=0 }[/math] . Hence, Azuma's inequality can also be used and it yields a similar bound:
[math]\displaystyle{ \Pr\left[|S_n-E_n|\gt t\right] \lt 2 \exp \left(-\frac{t^2}{2\sum_{i=1}^n c_i^2}\right)\lt 2 \exp \left(-\frac{t^2}{2 n C^2} \right) }[/math]
This is a generalization of Hoeffding's since it can handle other types of martingales, as well as supermartingales and submartingales.
3. The sum function, [math]\displaystyle{ S_n=f(X_1,\dots,X_n) }[/math] , is a special case of a function of n variables. This function changes in a bounded way: if variable i is changed, the value of f changes by at most [math]\displaystyle{ b_i-a_i\lt C }[/math]. Hence, McDiarmid's inequality can also be used and it yields a similar bound:
[math]\displaystyle{ \Pr\left[|S_n-E_n|\gt t\right] \lt 2 \exp \left(-\frac{2t^2}{\sum_{i=1}^n c_i^2} \right)\lt 2 \exp \left(-\frac{2t^2}{n C^2} \right) }[/math]
This is a different generalization of Hoeffding's since it can handle other functions besides the sum function, as long as they change in a bounded way (...)
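The comparison can be made concrete with a small sketch (hypothetical numbers, assuming [math]\displaystyle{ c_i = C = 1 }[/math] for all [math]\displaystyle{ i }[/math]) that evaluates the three bounds above at the same deviation [math]\displaystyle{ t }[/math]; Hoeffding's and McDiarmid's coincide for the sum function, while Azuma's exponent is weaker by a factor of 4:
```python
import math

# Hypothetical numbers: n variables, each with range c_i = C = 1.
n, C, t = 100, 1.0, 20.0
sum_ci_sq = n * C**2                                # sum_i c_i^2 = n C^2

hoeffding = 2 * math.exp(-2 * t**2 / sum_ci_sq)     # 2 exp(-2 t^2 / sum c_i^2)
azuma = 2 * math.exp(-t**2 / (2 * sum_ci_sq))       # 2 exp(-t^2 / (2 sum c_i^2))
mcdiarmid = 2 * math.exp(-2 * t**2 / sum_ci_sq)     # same form as Hoeffding's

print(f"Hoeffding bound: {hoeffding:.3e}")  # ~6.7e-04 (2 e^-8)
print(f"Azuma bound:     {azuma:.3e}")      # ~2.7e-01 (2 e^-2), weaker exponent
print(f"McDiarmid bound: {mcdiarmid:.3e}")  # coincides with Hoeffding for the sum
```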
2017
- (Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb. (2017). "McDiarmid's Inequality". In: Encyclopedia of Machine Learning and Data Mining.
- QUOTE: McDiarmid’s inequality shows how the values of a bounded function of independent random variables concentrate about its mean. Specifically, suppose [math]\displaystyle{ f: \mathcal{X}^n \to \mathbb{R} }[/math] satisfies the bounded differences property. That is, for all [math]\displaystyle{ i=1,\cdots, n }[/math] there is a [math]\displaystyle{ c_i \geq 0 }[/math] such that for all [math]\displaystyle{ x_1, \cdots, x_n, x' \in \mathcal{X} }[/math]
[math]\displaystyle{ |f(x_1,\cdots, x_n) - f (x_1, \cdots, x_{i-1}, x', x_{i+1},\cdots, x_n)| \leq c_i }[/math]
If [math]\displaystyle{ \mathbf{X}=(X_1, \cdots, X_n)\in \mathcal{X}^n }[/math] is a random variable drawn according to [math]\displaystyle{ P^n }[/math] and [math]\displaystyle{ \mu=E_{P^n}[f(\mathbf{X})] }[/math], then for all [math]\displaystyle{ \epsilon\gt 0 }[/math]
[math]\displaystyle{ P^n(f(\mathbf{X})-\mu\geq \epsilon)\leq \exp \left(-\dfrac{2\epsilon^2}{\sum_{i=1}^n c_i^2}\right) }[/math]
McDiarmid’s is a generalization of Hoeffding’s inequality, which can be obtained by assuming [math]\displaystyle{ \mathcal{X}=[a,b] }[/math] and choosing [math]\displaystyle{ f(\mathbf{X}) = \sum_{i=1}^n X_i }[/math]. When applied to empirical risks, this inequality forms the basis of many generalization bounds.
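To spell out that reduction (a step left implicit in the quote): with [math]\displaystyle{ \mathcal{X}=[a,b] }[/math] and [math]\displaystyle{ f(\mathbf{X}) = \sum_{i=1}^n X_i }[/math], changing any single coordinate changes the sum by at most [math]\displaystyle{ b-a }[/math], so [math]\displaystyle{ c_i = b-a }[/math] for every [math]\displaystyle{ i }[/math] and the bound above specializes to Hoeffding's inequality:
[math]\displaystyle{ P^n\left(f(\mathbf{X})-\mu\geq \epsilon\right)\leq \exp \left(-\dfrac{2\epsilon^2}{n(b-a)^2}\right). }[/math]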
2016
- (Fan, 2016) ⇒ Xiequan Fan (2016). "On McDiarmid's inequality for Hamming distance". arXiv preprint arXiv:1611.03990.
- ABSTRACT: We improve the rate function of McDiarmid's inequality for Hamming distance. In particular, applying our result to the separately Lipschitz functions of independent random variables, we also refine the convergence rate function of McDiarmid's inequality around a median. Moreover, a non-uniform bound for the distance between the medians and the mean is also given. We also give some extensions of McDiarmid's inequalities to the case of nonnegative functionals of dependent random variables.
2007
- (Rastogi, 2007) ⇒ Ashish Rastogi (2007). "McDiarmid’s Inequality".
- QUOTE: Theorem: Let [math]\displaystyle{ X_1, \cdots, X_m }[/math] be independent random variables all taking values in the set [math]\displaystyle{ \mathcal{X} }[/math]. Further, let [math]\displaystyle{ f:\mathcal{X}^m \to \mathbb{R} }[/math] be a function of [math]\displaystyle{ X_1,\cdots,X_m }[/math] that satisfies
[math]\displaystyle{ \forall i, \; \forall x_1,\cdots,x_m,x'_i \in \mathcal{X}: \quad |f(x_1, \cdots, x_i, \cdots, x_m) - f(x_1,\cdots, x'_i, \cdots,x_m)| \leq c_i }[/math]
Then for all [math]\displaystyle{ \epsilon\gt 0 }[/math],
[math]\displaystyle{ Pr[f-E[f]\geq \epsilon]\leq \exp \left(-\dfrac{2\epsilon^2}{\sum_{i=1}^m c_i^2}\right) }[/math]
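The bounded-differences hypothesis can be verified by brute force when [math]\displaystyle{ \mathcal{X} }[/math] is a small finite set. The sketch below (a hypothetical helper, not part of Rastogi's note) computes each [math]\displaystyle{ c_i }[/math] as the largest change in [math]\displaystyle{ f }[/math] obtainable by altering coordinate [math]\displaystyle{ i }[/math]:
```python
import itertools

def bounded_difference_constants(f, domain, m):
    """Brute-force c_i = max over x, x'_i of |f(x) - f(x with coordinate i replaced)|."""
    cs = [0.0] * m
    for x in itertools.product(domain, repeat=m):
        fx = f(x)
        for i in range(m):
            for v in domain:
                y = x[:i] + (v,) + x[i + 1:]   # replace coordinate i by v
                cs[i] = max(cs[i], abs(fx - f(y)))
    return cs

# Hypothetical example: f = max of m = 3 coordinates on X = {0, 0.5, 1};
# replacing one coordinate changes the max by at most 1, so each c_i = 1.
print(bounded_difference_constants(max, (0.0, 0.5, 1.0), 3))  # [1.0, 1.0, 1.0]
```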
- ↑ McDiarmid, Colin (1989). "On the Method of Bounded Differences". Surveys in Combinatorics. 141: 148–188.