Mutual Information Metric
A Mutual Information Metric is a relative metric that measures the mutual dependence of two random variables, [math]\displaystyle{ X, Y }[/math].
- AKA: I.
- Context:
- Metric Range: [0, +∞); mutual information is non-negative and has no fixed upper bound, although normalized variants are scaled to [0,1].
- It can be expressed as [math]\displaystyle{ I(X;Y) = H(X,Y) - H(X|Y) - H(Y|X) }[/math], where [math]\displaystyle{ H(X) }[/math] and [math]\displaystyle{ H(Y) }[/math] are marginal information entropies, [math]\displaystyle{ H(X|Y) }[/math] and [math]\displaystyle{ H(Y|X) }[/math] are conditional entropies, [math]\displaystyle{ H(X,Y) }[/math] is the joint entropy, and [math]\displaystyle{ H(X) \ge H(X|Y) }[/math].
- It can range from being a Continuous-Variable Mutual Information Metric to being a Discrete-Variable Mutual Information Metric, calculated in the discrete case as [math]\displaystyle{ I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log{ \left(\frac{p(x,y)}{p(x)\,p(y)} \right) } }[/math] (see the sketch after this list).
- It can be expressed with a Kullback–Leibler Divergence Measure.
- Example(s):
- Counter-Example(s):
- See: Information Gain Metric, Pointwise Mutual Information, Information Theory, Statistic Function.
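The following is a minimal sketch (not taken from the cited sources) of how the discrete double-sum formula above can be computed with NumPy, cross-checked against the entropy decomposition [math]\displaystyle{ I(X;Y) = H(X,Y) - H(X|Y) - H(Y|X) }[/math]; the joint-probability table is a hypothetical toy example.

```python
# Minimal sketch: discrete mutual information from a hypothetical joint table.
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) ), in bits."""
    p_x = p_xy.sum(axis=1, keepdims=True)        # marginal p(x) as a column
    p_y = p_xy.sum(axis=0, keepdims=True)        # marginal p(y) as a row
    mask = p_xy > 0                              # skip zero-probability cells
    ratio = p_xy[mask] / (p_x @ p_y)[mask]       # p(x,y) / (p(x) p(y))
    return float(np.sum(p_xy[mask] * np.log2(ratio)))

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Toy joint distribution (rows index X, columns index Y).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

# Double-sum definition.
mi = mutual_information(p_xy)                    # ~0.278 bits

# Entropy form I(X;Y) = H(X,Y) - H(X|Y) - H(Y|X),
# using H(X|Y) = H(X,Y) - H(Y) and H(Y|X) = H(X,Y) - H(X).
h_xy = entropy(p_xy.ravel())
h_x_given_y = h_xy - entropy(p_xy.sum(axis=0))
h_y_given_x = h_xy - entropy(p_xy.sum(axis=1))
print(mi, h_xy - h_x_given_y - h_y_given_x)      # both ~0.278
```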
References
2011
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Mutual_information
- In probability theory and information theory, the mutual information (sometimes known by the archaic term transinformation) of two random variables is a quantity that measures the mutual dependence of the two variables. The most common unit of measurement of mutual information is the bit, when logarithms to the base 2 are used.
- http://en.wikipedia.org/wiki/Mutual_information#Relation_to_other_quantities
- Mutual information can also be expressed as a Kullback-Leibler divergence, of the product p(x) × p(y) of the marginal distributions of the two random variables X and Y, from p(x,y) the random variables' joint distribution: [math]\displaystyle{ I(X;Y) = D_{\mathrm{KL}}(p(x,y)\|p(x)p(y)). }[/math]
Furthermore, let p(x|y) = p(x, y) / p(y). Then [math]\displaystyle{ \begin{align} I(X;Y) & {} = \sum_y p(y) \sum_x p(x|y) \log_2 \frac{p(x|y)}{p(x)} \\ & {} = \sum_y p(y) \; D_{\mathrm{KL}}(p(x|y)\|p(x)) \\ & {} = \mathbb{E}_Y\{D_{\mathrm{KL}}(p(x|y)\|p(x))\}. \end{align} }[/math] Thus mutual information can also be understood as the expectation of the Kullback-Leibler divergence of the univariate distribution p(x) of X from the conditional distribution p(x|y) of X given Y: the more different the distributions p(x|y) and p(x), the greater the information gain.
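The relationships quoted above can be checked numerically. The sketch below (not part of the quoted article) uses a hypothetical joint-probability table and verifies that the joint-versus-product Kullback-Leibler divergence and the expectation of the conditional Kullback-Leibler divergences give the same value.

```python
# Minimal sketch, assuming a hypothetical 2x2 joint distribution p_xy.
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """D_KL(p || q) in bits, assuming q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)                           # marginal p(x)
p_y = p_xy.sum(axis=0)                           # marginal p(y)

# I(X;Y) as D_KL( p(x,y) || p(x) p(y) ).
mi_joint_kl = kl(p_xy.ravel(), np.outer(p_x, p_y).ravel())

# I(X;Y) as E_Y[ D_KL( p(x|y) || p(x) ) ].
mi_expected_kl = sum(p_y[j] * kl(p_xy[:, j] / p_y[j], p_x)
                     for j in range(p_xy.shape[1]))

print(mi_joint_kl, mi_expected_kl)               # both ~0.278 bits
```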
2003
- (Torkkola, 2003) ⇒ Kari Torkkola. (2003). “Feature Extraction by Non-Parametric Mutual Information Maximization.” In: The Journal of Machine Learning Research, 3.
- QUOTE: We present a method for learning discriminative feature transforms using as criterion the mutual information between class labels and transformed features. Instead of a commonly used mutual information measure based on Kullback-Leibler divergence, we use a quadratic divergence measure, which allows us to make an efficient non-parametric implementation and requires no prior assumptions about class densities.
2002
- (Strehl & Ghosh, 2002b) ⇒ Alexander Strehl, and Joydeep Ghosh. (2002). “Cluster Ensembles: A knowledge reuse framework for combining partitions.” In: Journal of Machine Learning Research, 3.
- QUOTE: Mutual information, which is a symmetric measure to quantify the statistical information shared between two distributions (Cover and Thomas, 1991), provides a sound indication of the shared information between a pair of clusterings. Let X and Y be the random variables described by the cluster labeling (a) and (b), with k(a) and k(b) groups respectively. Let I(X; Y ) denote the mutual information between X and Y, and H(X) denote the entropy of X. One can show that I(X; Y ) is a metric. There is no upper bound for I(X; Y ),
1991
- (Cover & Thomas, 1991) ⇒ Thomas M. Cover, and Joy A. Thomas. (1991). “Elements of Information Theory." Wiley-Interscience. ISBN:0471062596