Discrete-Variable Mutual Information Metric
A Discrete-Variable Mutual Information Metric is a mutual information metric for discrete random variables ([math]\displaystyle{ X, Y }[/math]).
- Context:
- It can be calculated by (a worked example appears below, just before the References section):
- [math]\displaystyle{ I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log{ \left(\frac{p(x,y)}{p(x)\,p(y)} \right) } }[/math].
- [math]\displaystyle{ I_{\max}(X;Y) = \max_{c \in C} I(X;Y) = \max_{c \in C} \sum_{y \in Y} \sum_{x \in X} p(x,y) \log{ \left(\frac{p(x,y)}{p(x)\,p(y)} \right) } }[/math].
- Example(s):
- …
- Counter-Example(s):
- See: Symmetric Function, Joint Distribution, Marginal Probability, Continuous Function, Double Integral, Bit, Information Entropy, if And Only if.
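As a worked illustration of the calculation above (not drawn from the cited sources): if X and Y are binary and perfectly correlated, with [math]\displaystyle{ p(0,0)=p(1,1)=\tfrac{1}{2} }[/math], then
[math]\displaystyle{ I(X;Y) = \tfrac{1}{2}\log_2\!\frac{1/2}{(1/2)(1/2)} + \tfrac{1}{2}\log_2\!\frac{1/2}{(1/2)(1/2)} = 1 \text{ bit}, }[/math]
whereas if X and Y are independent, every term in the double sum is zero and [math]\displaystyle{ I(X;Y) = 0 }[/math].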
References
2016
- (Wikipedia, 2016) ⇒ https://en.wikipedia.org/wiki/mutual_information#Definition_of_mutual_information Retrieved:2016-5-17.
- Formally, the mutual information of two discrete random variables X and Y can be defined as: [math]\displaystyle{ I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log{ \left(\frac{p(x,y)}{p(x)\,p(y)} \right) }, }[/math] where p(x,y) is the joint probability distribution function of X and Y, and [math]\displaystyle{ p(x) }[/math] and [math]\displaystyle{ p(y) }[/math] are the marginal probability distribution functions of X and Y respectively.
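A minimal Python sketch (not part of the cited Wikipedia text) of evaluating this definition directly from a joint probability table; the function name discrete_mutual_information and the example distributions are assumptions for illustration.

import math

def discrete_mutual_information(joint, base=2.0):
    """I(X;Y) from a dict mapping (x, y) -> p(x, y), in units set by `base`."""
    # Marginal distributions p(x) and p(y), obtained by summing the joint table
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Sum p(x,y) * log( p(x,y) / (p(x) p(y)) ) over cells with non-zero mass
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0.0:
            mi += p * math.log(p / (px[x] * py[y]), base)
    return mi

# Perfectly correlated binary variables: I(X;Y) = 1 bit
print(discrete_mutual_information({(0, 0): 0.5, (1, 1): 0.5}))   # 1.0
# Independent binary variables: I(X;Y) = 0 bits
print(discrete_mutual_information({(0, 0): 0.25, (0, 1): 0.25,
                                   (1, 0): 0.25, (1, 1): 0.25}))  # 0.0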
- (Wikipedia, 2016) ⇒ https://en.wikipedia.org/wiki/Mutual_information#Mutual_information_for_discrete_data
- When X and Y are limited to be in a discrete number of states, observation data is summarized in a contingency table, with row variable X (or i) and column variable Y (or j). Mutual information is one of the measures of association or correlation between the row and column variables. Other measures of association include Pearson's chi-squared test statistics, G-test statistics, etc. In fact, mutual information is equal to G-test statistics divided by 2N where N is the sample size.
In the special case where the number of states for both row and column variables is 2 (i,j = 1,2), the degrees of freedom of the Pearson's chi-squared test is 1. Out of the four terms in the summation [math]\displaystyle{ \sum_{i,j} p_{ij} \log \frac{p_{ij}}{p_i p_j} }[/math], only one is independent. This is the reason that the mutual information function has an exact relationship with the correlation function [math]\displaystyle{ p_{X=1, Y=1} - p_{X=1} p_{Y=1} }[/math] for binary sequences.[1]
- ↑ Wentian Li (1990). "Mutual information functions versus correlation functions". J. Stat. Phys. 60 (5-6): 823–837. doi:10.1007/BF01025996.
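As an illustrative check (not from the cited references) of the statement above that mutual information equals the G-test statistic divided by 2N, the following sketch computes both quantities from a hypothetical 2x2 table of observed counts; the function name mi_and_g and the example counts are assumptions.

import math

def mi_and_g(table):
    """Observed-count contingency table (list of rows) -> (I(X;Y) in nats, G statistic)."""
    n = sum(sum(row) for row in table)              # total sample size N
    row_sums = [sum(row) for row in table]          # counts for each row state of X
    col_sums = [sum(col) for col in zip(*table)]    # counts for each column state of Y
    mi, g = 0.0, 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:
                p_ij = obs / n                                   # empirical joint probability
                p_i, p_j = row_sums[i] / n, col_sums[j] / n      # empirical marginals
                mi += p_ij * math.log(p_ij / (p_i * p_j))        # mutual information (nats)
                g += 2.0 * obs * math.log(obs * n / (row_sums[i] * col_sums[j]))  # G-test term
    return mi, g

# Hypothetical 2x2 contingency table of observed counts, N = 80
mi, g = mi_and_g([[30, 10], [10, 30]])
print(mi, g / (2 * 80))   # the two printed values agree: I(X;Y) = G / (2N)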