Kullback-Leibler (KL) Divergence Measure

From GM-RKB
(Redirected from information gain)
Jump to navigation Jump to search

A Kullback-Leibler (KL) Divergence Measure is a non-symmetric probability distribution similarity measure that quantifies how one probability distribution diverges from a second, reference probability distribution.



References

2023

2011

  • http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
    • In probability theory and information theory, the Kullback–Leibler divergence[1][2][3] (also information divergence, information gain, relative entropy, or KLIC) is a non-symmetric measure of the difference between two probability distributions P and Q. KL measures the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P. Although it is often intuited as a metric or distance, the KL divergence is not a true metric — for example, it is not symmetric: the KL from P to Q is generally not the same as the KL from Q to P. KL divergence is a special case of a broader class of divergences called f-divergences. Originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions, it is not the same as a divergence in calculus. However, the KL divergence can be derived from the Bregman divergence.

      For probability distributions P and Q of a discrete random variable their K–L divergence is defined to be :[math]\displaystyle{ D_{\mathrm{KL}}(P\|Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}. \! }[/math] In words, it is the average of the logarithmic difference between the probabilities P and Q, where the average is taken using the probabilities P. The K-L divergence is only defined if P and Q both sum to 1 and if [math]\displaystyle{ Q(i)\gt 0 }[/math] for any i such that [math]\displaystyle{ P(i)\gt 0 }[/math]. If the quantity [math]\displaystyle{ 0 \log 0 }[/math] appears in the formula, it is interpreted as zero. For distributions P and Q of a continuous random variable, KL-divergence is defined to be the integral:[4] :[math]\displaystyle{ D_{\mathrm{KL}}(P\|Q) = \int_{-\infty}^\infty p(x) \log \frac{p(x)}{q(x)} \, {\rm d}x, \! }[/math] where p and q denote the densities of P and Q.

      More generally, if P and Q are probability measures over a set X, and Q is absolutely continuous with respect to P, then the Kullback–Leibler divergence from P to Q is defined as :[math]\displaystyle{ D_{\mathrm{KL}}(P\|Q) = -\int_X \log \frac{{\rm d}Q}{{\rm d}P} \,{\rm d}P, \! }[/math] where [math]\displaystyle{ \frac{{\rm d}Q}{{\rm d}P} }[/math] is the Radon–Nikodym derivative of Q with respect to P, and provided the expression on the right-hand side exists.

  1. Kullback, S.; Leibler, R.A. (1951). "On Information and Sufficiency". Annals of Mathematical Statistics 22 (1): 79–86. doi:10.1214/aoms/1177729694. MR39968. 
  2. S. Kullback (1959) Information theory and statistics (John Wiley and Sons, NY).
  3. Kullback, S.; Burnham, K. P.; Laubscher, N. F.; Dallal, G. E.; Wilkinson, L.; Morrison, D. F.; Loyer, M. W.; Eisenberg, B. et al. (1987). "Letter to the Editor: The Kullback–Leibler distance". The American Statistician 41 (4): 340–341. JSTOR 2684769. 
  4. C. Bishop (2006). Pattern Recognition and Machine Learning. p. 55.

2008

2007

2000

1989

1959

1951