Jensen-Shannon Divergence (JSD) Metric


A Jensen-Shannon Divergence (JSD) Metric is a symmetric probability distribution similarity measure that is based on the Kullback–Leibler divergence, is always finite, and whose square root (the Jensen–Shannon distance) is a true metric.



References


2011

  • http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
    • In probability theory and statistics, the Jensen–Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1] or total divergence to the average.[2] It is based on the Kullback–Leibler divergence, with the notable (and useful) difference that it is always a finite value. The square root of the Jensen–Shannon divergence is a metric.[3][4]

      Consider the set $M_+^1(A)$ of probability distributions, where $A$ is a set provided with some σ-algebra of measurable subsets. In particular we can take $A$ to be a finite or countable set with all subsets being measurable. The Jensen–Shannon divergence (JSD) $M_+^1(A) \times M_+^1(A) \rightarrow [0,\infty)$ is a symmetrized and smoothed version of the Kullback–Leibler divergence $D(P \parallel Q)$. It is defined by

      $$JSD(P \parallel Q) = \frac{1}{2} D(P \parallel M) + \frac{1}{2} D(Q \parallel M), \qquad \text{where } M = \frac{1}{2}(P + Q).$$

      If $A$ is countable, a more general definition, allowing for the comparison of more than two distributions, is

      $$JSD(P_1, P_2, \ldots, P_n) = H\left(\sum_{i=1}^n \pi_i P_i\right) - \sum_{i=1}^n \pi_i H(P_i),$$

      where $\pi_1, \pi_2, \ldots, \pi_n$ are the weights for the probability distributions $P_1, P_2, \ldots, P_n$ and $H(P)$ is the Shannon entropy for distribution $P$. The two-distribution case above is recovered with $P_1 = P$, $P_2 = Q$, and $\pi_1 = \pi_2 = \frac{1}{2}$. (Both forms are sketched in code after the numbered references below.)

  1. Schütze, Hinrich; Manning, Christopher D. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass.: MIT Press. p. 304. ISBN 0-262-13360-1. http://nlp.stanford.edu/fsnlp/.
  2. Dagan, Ido; Lee, Lillian; Pereira, Fernando (1997). "Similarity-Based Methods For Word Sense Disambiguation". Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics: pp. 56–63. http://citeseer.ist.psu.edu/dagan97similaritybased.html. Retrieved 2008-03-09.
  3. Endres, D. M.; Schindelin, J. E. (2003). "A new metric for probability distributions". IEEE Trans. Inf. Theory 49 (7): pp. 1858–1860. doi:10.1109/TIT.2003.813506.
  4. Österreicher, F.; Vajda, I. (2003). "A new class of metric divergences on probability spaces and its statistical applications". Ann. Inst. Statist. Math. 55 (3): pp. 639–653. doi:10.1007/BF02517812.
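
The two-distribution form quoted above translates directly into code. Below is a minimal illustrative sketch (not part of the quoted entry), assuming NumPy, natural-logarithm units, and small hypothetical example distributions; it computes JSD(P ∥ Q) = ½D(P ∥ M) + ½D(Q ∥ M) with M = ½(P + Q), and spot-checks that the square root of the divergence (the Jensen–Shannon distance) satisfies the triangle inequality.

    # Minimal sketch of the two-distribution Jensen-Shannon divergence.
    # Assumptions (not from the source): NumPy, natural-log units, and the
    # example distributions p, q, r below, which are purely illustrative.
    import numpy as np


    def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
        """Kullback-Leibler divergence D(p || q), with the 0*log(0) = 0 convention."""
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


    def jsd(p: np.ndarray, q: np.ndarray) -> float:
        """JSD(p || q) = 1/2 D(p || m) + 1/2 D(q || m), with m = (p + q) / 2."""
        m = 0.5 * (p + q)
        return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


    if __name__ == "__main__":
        # Hypothetical distributions over a three-symbol alphabet.
        p = np.array([0.50, 0.25, 0.25])
        q = np.array([0.10, 0.60, 0.30])
        r = np.array([0.30, 0.30, 0.40])

        print("JSD(p||q) =", jsd(p, q))  # symmetric, so it equals jsd(q, p)
        print("JSD(q||p) =", jsd(q, p))

        # The square root of the divergence is the Jensen-Shannon distance,
        # which is a metric; spot-check the triangle inequality on p, q, r.
        d_pq, d_pr, d_rq = jsd(p, q) ** 0.5, jsd(p, r) ** 0.5, jsd(r, q) ** 0.5
        assert d_pq <= d_pr + d_rq

Inside jsd the Kullback–Leibler divergence is always taken against the mixture m, which is strictly positive wherever p or q is, so the computation stays finite, matching the "always a finite value" remark above.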
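
The generalized, weighted n-distribution form can be sketched the same way. The following is again an illustrative sketch under the same assumptions (NumPy, natural-log units, hypothetical weights and distributions); it computes H(Σᵢ πᵢ Pᵢ) − Σᵢ πᵢ H(Pᵢ) and verifies that with π₁ = π₂ = ½ it reproduces the two-distribution definition ½D(P ∥ M) + ½D(Q ∥ M).

    # Minimal sketch of the generalized weighted Jensen-Shannon divergence
    # JSD(P_1, ..., P_n) = H(sum_i pi_i P_i) - sum_i pi_i H(P_i).
    # Assumptions (not from the source): NumPy, natural-log units, and the
    # example weights and distributions below, which are purely illustrative.
    import numpy as np


    def shannon_entropy(p: np.ndarray) -> float:
        """Shannon entropy H(p), with the 0*log(0) = 0 convention."""
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))


    def generalized_jsd(dists, weights) -> float:
        """Entropy of the weighted mixture minus the weighted entropies."""
        dists = [np.asarray(d, dtype=float) for d in dists]
        weights = np.asarray(weights, dtype=float)
        mixture = sum(w * d for w, d in zip(weights, dists))
        return shannon_entropy(mixture) - sum(
            w * shannon_entropy(d) for w, d in zip(weights, dists)
        )


    if __name__ == "__main__":
        p = np.array([0.50, 0.25, 0.25])
        q = np.array([0.10, 0.60, 0.30])

        # Equal weights on two distributions recover the definition
        # 1/2 D(P||M) + 1/2 D(Q||M) with M = (P + Q) / 2.
        m = 0.5 * (p + q)

        def kl(a, b):
            mask = a > 0
            return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

        assert np.isclose(generalized_jsd([p, q], [0.5, 0.5]),
                          0.5 * kl(p, m) + 0.5 * kl(q, m))

        # Three distributions with unequal weights fit the same formula.
        r = np.array([1 / 3, 1 / 3, 1 / 3])
        print("JSD(p, q, r) =", generalized_jsd([p, q, r], [0.5, 0.3, 0.2]))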
