Positive Pointwise Mutual Information (PPMI)
A Positive Pointwise Mutual Information (PPMI) is a PMI measure in which all negative PMI values are replaced by zero.
- See: Log Function, Shifted PPMI.
References
2015
- (Levy et al., 2015) ⇒ Omer Levy, Yoav Goldberg, and Ido Dagan. (2015). “Improving Distributional Similarity with Lessons Learned from Word Embeddings.” In: Transactions of the Association for Computational Linguistics, 3.
- QUOTE: The rows of [math]\displaystyle{ M^{PMI} }[/math] contain many entries of word-context pairs (w, c) that were never observed in the corpus, for which PMI(w, c) = log 0 = −∞.
A common approach is thus to replace the [math]\displaystyle{ M^{PMI} }[/math] matrix with [math]\displaystyle{ M^{PMI}_0 }[/math], in which PMI(w, c) = 0 in cases where #(w, c) = 0. A more consistent approach is to use positive PMI (PPMI), in which all negative values are replaced by 0:
[math]\displaystyle{ \operatorname{PPMI}(w, c) = \operatorname{max}(\operatorname{PMI} (w, c), 0) }[/math]
Bullinaria and Levy (2007) showed that [math]\displaystyle{ M^{PPMI} }[/math] outperforms [math]\displaystyle{ M^{PMI}_0 }[/math] on semantic similarity tasks. A well-known shortcoming of PMI, which persists in PPMI, is its bias towards infrequent events (Turney and Pantel, 2010). A rare context c that co-occurred with a target word w even once, will often yield relatively high PMI score because [math]\displaystyle{ \hat{P}(c) }[/math], which is in PMI’s denominator, is very small. ...
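The following is a minimal sketch of building a PPMI matrix from a word-context co-occurrence count matrix, following the max(PMI, 0) definition quoted above; the dense NumPy representation, the function name ppmi_matrix, and the toy counts are illustrative assumptions, not part of the cited work.
<pre>
import numpy as np

def ppmi_matrix(counts):
    """counts[w, c] = #(w, c): number of times word w co-occurred with context c."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_wc = counts / total                              # joint estimate P^(w, c)
    p_w = counts.sum(axis=1, keepdims=True) / total    # marginal P^(w)
    p_c = counts.sum(axis=0, keepdims=True) / total    # marginal P^(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))               # PMI(w, c); log 0 = -inf for unseen pairs
    pmi[~np.isfinite(pmi)] = 0.0                       # unseen pairs set to 0, as in M^{PMI}_0
    return np.maximum(pmi, 0.0)                        # PPMI: clip remaining negative values to 0

# Toy example: 3 words x 4 contexts of co-occurrence counts (hypothetical data).
counts = np.array([[10, 0, 3, 1],
                   [ 2, 8, 0, 0],
                   [ 1, 1, 5, 4]])
print(ppmi_matrix(counts))
</pre>
Note that, as the quote points out, a context column with very small marginal counts can still produce large PPMI entries for the words it co-occurs with, which is the frequency bias mentioned above.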