PMI Matrix

AKA: Pointwise Mutual Information Matrix.
Context:
- It can be a Weighted PMI Matrix.
- It can be a Shifted PMI Matrix.
- It can be a Non-Negative PMI Matrix.
- It can be created by a PMI Matrix Creation System.
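The variants listed above can be illustrated with a minimal sketch. It assumes the matrix is built from a table of raw co-occurrence counts, that "shifted" means subtracting log k from every cell for some constant k, and that "non-negative" means clipping negative cells to zero. The function names (`pmi_matrix`, `shifted_pmi`, `nonnegative_pmi`) are hypothetical, and the weighted variant is omitted because no weighting scheme is specified here.

```python
import math

def pmi_matrix(counts):
    """PMI for each cell of a co-occurrence count matrix (list of lists).

    Assumption: probabilities are estimated as relative frequencies.
    Cells with a zero count are set to -inf, since log(0) is undefined.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    pmi = []
    for i, row in enumerate(counts):
        pmi_row = []
        for j, c in enumerate(row):
            if c == 0:
                pmi_row.append(float("-inf"))
            else:
                # PMI(i, j) = log( P(i, j) / (P(i) * P(j)) )
                pmi_row.append(math.log(c * total / (row_sums[i] * col_sums[j])))
        pmi.append(pmi_row)
    return pmi

def shifted_pmi(pmi, k):
    """Assumed definition of the shifted variant: PMI - log(k)."""
    return [[v - math.log(k) for v in row] for row in pmi]

def nonnegative_pmi(pmi):
    """Assumed definition of the non-negative variant: max(PMI, 0)."""
    return [[max(v, 0.0) for v in row] for row in pmi]

# Toy 2x2 count table (made-up numbers, for illustration only).
counts = [[4, 1], [1, 4]]
pmi = pmi_matrix(counts)
ppmi = nonnegative_pmi(pmi)
```

In this toy table the diagonal cells come out positive and the off-diagonal cells negative, so the non-negative variant zeroes only the off-diagonal entries.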
Example(s):
- a Word-Word PMI Matrix, such as: [math]\displaystyle{ \begin{array}{c|ccccc} & aardvark & ... & midterm & ... & zoo \\ \hline aardvark & 4.16... & ... & -3.18... & ... & 0.57... \\ ... & ... & ... & ... & ... & ... \\ midterm & -3.18... & ... & 4.16... & ... & -1.57... \\ ... & ... & ... & ... & ... & ... \\ zoo & 0.57... & ... & -1.57... & ... & 4.16... \end{array} }[/math].
- …
Counter-Example(s):
- a tf-idf Matrix.
- an IDF Vector.
See: Mutual Information.

References

(Niwa & Nitta, 1994) ⇒ Yoshiki Niwa, and Yoshihiko Nitta. (1994). “Co-occurrence Vectors from Corpora Vs. Distance Vectors from Dictionaries.” In: Proceedings of the 15th conference on Computational linguistics - Volume 1. doi:10.3115/991886.991938
- QUOTE: We use ordinary co-occurrence statistics and measure the co-occurrence likelihood between two words, X and Y, by the mutual information estimate (Church and Hanks, 1989): [math]\displaystyle{ I(\mathbf{X},\mathbf{Y}) = \log^+ \frac{P(\mathbf{X} \mid \mathbf{Y})}{P(\mathbf{X})} }[/math], where P(X) is the occurrence density of word X in a whole corpus, and the conditional probability [math]\displaystyle{ P(\mathbf{X} \mid \mathbf{Y}) }[/math] is the density of word X in a neighborhood of word Y. Here the neighborhood is defined as 50 words before or after any appearance of word Y. (There is a variety of neighborhood definitions such as "100 surrounding words" (Yarowsky 1992) and "within a distance of no more than 3 words ignoring function words" (Dagan et al, 1993).)
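
The estimate quoted above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: [math]\displaystyle{ \log^+ }[/math] is taken to mean the logarithm truncated at zero, the neighborhood of Y is taken as the union of all token positions within `window` tokens of any occurrence of Y (excluding the occurrence itself), and the function name `positive_mi` is hypothetical.

```python
import math

def positive_mi(tokens, x, y, window=50):
    """Estimate I(X, Y) = log+( P(X | Y) / P(X) ) from a token list.

    Assumptions: densities are relative frequencies, and log+ means
    max(log(.), 0). Returns 0.0 when either density is zero.
    """
    n = len(tokens)
    if n == 0:
        return 0.0
    # P(X): density of x in the whole corpus.
    p_x = tokens.count(x) / n
    # Positions within `window` tokens before or after any occurrence of y.
    neighborhood = set()
    for i, tok in enumerate(tokens):
        if tok == y:
            for j in range(max(0, i - window), min(n, i + window + 1)):
                if j != i:
                    neighborhood.add(j)
    if not neighborhood or p_x == 0:
        return 0.0
    # P(X | Y): density of x inside the neighborhood of y.
    p_x_given_y = sum(1 for j in neighborhood if tokens[j] == x) / len(neighborhood)
    if p_x_given_y == 0:
        return 0.0
    return max(math.log(p_x_given_y / p_x), 0.0)

# Toy corpus (made up for illustration), with a 1-token neighborhood.
tokens = "a b c d e f a b".split()
score = positive_mi(tokens, "b", "a", window=1)
```

Calling the function for every word pair in a vocabulary would fill in one such matrix cell at a time; the 50-word default mirrors the neighborhood size in the quote, and other neighborhood definitions would change only how `neighborhood` is built.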