PMI Matrix
A PMI Matrix is a co-occurrence matrix composed of PMI vectors (whose entries are PMI values).
- AKA: Pointwise Mutual Information Matrix.
- Context:
- It can be a Weighted PMI Matrix.
- It can be a Shifted PMI Matrix.
- It can be a Non-Negative PMI Matrix.
- It can be created by a PMI Matrix Creation System.
- Example(s):
- a Word-Word PMI Matrix, such as: [math]\displaystyle{ \begin{array}{c|ccccc} & aardvark & ... & midterm & ... & zoo \\ \hline aardvark & 4.16... & ... & -3.18... & ... & 0.57... \\ ... & ... & ... & ... & ... & ... \\ midterm & -3.18... & ... & 4.16... & ... & -1.57... \\ ... & ... & ... & ... & ... & ... \\ zoo & 0.57... & ... & -1.57... & ... & 4.16... \end{array} }[/math].
- …
- Counter-Example(s):
- a tf-idf Matrix.
- an IDF Vector.
- See: Mutual Information.
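As a rough illustration of how such a matrix might be derived, the sketch below estimates PMI(i, j) = log( P(i, j) / (P(i) P(j)) ) from a square matrix of raw co-occurrence counts. The function name `pmi_matrix` and its parameters are hypothetical. The `shift` option assumes that "shifted" means subtracting a constant from every PMI value, and `non_negative` assumes that negative values are clipped to 0; this page does not define either variant.

```python
import math

def pmi_matrix(counts, shift=0.0, non_negative=False):
    """Build a PMI matrix from a square co-occurrence count matrix.

    counts       -- list of lists; counts[i][j] is how often item i and item j co-occur
    shift        -- constant subtracted from every finite PMI value (assumed "shifted" variant)
    non_negative -- if True, clip negative values (including -inf) to 0.0
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]

    result = []
    for i, row in enumerate(counts):
        out_row = []
        for j, c in enumerate(row):
            if c == 0:
                # log(0) is undefined; treat a never-observed pair as -inf.
                value = float("-inf")
            else:
                # P(i, j) / (P(i) * P(j)) simplifies to c * total / (row_sum * col_sum).
                value = math.log(c * total / (row_sums[i] * col_sums[j])) - shift
            if non_negative:
                value = max(value, 0.0)
            out_row.append(value)
        result.append(out_row)
    return result

# Toy usage: two items that co-occur with themselves more often than with each other.
toy_counts = [[4, 1],
              [1, 4]]
for row in pmi_matrix(toy_counts):
    print(row)
```

With these toy counts the diagonal cells come out positive and the off-diagonal cells negative, which mirrors the sign pattern of the word-word example above; real counts would come from a corpus.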
References
2007
- (Budiu et al., 2007) ⇒ Raluca Budiu, Christiaan Royer, and Peter Pirolli. (2007). “Modeling Information Scent: A Comparison of LSA, PMI and GLSA Similarity Measures on Common Tests and Corpora.” In: Large Scale Semantic Access to Content (Text, Image, Video, and Sound).
- QUOTE: Unlike LSA, the initial word-by-document co-occurrence matrix is replaced by a word-by-word PMI matrix, in which words are represented as vectors of PMI scores relative to other words in the vocabulary (Niwa & Nitta, 1994).
1994
- (Niwa & Nitta, 1994) ⇒ Yoshiki Niwa, and Yoshihiko Nitta. (1994). “Co-occurrence Vectors from Corpora Vs. Distance Vectors from Dictionaries.” In: Proceedings of the 15th conference on Computational linguistics - Volume 1. doi:10.3115/991886.991938
- QUOTE: We use ordinary co-occurrence statistics and measure the co-occurrence likelihood between two words, X and Y, by the mutual information estimate (Church and Hanks, 1989): [math]\displaystyle{ I(\mathbf{X},\mathbf{Y}) = \log^+ \frac{P(\mathbf{X} \mid \mathbf{Y})}{P(\mathbf{X})} }[/math], where P(X) is the occurrence density of word X in a whole corpus, and the conditional probability [math]\displaystyle{ P(\mathbf{X} \mid \mathbf{Y}) }[/math] is the density of word X in a neighborhood of word Y. Here the neighborhood is defined as 50 words before or after any appearance of word Y. (There is a variety of neighborhood definitions such as "100 surrounding words" (Yarowsky 1992) and "within a distance of no more than 3 words ignoring function words" (Dagan et al, 1993).)
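The estimate quoted above can be sketched on a tokenized corpus as follows. This is a minimal illustration, not the authors' implementation: the function name `log_plus_pmi` is hypothetical, and it assumes that log^+ denotes the logarithm with negative values replaced by 0, which the quoted passage does not spell out. The neighborhood is taken as the union of all positions within `window` tokens of any occurrence of Y.

```python
import math

def log_plus_pmi(tokens, x, y, window=50):
    """Estimate I(x, y) = log+( P(x | y) / P(x) ) from a list of tokens.

    P(x)     -- density of x in the whole token list
    P(x | y) -- density of x among positions within `window` tokens
                before or after any occurrence of y
    """
    n = len(tokens)
    if n == 0:
        return 0.0
    p_x = tokens.count(x) / n

    # Gather neighborhood positions. Each occurrence of y skips its own
    # position, though that position may still fall inside the window of
    # another occurrence of y.
    neighborhood = set()
    for pos, tok in enumerate(tokens):
        if tok == y:
            lo, hi = max(0, pos - window), min(n, pos + window + 1)
            neighborhood.update(i for i in range(lo, hi) if i != pos)

    if not neighborhood or p_x == 0:
        return 0.0

    hits = sum(1 for i in neighborhood if tokens[i] == x)
    p_x_given_y = hits / len(neighborhood)
    if p_x_given_y == 0:
        return 0.0

    # Assumed meaning of log+: clip negative log-ratios to zero.
    return max(0.0, math.log(p_x_given_y / p_x))

# Toy usage with a tiny window; the quoted setup uses window=50.
toy_tokens = "a b a b c c c c".split()
print(log_plus_pmi(toy_tokens, "a", "b", window=1))
```

Computing this value for every pair of vocabulary words and arranging the results in a table would yield a word-by-word matrix of the kind described in the Budiu et al. quote.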