Shifted PPMI Matrix
A Shifted PPMI Matrix (SPPMI Matrix) is a word-context PMI matrix in which each cell, for word [math]\displaystyle{ w }[/math] and context [math]\displaystyle{ c }[/math], is computed as [math]\displaystyle{ \operatorname{max}(0.0, PMI(w, c) - \log(k)) }[/math], where [math]\displaystyle{ k }[/math] is the number of negative samples.
- …
- Counter-Example(s):
- See: Weighted Matrix, SPPMI-SVD Algorithm.
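A minimal NumPy sketch of this computation (the function name and the handling of zero counts are illustrative choices, not taken from the cited sources) builds an SPPMI matrix from a word-context co-occurrence count matrix:

```python
import numpy as np

def sppmi_matrix(counts, k=1):
    """Shifted Positive PMI matrix from a word-context co-occurrence count matrix.

    counts[w, c] = #wc, the number of times word w and context c co-occurred;
    k is the number of negative samples, so the shift is log(k).
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()                                    # sum over all (w, c) of #wc
    word_marginals = counts.sum(axis=1, keepdims=True)      # sum_c #wc for each word
    context_marginals = counts.sum(axis=0, keepdims=True)   # sum_w #wc for each context
    expected = word_marginals * context_marginals           # broadcasts to the full matrix shape
    sppmi = np.zeros_like(counts)
    observed = counts > 0                                   # PMI is only defined for observed pairs
    pmi = np.log(counts[observed] * total / expected[observed])
    sppmi[observed] = np.maximum(0.0, pmi - np.log(k))
    return sppmi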
References
2014
- (Řehůřek, 2014a) ⇒ Radim Řehůřek. (2014-12-23). “Making Sense of word2vec.” http://radimrehurek.com/2014/12/making-sense-of-word2vec/
- QUOTE: In a manner analogous to GloVe, Levy and Goldberg analyzed the objective function of word2vec with negative sampling. ... “SPPMI” name simply reflects that we’re subtracting [math]\displaystyle{ \log(k) }[/math] from PMI (“shifting”) ... the word x context objective “source” matrix is computed differently to GloVe. Each matrix cell, corresponding to word w and context word c, is computed as [math]\displaystyle{ \operatorname{max}(0.0, PMI(w, c) - \log(k)) }[/math], where k is the number of negative samples in word2vec (for example, [math]\displaystyle{ k=10 }[/math]). PMI is the standard pointwise mutual information — if we use the notation that word w and context c occurred together #wc times in the training corpus, then [math]\displaystyle{ PMI(w, c) = \log \frac{\#wc \cdot \sum_{w,c}\#wc}{\sum_c\#wc \cdot \sum_w\#wc} }[/math] (no smoothing).
The ... “SPPMI” name simply reflects that we’re subtracting [math]\displaystyle{ \log(k) }[/math] from PMI (“shifting”) and that we’re taking the [math]\displaystyle{ \operatorname{max}(0.0, SPMI) }[/math] (“positive”; should be non-negative, really). So, Shifted Positive Pointwise Mutual Information.
For example, for the same nine texts we used above and k=1, the SPPMI matrix looks like this:
[[ 0.    0.83  0.83  0.49  0.49  0.    0.49  0.13  0.    0.    0.    0.  ]
 [ 0.83  0.    1.16  0.    0.    0.83  0.    0.    0.98  0.    0.    0.  ]
 [ 0.83  1.16  0.    0.    0.    0.13  0.    0.47  0.98  0.    0.    0.  ]
 [ 0.49  0.    0.    0.    0.49  0.    1.18  0.83  0.    0.    0.    0.  ]
 [ 0.49  0.    0.    0.49  0.    0.    0.49  0.13  0.    0.    0.83  1.05]
 [ 0.    0.83  0.13  0.    0.    0.    0.    0.13  1.05  0.    0.    0.  ]
 [ 0.49  0.    0.    1.18  0.49  0.    0.    0.83  0.    0.    0.    0.  ]
 [ 0.13  0.    0.47  0.83  0.13  0.13  0.83  0.    0.29  0.    0.    0.  ]
 [ 0.    0.98  0.98  0.    0.    1.05  0.    0.29  0.    0.    0.    0.  ]
 [ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    2.37  1.9 ]
 [ 0.    0.    0.    0.    0.83  0.    0.    0.    0.    2.37  0.    2.08]
 [ 0.    0.    0.    0.    1.05  0.    0.    0.    0.    1.9   2.08  0.  ]]
- No neural network training, no parameter tuning, we can directly take rows of this SPPMI matrix to be the word vectors.
... The SPPMI-SVD method simply factorizes the sparse SPPMI matrix using Singular Value Decomposition (SVD), rather than the gradient descent methods of word2vec/GloVe, and uses the (dense) left singular vectors as the final word embeddings.
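A minimal SciPy sketch of this pipeline (the helper names, the embedding dimensionality, and the cosine-similarity utility are illustrative; `sppmi` is assumed to be a matrix like the one built in the earlier sketch, with one row per vocabulary word):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def sppmi_svd_embeddings(sppmi, dim=100):
    """Factorize a (sparse) SPPMI matrix with a rank-`dim` truncated SVD and
    return dense word embeddings built from the left singular vectors.
    `dim` must be smaller than both matrix dimensions."""
    u, s, _vt = svds(csr_matrix(sppmi), k=dim)
    order = np.argsort(-s)          # sort components by descending singular value
    return u[:, order]

def cosine(u, v):
    """Cosine similarity between two word vectors (assumes non-zero rows)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical usage, with `word2id` an illustrative mapping from token to row index:
#   sparse_vectors = sppmi                        # rows of SPPMI used directly as word vectors
#   dense_vectors = sppmi_svd_embeddings(sppmi)   # SPPMI-SVD embeddings
#   sim = cosine(dense_vectors[word2id["graph"]], dense_vectors[word2id["trees"]])
```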
- (Levy & Goldberg, 2014) ⇒ Omer Levy, and Yoav Goldberg. (2014). “Neural Word Embedding As Implicit Matrix Factorization.” In: Advances in Neural Information Processing Systems (NIPS 2014).
- QUOTE: We show that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks.