GloVe Algorithm
A GloVe Algorithm is a continuous dense distributional word model training algorithm proposed in (Pennington et al., 2014).
- Context:
- It can be implemented by a GloVe-based System.
- It precomputes a Word-Word Co-Occurrence Matrix.
- It trains on Global Word-Word Co-occurrence Counts.
- It identifies Global Vectors.
- It uses Weighted Least Squares (objective shown below).
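For reference, the weighted least-squares objective defined in (Pennington et al., 2014) is the following, where X_ij counts how often word j occurs in the context of word i, w_i and w̃_j are the word and context vectors, b_i and b̃_j their biases, V is the vocabulary size, and the paper sets x_max = 100 and α = 3/4:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2 ,
\qquad
f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
```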
- Example(s):
- glove-python, a Python implementation of GloVe.
- GloVe v.2.0,
- GloVe v.1.2,
- GloVe v.1.0,
- …
- Counter-Example(s):
- a word2vec Algorithm (such as a Skip-Gram Algorithm or a CBOW Algorithm).
- See: SGD Algorithm, Neural Sequence Learning Task, Natural Language Model, Sentiment Analysis, Word Sense Disambiguation, ULMFiT, Sequence-to-Sequence Learning.
References
2019
- (Pennington et al., 2019) ⇒ Jeffrey Pennington, Richard Socher, and Christopher D. Manning. “GloVe: Global Vectors for Word Representation.” Retrieved: 2019-02-24.
- QUOTE: GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
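As an illustration of those linear substructures, the following minimal sketch queries a word analogy against pretrained GloVe vectors. It assumes the third-party gensim library and its downloadable "glove-wiki-gigaword-100" vectors, which are not part of the Stanford GloVe release itself:

```python
# Minimal sketch: probing the linear substructure of pretrained GloVe vectors.
# Assumes the third-party gensim library and its downloadable
# "glove-wiki-gigaword-100" vectors (not part of the original GloVe release).
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")  # returns gensim KeyedVectors

# vector("king") - vector("man") + vector("woman") should land near "queen"
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```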
2015
- (Rothe & Schütze, 2015) ⇒ Sascha Rothe, and Hinrich Schütze. (2015). “AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes.” In: arXiv preprint arXiv:1507.01127.
- QUOTE: … Unsupervised methods for word embeddings (also called “distributed word representations”) have become popular in natural language processing (NLP). These methods only need very large corpora as input to create sparse representations (e.g., based on local collocations) and project them into a lower dimensional dense vector space. Examples for word embeddings are SENNA (Collobert and Weston, 2008), the hierarchical log-bilinear model (Mnih and Hinton, 2009), word2vec (Mikolov et al., 2013c) and GloVe (Pennington et al., 2014).
2014a
- (Rehurek, 2014) ⇒ Radim Rehurek. (2014). “Making sense of word2vec.” Published Online: 2014-12-23.
- QUOTE: Their method GloVe (Global Vectors) identified a matrix which, when factorized using the particular SGD algorithm of word2vec, yields out exactly these two matrices. So where word2vec was a bit hazy about what’s going on underneath, GloVe explicitly names the “objective” matrix, identifies the factorization, and provides some intuitive justification as to why this should give us working similarities. …
… Basically, where GloVe precomputes the large word x word co-occurrence matrix in memory and then quickly factorizes it, word2vec sweeps through the sentences in an online fashion, handling each co-occurrence separately. So, there is a tradeoff between taking more memory (GloVe) vs. taking longer to train (word2vec). Also, once computed, GloVe can re-use the co-occurrence matrix to quickly factorize with any dimensionality, whereas word2vec has to be trained from scratch after changing its embedding dimensionality.
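The precompute-then-factorize split described in the quote above can be illustrated with a minimal sketch of the first step: building a sparse co-occurrence matrix with the 1/distance context weighting used in (Pennington et al., 2014). Function and variable names here are illustrative, not taken from any GloVe implementation:

```python
from collections import defaultdict

def build_cooccurrence(sentences, window=10):
    """Accumulate a sparse word-word co-occurrence "matrix" X as a dict.

    Illustrative sketch only: the reference GloVe implementation streams the
    corpus in C and writes counts to disk. Co-occurrence contributions are
    weighted by 1/distance, as in (Pennington et al., 2014).
    """
    X = defaultdict(float)  # (word_i, word_j) -> weighted co-occurrence count
    for sentence in sentences:
        for i, word in enumerate(sentence):
            # scan context words to the right; count both directions
            for d, context in enumerate(sentence[i + 1 : i + 1 + window], start=1):
                X[(word, context)] += 1.0 / d
                X[(context, word)] += 1.0 / d
    return X

# Usage example on a toy corpus
corpus = [["ice", "is", "solid", "water"], ["steam", "is", "gaseous", "water"]]
X = build_cooccurrence(corpus, window=2)
print(X[("ice", "solid")])  # 0.5: "solid" is two tokens away from "ice"
```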
2014b
- (Pennington et al., 2014) ⇒ Jeffrey Pennington, Richard Socher, and Christopher D. Manning. (2014). “GloVe: Global Vectors for Word Representation.” In: Proceedings of EMNLP 2014.
- QUOTE: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
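A minimal NumPy sketch of that idea, training only on the nonzero co-occurrence entries with the paper's weighting function; plain SGD, the learning rate, and the random initialization here are simplifications (the reference implementation uses AdaGrad):

```python
import numpy as np

def train_glove(X, vocab_size, dim=50, epochs=25, lr=0.05, x_max=100.0, alpha=0.75):
    """Sketch of GloVe's weighted least-squares training over nonzero counts.

    X: dict mapping (i, j) word-index pairs to co-occurrence counts X_ij > 0.
    Plain SGD is used for brevity; the reference implementation uses AdaGrad.
    """
    rng = np.random.default_rng(0)
    W = rng.uniform(-0.5, 0.5, (vocab_size, dim)) / dim    # word vectors
    Wc = rng.uniform(-0.5, 0.5, (vocab_size, dim)) / dim   # context vectors
    b = np.zeros(vocab_size)                               # word biases
    bc = np.zeros(vocab_size)                              # context biases

    for _ in range(epochs):
        for (i, j), x_ij in X.items():
            weight = min(1.0, (x_ij / x_max) ** alpha)     # f(X_ij)
            wi, wj = W[i].copy(), Wc[j].copy()
            # residual of the log-bilinear model against log co-occurrence
            diff = wi @ wj + b[i] + bc[j] - np.log(x_ij)
            grad = weight * diff                           # factor of 2 folded into lr
            W[i] -= lr * grad * wj
            Wc[j] -= lr * grad * wi
            b[i] -= lr * grad
            bc[j] -= lr * grad
    return W + Wc  # the paper uses the sum of word and context vectors as the final embeddings

# Usage requires mapping word strings to integer indices and passing the
# resulting (i, j) -> count dictionary as X.
```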