GloVe System
A GloVe System is a Word Embedding System that is based on a log-bilinear model with a weighted least-squares objective and trained on aggregated global word-word co-occurrence statistics from a corpus.
- Context:
- Source code available at https://github.com/stanfordnlp/GloVe
- It was first introduced by Pennington et al. (2014).
- It can implement a GloVe Algorithm to produce word embeddings used in downstream NLP tasks such as text classification.
- Example(s):
- GloVe,
- …
- Counter-Example(s):
- See: Distributional Co-Occurrence Word Vector, OOV Embedding System, Sentiment Analysis, Natural Language Processing, Language Model, Sequence Tagging, Vector (Mathematics), Real Numbers, Embedding, Vector Space, Neural Net Language Model, Dimensionality Reduction, co-Occurrence Matrix, Syntactic Parsing.
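In practice, a GloVe System is often used by loading a pretrained vector file in the project's plain-text format: one token per line, followed by that token's vector components separated by spaces. Below is a minimal parsing sketch using a synthetic in-memory snippet (the tokens and vector values are made up for illustration, not taken from a real `glove.*.txt` download):

```python
import io
import math

# Hypothetical snippet in the pretrained-GloVe text format:
# one token per line, followed by its vector components.
FAKE_GLOVE = io.StringIO(
    "king 0.5 0.7 0.1\n"
    "queen 0.5 0.6 0.2\n"
    "apple 0.9 0.1 0.8\n"
)

def load_glove(handle):
    """Parse a GloVe-format text stream into {word: [float, ...]}."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vecs = load_glove(FAKE_GLOVE)
# With these made-up vectors, "king" is closer to "queen" than to "apple".
print(cosine(vecs["king"], vecs["queen"]) > cosine(vecs["king"], vecs["apple"]))  # True
```

The same `load_glove` function applies unchanged to the real pretrained files distributed from the Stanford project page.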
References
2021
- (NLP Stanford, 2021) ⇒ https://nlp.stanford.edu/projects/glove/ Retrieved:2021-05-08.
- QUOTE: GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
(...)
GloVe is essentially a log-bilinear model with a weighted least-squares objective. The main intuition underlying the model is the simple observation that ratios of word-word co-occurrence probabilities have the potential for encoding some form of meaning.
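The weighted least-squares objective mentioned above is, in the notation of Pennington et al. (2014):

$$J = \sum_{i,j=1}^{V} f\left(X_{ij}\right)\left(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2$$

where $V$ is the vocabulary size, $w_i$ and $\tilde{w}_j$ are word and context word vectors, $b_i$ and $\tilde{b}_j$ are scalar biases, and $f$ is a weighting function that down-weights rare co-occurrences and caps frequent ones: $f(x) = (x/x_{\max})^{\alpha}$ for $x < x_{\max}$, and $f(x) = 1$ otherwise (the paper's experiments use $x_{\max} = 100$ and $\alpha = 3/4$).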
2014
- (Pennington et al., 2014) ⇒ Jeffrey Pennington, Richard Socher, and Christopher D. Manning. (2014). “GloVe: Global Vectors for Word Representation.” In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).
- QUOTE: The statistics of word occurrences in a corpus is the primary source of information available to all unsupervised methods for learning word representations, and although many such methods now exist, the question still remains as to how meaning is generated from these statistics, and how the resulting word vectors might represent that meaning. In this section, we shed some light on this question. We use our insights to construct a new model for word representation which we call GloVe, for Global Vectors, because the global corpus statistics are captured directly by the model.
First we establish some notation. Let the matrix of word-word co-occurrence counts be denoted by $X$, whose entries $X_{ij}$ tabulate the number of times word $j$ occurs in the context of word $i$. Let $X_i = \sum_k X_{ik}$ be the number of times any word appears in the context of word $i$. Finally, let $P_{ij} = P\left(j|i\right) = X_{ij}/X_i$ be the probability that word $j$ appears in the context of word $i$.
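These definitions can be illustrated with a toy sketch that counts co-occurrences over a symmetric context window and then forms $P(j|i) = X_{ij}/X_i$. This is only an illustration of the notation on a hypothetical corpus, not the paper's scalable counting implementation (which also applies distance-based weighting within the window):

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """Build X, where X[i][j] counts how often word j appears
    within `window` positions of word i (symmetric window)."""
    X = defaultdict(lambda: defaultdict(float))
    for pos, word_i in enumerate(tokens):
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for ctx in range(lo, hi):
            if ctx != pos:
                X[word_i][tokens[ctx]] += 1.0
    return X

def context_probability(X, i, j):
    """P(j|i) = X_ij / X_i, where X_i = sum_k X_ik."""
    X_i = sum(X[i].values())
    return X[i][j] / X_i

corpus = "the cat sat on the mat".split()
X = cooccurrence_counts(corpus, window=1)
# "cat" has two window-1 neighbors ("the", "sat"), each seen once.
print(context_probability(X, "cat", "sat"))  # 0.5
```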