2020 VectorSemanticsandEmbeddings
- (Jurafsky & Martin, 2020) ⇒ Daniel Jurafsky, and James H. Martin. (2020). “Vector Semantics and Embeddings.” In: Speech and Language Processing (3rd ed. draft).
Subject Headings: Vector Space Model, Word Embedding, Term-Document Matrix, TF-IDF Algorithm, PPMI Algorithm, Pointwise Mutual Information (PMI), Word2Vec, Skip-Gram Embedding, Static Embedding.
Notes
Cited By
Quotes
Abstract
(No abstract.)
6.13 Summary
- In vector semantics, a word is modeled as a vector: a point in high-dimensional space, also called an embedding. In this chapter we focus on static embeddings, in which each word is mapped to a fixed embedding.
- Vector semantic models fall into two classes: sparse and dense. In sparse models each dimension corresponds to a word in the vocabulary $V$ and cells are functions of co-occurrence counts.
- The term-document matrix has a row for each word (term) in the vocabulary and a column for each document.
- The word-context or term-term matrix has a row for each (target) word in the vocabulary and a column for each context term in the vocabulary.
- Two sparse weightings are common: the tf-idf weighting, which weights each cell by its term frequency and inverse document frequency, and PPMI (positive pointwise mutual information), most common for word-context matrices (a sketch of both weightings appears after this summary).
- Dense vector models have dimensionality 50–1000. Word2vec algorithms like skip-gram are a popular way to compute dense embeddings. Skip-gram trains a logistic regression classifier to compute the probability that two words are ‘likely to occur nearby in text’. This probability is computed from the dot product between the embeddings for the two words.
- Skip-gram uses stochastic gradient descent to train the classifier, learning embeddings that have a high dot product with the embeddings of words that occur nearby and a low dot product with noise words (a minimal sketch of this update follows the summary).
- Other important embedding algorithms include GloVe, a method based on ratios of word co-occurrence probabilities.
- Whether using sparse or dense vectors, word and document similarities are computed by some function of the dot product between vectors. The cosine of two vectors — a normalized dot product — is the most popular such metric.
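The two sparse weightings summarized above can be made concrete with a small NumPy sketch. The toy count matrix below is invented purely for illustration; the tf-idf part uses the log-scaled term frequency and log inverse document frequency described in the chapter, and the PPMI part clips negative PMI values to zero.

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
# The counts are invented for illustration only.
counts = np.array([
    [4.0, 0.0, 1.0],   # "battle"
    [0.0, 7.0, 0.0],   # "good"
    [3.0, 0.0, 2.0],   # "fool"
    [1.0, 2.0, 5.0],   # "wit"
])

# --- tf-idf ---
# tf: 1 + log10(count) for nonzero counts; idf: log10(N / df).
tf = np.zeros_like(counts)
nonzero = counts > 0
tf[nonzero] = 1.0 + np.log10(counts[nonzero])
n_docs = counts.shape[1]
df = (counts > 0).sum(axis=1)          # number of documents each term appears in
idf = np.log10(n_docs / df)
tfidf = tf * idf[:, None]

# --- PPMI (normally applied to a word-context / term-term matrix) ---
total = counts.sum()
p_wc = counts / total                  # joint probability estimates P(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)  # marginals P(w)
p_c = p_wc.sum(axis=0, keepdims=True)  # marginals P(c)
with np.errstate(divide="ignore"):     # log2(0) -> -inf, removed by the max below
    pmi = np.log2(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0.0)

print(np.round(tfidf, 3))
print(np.round(ppmi, 3))
```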
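The skip-gram training step can also be sketched directly. This is a minimal illustration under assumed settings, not the reference word2vec implementation: the vocabulary size, embedding dimension, learning rate, and word indices are arbitrary placeholders, and the updates follow the negative-sampling objective (raise the dot product with the observed context word, lower it with sampled noise words).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Arbitrary toy sizes: vocabulary, embedding dimension, learning rate,
# and k noise words per positive (target, context) pair.
vocab_size, dim, lr, k = 1000, 50, 0.025, 5
W = rng.normal(scale=0.1, size=(vocab_size, dim))  # target-word embeddings
C = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word embeddings

def sgns_step(target, context, noise_words):
    """One stochastic-gradient step of skip-gram with negative sampling."""
    w = W[target]
    # P(+ | target, context) = sigmoid(w . c): a high dot product means the
    # classifier thinks the two words are likely to occur nearby in text.
    p_pos = sigmoid(w @ C[context])
    grad_w = (p_pos - 1.0) * C[context]          # pull the true context word closer
    C[context] -= lr * (p_pos - 1.0) * w
    for n in noise_words:                        # push sampled noise words away
        p_neg = sigmoid(w @ C[n])
        grad_w += p_neg * C[n]
        C[n] -= lr * p_neg * w
    W[target] -= lr * grad_w

# Example update with made-up word indices.
sgns_step(target=3, context=17, noise_words=rng.integers(0, vocab_size, size=k))
```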
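Finally, the cosine metric mentioned in the last point is just a length-normalized dot product; a minimal version, applied to made-up vectors, looks like this:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: the dot product of u and v divided by their lengths."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Works the same way for sparse count vectors and dense embeddings.
a = np.array([1.0, 3.0, 0.0])
b = np.array([2.0, 1.0, 1.0])
print(cosine(a, b))   # higher values mean more similar directions
```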
References
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---
Daniel Jurafsky, and James H. Martin | | 2020 | Vector Semantics and Embeddings | | | | | | 2020