2020 VectorSemanticsandEmbeddings
- (Jurafsky & Martin, 2020) ⇒ Daniel Jurafsky, and James H. Martin. (2020). “Vector Semantics and Embeddings.” In: Speech and Language Processing (3rd ed. draft).
Subject Headings: Vector Space Model, Word Embedding, Term-Document Matrix, TF-IDF Algorithm, PPMI Algorithm, Pointwise Mutual Information (PMI), Word2Vec, Skip-Gram Embedding, Static Embedding.
Notes
Cited By
Quotes
Abstract
(No abstract.)
6.13 Summary
- In vector semantics, a word is modeled as a vector: a point in high-dimensional space, also called an embedding. In this chapter we focus on static embeddings, in which each word is mapped to a fixed embedding.
- Vector semantic models fall into two classes: sparse and dense. In sparse models each dimension corresponds to a word in the vocabulary $V$ and cells are functions of co-occurrence counts.
- The term-document matrix has a row for each word (term) in the vocabulary and a column for each document.
- The word-context or term-term matrix has a row for each (target) word in the vocabulary and a column for each context term in the vocabulary.
- Two sparse weightings are common: the tf-idf weighting, which weights each cell by its term frequency and inverse document frequency, and PPMI (positive pointwise mutual information), most common for word-context matrices (a sketch of both weightings appears after this summary).
- Dense vector models have dimensionality 50–1000. Word2vec algorithms like skip-gram are a popular way to compute dense embeddings. Skip-gram trains a logistic regression classifier to compute the probability that two words are ‘likely to occur nearby in text’. This probability is computed from the dot product between the embeddings for the two words.
- Skip-gram uses stochastic gradient descent to train the classifier, learning embeddings that have a high dot product with the embeddings of words that occur nearby and a low dot product with noise words (a minimal sketch of this update follows the summary).
- Other important embedding algorithms include GloVe, a method based on ratios of word co-occurrence probabilities.
- Whether using sparse or dense vectors, word and document similarities are computed by some function of the dot product between vectors. The cosine of two vectors — a normalized dot product — is the most popular such metric.
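The two sparse weightings summarized above can be made concrete with a small NumPy sketch. The toy count matrix below is invented purely for illustration; the tf-idf part uses the log-scaled term frequency and log inverse document frequency described in the chapter, and the PPMI part clips negative PMI values to zero.

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
# The counts are invented for illustration only.
counts = np.array([
    [4.0, 0.0, 1.0],   # "battle"
    [0.0, 7.0, 0.0],   # "good"
    [3.0, 0.0, 2.0],   # "fool"
    [1.0, 2.0, 5.0],   # "wit"
])

# --- tf-idf ---
# tf: 1 + log10(count) for nonzero counts; idf: log10(N / df).
tf = np.zeros_like(counts)
nonzero = counts > 0
tf[nonzero] = 1.0 + np.log10(counts[nonzero])
n_docs = counts.shape[1]
df = (counts > 0).sum(axis=1)          # number of documents each term appears in
idf = np.log10(n_docs / df)
tfidf = tf * idf[:, None]

# --- PPMI (normally applied to a word-context / term-term matrix) ---
total = counts.sum()
p_wc = counts / total                  # joint probability estimates P(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)  # marginals P(w)
p_c = p_wc.sum(axis=0, keepdims=True)  # marginals P(c)
with np.errstate(divide="ignore"):     # log2(0) -> -inf, removed by the max below
    pmi = np.log2(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0.0)

print(np.round(tfidf, 3))
print(np.round(ppmi, 3))
```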
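The skip-gram training step can also be sketched directly. This is a minimal illustration under assumed settings, not the reference word2vec implementation: the vocabulary size, embedding dimension, learning rate, and word indices are arbitrary placeholders, and the updates follow the negative-sampling objective (raise the dot product with the observed context word, lower it with sampled noise words).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Arbitrary toy sizes: vocabulary, embedding dimension, learning rate,
# and k noise words per positive (target, context) pair.
vocab_size, dim, lr, k = 1000, 50, 0.025, 5
W = rng.normal(scale=0.1, size=(vocab_size, dim))  # target-word embeddings
C = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word embeddings

def sgns_step(target, context, noise_words):
    """One stochastic-gradient step of skip-gram with negative sampling."""
    w = W[target]
    # P(+ | target, context) = sigmoid(w . c): a high dot product means the
    # classifier thinks the two words are likely to occur nearby in text.
    p_pos = sigmoid(w @ C[context])
    grad_w = (p_pos - 1.0) * C[context]          # pull the true context word closer
    C[context] -= lr * (p_pos - 1.0) * w
    for n in noise_words:                        # push sampled noise words away
        p_neg = sigmoid(w @ C[n])
        grad_w += p_neg * C[n]
        C[n] -= lr * p_neg * w
    W[target] -= lr * grad_w

# Example update with made-up word indices.
sgns_step(target=3, context=17, noise_words=rng.integers(0, vocab_size, size=k))
```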
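Finally, the cosine metric mentioned in the last point is just a length-normalized dot product; a minimal version, applied to made-up vectors, looks like this:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: the dot product of u and v divided by their lengths."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Works the same way for sparse count vectors and dense embeddings.
a = np.array([1.0, 3.0, 0.0])
b = np.array([2.0, 1.0, 1.0])
print(cosine(a, b))   # higher values mean more similar directions
```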
References
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---
Daniel Jurafsky, and James H. Martin | | 2020 | Vector Semantics and Embeddings | | | | | | 2020