Out-Of-Vocabulary (OOV) Word Embedding System
An Out-Of-Vocabulary (OOV) Word Embedding System is a Word Embedding System that maps OOV Words (words absent from the embedding model's training vocabulary) to vector representations.
- AKA: OOV Vectorized Representation System, Out-Of-Vocabulary (OOV) Token Embedding System.
- Context:
- It implements an Out-Of-Vocabulary (OOV) Embedding Algorithm to solve an Out-Of-Vocabulary (OOV) Embedding Task.
- It can range from being an Unknown Word Embedding System to being a Rare Word Embedding System (a minimal baseline sketch follows the See links below).
- Example(s): a MIMICK-RNN System (Pinter et al., 2017), which predicts an OOV word's embedding from its character spelling (see the sketch under the 2017 reference below).
- Counter-Example(s): a fixed-vocabulary Word Embedding Lookup System (such as a plain word2vec or GloVe lookup table), which has no representation for unseen words.
- See: In-Vocabulary Word, Unseen Word, Rare Word, NLP System, Text Classification System, Subword Unit.
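The simplest OOV handling, noted in the quote below, assigns one shared UNK vector to every word outside the vocabulary. A minimal sketch of that baseline, assuming a toy NumPy vocabulary (the dimensions and words here are illustrative, not from any real embedding set):

```python
import numpy as np

# Hypothetical toy vocabulary: word -> pretrained embedding (dimension 4 here).
EMB_DIM = 4
rng = np.random.default_rng(0)
vocab = {w: rng.standard_normal(EMB_DIM) for w in ["the", "cat", "sat"]}

# Baseline OOV handling: every out-of-vocabulary word shares one UNK vector.
unk_vector = np.zeros(EMB_DIM)

def embed(word: str) -> np.ndarray:
    """Return the word's vector, falling back to the shared UNK vector if OOV."""
    return vocab.get(word, unk_vector)

print(embed("cat"))     # in-vocabulary: its own trained vector
print(embed("catnip"))  # OOV: collapses to the single shared UNK vector
```

This baseline discards all information carried by the OOV word's written form; subword-aware systems such as the MIMICK-RNN recover some of it, as sketched in the References section.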
References
2017
- (Pinter et al., 2017) ⇒ Yuval Pinter, Robert Guthrie, and Jacob Eisenstein. (2017). “Mimicking Word Embeddings Using Subword RNNs.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
- QUOTE: One of the key advantages of word embeddings for natural language processing is that they enable generalization to words that are unseen in labeled training data, by embedding lexical features from large unlabeled datasets into a relatively low-dimensional Euclidean space. These low-dimensional embeddings are typically trained to capture distributional similarity, so that information can be shared among words that tend to appear in similar contexts.
However, it is not possible to enumerate the entire vocabulary of any language, and even large unlabeled datasets will miss terms that appear in later applications. The issue of how to handle these out-of-vocabulary (OOV) words poses challenges for embedding-based methods. These challenges are particularly acute when working with low-resource languages, where even unlabeled data may be difficult to obtain at scale. A typical solution is to abandon hope, by assigning a single OOV embedding to all terms that do not appear in the unlabeled data.
We approach this challenge from a quasi-generative perspective. Knowing nothing of a word except for its embedding and its written form, we attempt to learn the former from the latter. We train a recurrent neural network (RNN) on the character level with the embedding as the target, and use it later to predict vectors for OOV words in any downstream task. We call this model the MIMICK-RNN, for its ability to read a word's spelling and mimick its distributional embedding.
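The MIMICK approach described in this quote can be sketched as a character-level RNN regressed onto pretrained embeddings. The code below is an illustrative reconstruction rather than the authors' implementation: the paper uses a bidirectional character LSTM trained with a squared Euclidean distance loss, while the unidirectional LSTM, toy sizes, training data, and helper names (`MimickRNN`, `encode`) here are assumptions for brevity.

```python
import torch
import torch.nn as nn

CHARS = "abcdefghijklmnopqrstuvwxyz"
char2idx = {c: i + 1 for i, c in enumerate(CHARS)}  # 0 reserved for padding
EMB_DIM = 50  # dimension of the target (pretrained) word embeddings

class MimickRNN(nn.Module):
    """Character-level LSTM that regresses a word's spelling onto its embedding."""
    def __init__(self, n_chars=len(CHARS) + 1, char_dim=20, hidden=64, out_dim=EMB_DIM):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.lstm = nn.LSTM(char_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, out_dim)

    def forward(self, char_ids):           # char_ids: (batch, max_word_len)
        x = self.char_emb(char_ids)        # (batch, len, char_dim)
        _, (h, _) = self.lstm(x)           # h: (1, batch, hidden)
        return self.proj(h.squeeze(0))     # predicted word vector

def encode(word, max_len=12):
    """Map a word's spelling to a fixed-length tensor of character ids."""
    ids = [char2idx.get(c, 0) for c in word[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids)))

model = MimickRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # stand-in for the paper's squared-distance objective

# Toy training pair: (word spelling, its pretrained embedding as the target).
word, target = "cat", torch.randn(EMB_DIM)
for _ in range(100):
    opt.zero_grad()
    pred = model(encode(word).unsqueeze(0))
    loss = loss_fn(pred, target.unsqueeze(0))
    loss.backward()
    opt.step()

# After training on the full vocabulary, OOV words get vectors from spelling alone.
oov_vector = model(encode("catnip").unsqueeze(0))
```

In a real setting the training loop would iterate over every in-vocabulary (spelling, embedding) pair; at inference time the trained model produces a vector for any OOV word directly from its characters, which is the generalization the quote describes.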