MIMICK System
A MIMICK System is a OOV Embedding System that is based on recurrent neural network (RNN) trained on the character-level datasets to predicts vectors for OOV words.
- Context:
- It was introduced by Pinter et al. (2017).
- It implements a MIMICK Algorithm to solve Text Classification and Parts-Of-Speech Tagging Tasks.
- It is based on MIMICK-RNN Model.
- Example(s):
- Counter-Example(s):
- See: OOV Word, Subword Unit, Distributional Co-Occurrence Word Vector, Term Vector Space, Sentiment Analysis, Natural Language Processing, Language Model, Sequence Tagging, Vector (Mathematics), Real Numbers, Embedding, Vector Space, Neural Net Language Model, Dimensionality Reduction, co-Occurrence Matrix, Syntactic Parsing.
References
2017
- (Pinter et al., 2017) ⇒ Yuval Pinter, Robert Guthrie, and Jacob Eisenstein. (2017). “Mimicking Word Embeddings Using Subword RNNs.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
- QUOTE:
We approach this challenge from a quasi-generative perspective. Knowing nothing of a word except for its embedding and its written form, we attempt to learn the former from the latter. We train a recurrent neural network (RNN) on the character level with the embedding as the target, and use it later to predict vectors for OOV words in any downstream task. We call this model the MIMICK-RNN, for its ability to read a word's spelling and mimick its distributional embedding.
(..)
We approach the problem of out-of-vocabulary (OOV) embeddings as a generation problem: regardless of how the original embeddings were created, we assume there is a generative wordform-based protocol for creating these embeddings. By training a model over the existing vocabulary, we can later use that model for predicting the embedding of an unseen word.
- QUOTE: