Embeddings from Language Models (ELMo) Representation System
An Embeddings from Language Models (ELMo) Representation System is a Deep Contextualized Word Representation System that models syntax, semantics and polysemy in text items.
- Context:
- It can be used by an ELMo-BiLSTM-CNN-CRF Training System (e.g., for sequence tagging tasks).
- It produces word vectors that are learned functions of the internal states of a Deep Bidirectional Language Model pre-trained on a large text corpus.
- …
- Example(s):
- Counter-Example(s):
- Universal Language Model Fine-tuning for Text Classification (ULMFiT),
- BiLSTM-CRF,
- BiLSTM-WSD,
- SBU-LSTM,
- BERT System (Devlin et al., 2018),
- fastText System (Bojanowski et al., 2017),
- GloVe System (Pennington et al., 2014),
- MIMICK System (Pinter et al., 2017),
- Polyglot System (Al-Rfou et al., 2013),
- SENNA Embedding System (Collobert & Weston, 2008),
- Word2Vec System (Mikolov et al., 2013).
- See: Bidirectional LSTM, CRF Training Task, Bidirectional RNN, Word Embedding, 1 Billion Word Benchmark.
References
2018a
- (Peters et al., 2018) ⇒ Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. (2018). "ELMo: Deep Contextualized Word Representations."
- QUOTE: ELMo is a deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). These word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. They can be easily added to existing models and significantly improve the state of the art across a broad range of challenging NLP problems, including question answering, textual entailment and sentiment analysis.
(...)
ELMo representations are:
- Contextual: The representation for each word depends on the entire context in which it is used.
- Deep: The word representations combine all layers of a deep pre-trained neural network.
- Character based: ELMo representations are purely character based, allowing the network to use morphological clues to form robust representations for out-of-vocabulary tokens unseen in training.
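The contextual behavior quoted above can be checked directly: the same surface form receives a different vector in each sentence it appears in. The sketch below is a minimal illustration, assuming the allennlp 0.x library and its default pre-trained ELMo weights (downloaded on first use); the sentences and token indices are chosen only for demonstration.

```python
# A minimal sketch, assuming allennlp 0.x and its default pre-trained ELMo weights.
from scipy.spatial.distance import cosine
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()

# The same surface form "bank" in two different contexts.
sent_a = ["The", "river", "bank", "was", "muddy", "."]
sent_b = ["She", "deposited", "cash", "at", "the", "bank", "."]

# embed_sentence returns an array of shape (3 layers, n_tokens, 1024).
vecs_a = elmo.embed_sentence(sent_a)
vecs_b = elmo.embed_sentence(sent_b)

# Compare the top-layer vectors of "bank" (index 2 in sent_a, index 5 in sent_b).
similarity = 1.0 - cosine(vecs_a[2, 2], vecs_b[2, 5])
print(similarity)  # noticeably below 1.0: the vectors are context-dependent
```

Because the representations are built from characters, the same call also returns vectors for out-of-vocabulary tokens that were never seen during pre-training.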
2018b
- (Peters et al., 2018) ⇒ Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. (2018). “Deep Contextualized Word Representations.” In: Proceedings of NAACL-HLT 2018.
- QUOTE: We use vectors derived from a bidirectional LSTM that is trained with a coupled language model (LM) objective on a large text corpus. For this reason, we call them ELMo (Embeddings from Language Models) representations. Unlike previous approaches for learning contextualized word vectors (Peters et al., 2017; McCann et al., 2017), ELMo representations are deep, in the sense that they are a function of all of the internal layers of the biLM. More specifically, we learn a linear combination of the vectors stacked above each input word for each end task, which markedly improves performance over just using the top LSTM layer. Combining the internal states in this manner allows for very rich word representations. Using intrinsic evaluations, we show that the higher-level LSTM states capture context-dependent aspects of word meaning (e.g., they can be used without modification to perform well on supervised word sense disambiguation tasks) while lower level states model aspects of syntax (e.g., they can be used to do part-of-speech tagging).
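The task-specific combination of layers described in this quote is a softmax-weighted sum over the biLM's internal representations. In the notation of Peters et al. (2018), with $L$ biLSTM layers:

$$\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, \mathbf{h}_{k,j}^{LM}$$

where $\mathbf{h}_{k,j}^{LM}$ is the biLM's layer-$j$ representation of token $k$ (with $j = 0$ the context-insensitive token embedding), $s^{task}$ are softmax-normalized mixing weights, and $\gamma^{task}$ is a task-specific scaling scalar learned with the end task. A toy numpy sketch of this mixing follows; the weights, activations, and dimensions are made up purely for illustration.

```python
# A toy sketch of the task-specific layer mixing; all values are illustrative.
import numpy as np

def scalar_mix(layer_activations, s_logits, gamma):
    """layer_activations: array of shape (L+1, dim) for a single token."""
    s = np.exp(s_logits) / np.exp(s_logits).sum()   # softmax-normalized weights
    return gamma * (s[:, None] * layer_activations).sum(axis=0)

layers = np.random.randn(3, 1024)                          # e.g. 3 ELMo layers, 1024 dims
elmo_vector = scalar_mix(layers, np.zeros(3), gamma=1.0)   # uniform mix over layers
```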
2018c
- (GitHub, 2018) ⇒ "BiLSTM-CNN-CRF with ELMo-Representations for Sequence Tagging." Retrieved: 2018-08-05.
- QUOTE: For an IPython Notebook with a simple example how to use ELMo representations for sentence classification, see: Keras_ELMo_Tutorial.ipynb.
This code is an extension of the emnlp2017-bilstm-cnn-crf implementation. Most examples can be used with only slight adaptation. Please also see that repository for an explanation of how the datasets are defined, how the hyperparameters are configured, how to use it for multi-task learning, and how to create custom features. Most aspects from emnlp2017-bilstm-cnn-crf work the same in this implementation.
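As the repository's title suggests, the general idea is to compute ELMo vectors for each token and feed them as input features to a BiLSTM-CNN-CRF tagger. The sketch below is a minimal, hypothetical illustration of that pre-computation step using allennlp's ElmoEmbedder; it is not the repository's actual interface, and the example sentences and layer-averaging choice are assumptions made only for demonstration.

```python
# A hypothetical illustration (not the repository's interface): pre-computing
# per-token ELMo features, assuming allennlp 0.x, that a downstream
# BiLSTM-CNN-CRF tagger could consume as fixed input features.
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()
sentences = [["John", "lives", "in", "Berlin", "."],
             ["EU", "rejects", "German", "call", "."]]

# embed_batch returns one (3, n_tokens, 1024) array per sentence; averaging
# the three biLM layers is just one simple way to get a single vector per token.
features = [vecs.mean(axis=0) for vecs in elmo.embed_batch(sentences)]
print(features[0].shape)  # (5, 1024): one ELMo feature vector per token
```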