BiLSTM-CNN-CRF
A BiLSTM-CNN-CRF is a Neural Network that combines a BiLSTM encoder and a character-level CNN with a CRF output layer for sequence labeling.
- Context:
- It can be trained by a Bidirectional LSTM-CNN-CRF Training System (that implements a Bidirectional LSTM-CNN-CRF Training Algorithm).
- Example(s):
- Counter-Example(s):
- See: Bidirectional LSTM, CRF Training Task, Bidirectional RNN, Word Embedding.
References
2018a
- (Peters et al., 2018) ⇒ Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer (2018). "Deep contextualized word representations". arXiv preprint arXiv:1802.05365.
- QUOTE: ELMo representations are deep, in the sense that they are a function of all of the internal layers of the biLM. More specifically, we learn a linear combination of the vectors stacked above each input word for each end task, which markedly improves performance over just using the top LSTM layer. Combining the internal states in this manner allows for very rich word representations. Using intrinsic evaluations, we show that the higher-level LSTM states capture context-dependent aspects of word meaning (e.g., they can be used without modification to perform well on supervised word sense disambiguation tasks) while lower level states model aspects of syntax (e.g., they can be used to do part-of-speech tagging).
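The layer combination described in this quote can be made concrete with a short sketch. The following is a minimal PyTorch illustration (not the authors' implementation) of a learned, softmax-normalized mix of biLM layer outputs with a task-specific scale; the class name ScalarMix, the layer count, and the dimensions are assumptions for the example, and a real setup would take the layer outputs from a pretrained biLM such as ELMo.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Learned linear combination of biLM layer outputs, in the spirit of
    ELMo (Peters et al., 2018): softmax-normalized per-layer weights and a
    task-specific scale, both learned for each end task (illustrative sketch)."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))  # one weight per biLM layer
        self.gamma = nn.Parameter(torch.ones(1))               # task-specific scaling factor

    def forward(self, layer_outputs):
        # layer_outputs: list of tensors, one per biLM layer (token layer + LSTM layers),
        # each of shape (batch, seq_len, dim).
        norm_weights = torch.softmax(self.weights, dim=0)
        mixed = sum(w * h for w, h in zip(norm_weights, layer_outputs))
        return self.gamma * mixed

# Toy usage with random stand-ins for the biLM layer outputs:
layers = [torch.randn(2, 7, 1024) for _ in range(3)]   # 3 layers, representation dim 1024
mixed = ScalarMix(num_layers=3)(layers)                 # shape (2, 7, 1024)
```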
2018b
- (GitHub, 2018) ⇒ "BiLSTM-CNN-CRF with ELMo-Representations for Sequence Tagging". Retrieved: 2018-08-05.
- QUOTE: For an IPython Notebook with a simple example of how to use ELMo representations for sentence classification, see: Keras_ELMo_Tutorial.ipynb.
This code is an extension of the emnlp2017-bilstm-cnn-crf implementation. Most examples can be used with only slight adaptation. Please also see that repository for an explanation of how the datasets are defined, how the hyperparameters are configured, how to use it for multi-task learning, and how to create custom features. Most aspects from emnlp2017-bilstm-cnn-crf work the same in this implementation.
2016
- (Ma & Hovy, 2016) ⇒ Xuezhe Ma, and Eduard Hovy (2016). "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF". arXiv preprint arXiv:1603.01354.
- QUOTE: Finally, we construct our neural network model by feeding the output vectors of BLSTM into a CRF layer. Figure 3 illustrates the architecture of our network in detail. For each word, the character-level representation is computed by the CNN in Figure 1 with character embeddings as inputs. Then the character-level representation vector is concatenated with the word embedding vector to feed into the BLSTM network. Finally, the output vectors of BLSTM are fed to the CRF layer to jointly decode the best label sequence. As shown in Figure 3, dropout layers are applied on both the input and output vectors of BLSTM.
Figure 1: The convolution neural network for extracting character-level representations of words. Dashed arrows indicate a dropout layer applied before character embeddings are input to CNN.
Figure 3: The main architecture of our neural network. The character representation for each word is computed by the CNN in Figure 1. Then the character representation vector is concatenated with the word embedding before feeding into the BLSTM network. Dashed arrows indicate dropout layers applied on both the input and output vectors of BLSTM.
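The pipeline described in this quote and the two figure captions (character-level CNN, concatenation with word embeddings, BiLSTM, CRF output layer, dropout on the BLSTM input and output) can be sketched as follows. This is a minimal, illustrative PyTorch rendering, not the authors' code: the sizes are example values, and the CRF layer is represented here only by per-tag emission scores plus a learned tag-transition matrix, with the CRF training loss (forward algorithm) and Viterbi decoding omitted for brevity.

```python
import torch
import torch.nn as nn

class BiLSTMCNNCRF(nn.Module):
    """Sketch of the BiLSTM-CNN-CRF encoder: char CNN -> concat with word
    embedding -> BiLSTM -> per-tag emission scores. The CRF loss/decoding
    that would consume these scores is omitted."""
    def __init__(self, word_vocab, char_vocab, num_tags,
                 word_dim=100, char_dim=30, char_filters=30, lstm_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # Character-level CNN (Figure 1): convolution + max-pool over each word's characters.
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.dropout = nn.Dropout(0.5)  # applied to the BLSTM input and output vectors
        self.bilstm = nn.LSTM(word_dim + char_filters, lstm_dim,
                              batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * lstm_dim, num_tags)
        # Tag-transition scores that a full CRF layer would use for joint decoding.
        self.transitions = nn.Parameter(torch.zeros(num_tags, num_tags))

    def forward(self, words, chars):
        # words: (batch, seq_len); chars: (batch, seq_len, max_word_len)
        b, s, c = chars.shape
        ch = self.char_emb(chars).view(b * s, c, -1).transpose(1, 2)   # (b*s, char_dim, c)
        ch = torch.relu(self.char_cnn(ch)).max(dim=2).values           # max-pool over characters
        char_repr = ch.view(b, s, -1)                                  # character-level word vectors
        x = torch.cat([self.word_emb(words), char_repr], dim=-1)       # word + char representation
        out, _ = self.bilstm(self.dropout(x))
        return self.emissions(self.dropout(out))                       # CRF emission scores

# Toy usage with hypothetical vocabulary sizes and tag set:
model = BiLSTMCNNCRF(word_vocab=5000, char_vocab=80, num_tags=9)
scores = model(torch.randint(0, 5000, (2, 12)), torch.randint(0, 80, (2, 12, 15)))
print(scores.shape)  # torch.Size([2, 12, 9])
```

In the full model, the emission scores and the transition matrix together define the CRF score of a label sequence; training maximizes the conditional log-likelihood of the gold sequence, and decoding uses Viterbi search to jointly find the best label sequence, as described in the quote above.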