EMNLP 2017 BiLSTM-CNN-CRF Training System
An EMNLP 2017 BiLSTM-CNN-CRF Training System is a Bidirectional LSTM-CNN-CRF Training System developed by Reimers & Gurevych (2017).
- Example(s):
- Counter-Example(s):
- See: Bidirectional Neural Network, Convolutional Neural Network, Conditional Random Field, Bidirectional Recurrent Neural Network, Dynamic Neural Network.
References
2018
- (Reimers & Gurevych, 2018) ⇒ EMNLP 2017 BiLSTM-CNN-CRF repository: https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf Retrieved: 2018-07-08.
- QUOTE: This code can be used to run the systems proposed in the following papers:
- Huang et al., Bidirectional LSTM-CRF Models for Sequence Tagging: you can choose between a softmax and a CRF classifier.
- Ma and Hovy, End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF: character-based word representations using CNNs are achieved by setting the parameter charEmbeddings to CNN.
- Lample et al., Neural Architectures for Named Entity Recognition: character-based word representations using LSTMs are achieved by setting the parameter charEmbeddings to LSTM.
- Søgaard and Goldberg, Deep Multi-task Learning with Low Level Tasks Supervised at Lower Layers: train multiple tasks and supervise them on different levels.
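The configuration choices quoted above can be sketched as a small parameter snippet. This is a hedged illustration rather than the repository's documented training script: only the charEmbeddings values (CNN vs. LSTM) and the softmax-vs-CRF choice come from the README quote; the classifier key and the commented usage lines are assumptions about the repository's API.

```python
# Hedged sketch of a configuration for the emnlp2017-bilstm-cnn-crf code.
# Only `charEmbeddings` (CNN vs. LSTM) and the softmax-vs-CRF choice are
# documented in the README quote above; the `classifier` key and the usage
# shown in comments are assumptions, not verified API.
custom_params = {
    "charEmbeddings": "CNN",  # CNN character representations (Ma and Hovy, 2016);
                              # set to "LSTM" for Lample et al. (2016)-style representations
    "classifier": ["CRF"],    # assumed key for the softmax vs. CRF classifier choice
}

# Assumed usage, loosely following the repository's training scripts:
# model = BiLSTM(custom_params)
# model.setDataset(datasets, data)
# model.fit(epochs=25)
```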
2017
- (Reimers & Gurevych, 2017) ⇒ Nils Reimers, and Iryna Gurevych. (2017). "Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging". arXiv preprint arXiv:1707.09861.
- QUOTE: We use a BiLSTM-network for sequence tagging as described in (Huang et al., 2015; Ma and Hovy, 2016; Lample et al., 2016). To be able to evaluate a large number of different network configurations, we optimized our implementation for efficiency, reducing by a factor of 6 the time required per epoch compared to Ma and Hovy (2016).
2016a
- (Lample et al., 2016) ⇒ Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. (2016). “Neural Architectures for Named Entity Recognition.” In: Proceedings of NAACL-HLT.
- QUOTE: Figure 1: Main architecture of the network. Word embeddings are given to a bidirectional LSTM. [math]\displaystyle{ l_i }[/math] represents the word i and its left context, [math]\displaystyle{ r_i }[/math] represents the word i and its right context. Concatenating these two vectors yields a representation of the word i in its context, [math]\displaystyle{ c_i }[/math].
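The concatenation described in this caption can be illustrated with a short sketch. This is a hedged illustration (not the authors' code) assuming TensorFlow/Keras; the embedding and hidden sizes are arbitrary. It only shows how the forward output l_i and the backward output r_i of a bidirectional LSTM are concatenated into the contextual representation c_i.

```python
# Minimal sketch of Figure 1's concatenation: for each token i, the forward
# LSTM output l_i and the backward LSTM output r_i are concatenated into c_i.
# Sizes are illustrative assumptions; this is not the authors' implementation.
import tensorflow as tf

embedding_dim, lstm_units = 100, 100
word_embeddings = tf.keras.Input(shape=(None, embedding_dim))  # (tokens, embedding_dim)

# merge_mode="concat" yields c_i = [l_i; r_i] for every token i
contextual = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(lstm_units, return_sequences=True),
    merge_mode="concat",
)(word_embeddings)

model = tf.keras.Model(word_embeddings, contextual)
model.summary()  # per-token output dimension: 2 * lstm_units
```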
2016b
- (Ma & Hovy, 2016) ⇒ Xuezhe Ma, and Eduard Hovy. (2016). "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF". arXiv preprint arXiv:1603.01354.
- QUOTE: Finally, we construct our neural network model by feeding the output vectors of BLSTM into a CRF layer. Figure 3 illustrates the architecture of our network in detail. For each word, the character-level representation is computed by the CNN in Figure 1 with character embeddings as inputs. Then the character-level representation vector is concatenated with the word embedding vector to feed into the BLSTM network. Finally, the output vectors of BLSTM are fed to the CRF layer to jointly decode the best label sequence. As shown in Figure 3, dropout layers are applied on both the input and output vectors of BLSTM.
Figure 1: The convolution neural network for extracting character-level representations of words. Dashed arrows indicate a dropout layer applied before character embeddings are input to CNN.
Figure 3: The main architecture of our neural network. The character representation for each word is computed by the CNN in Figure 1. Then the character representation vector is concatenated with the word embedding before feeding into the BLSTM network. Dashed arrows indicate dropout layers applied on both the input and output vectors of BLSTM.
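The pipeline described in this passage (character-level CNN, concatenation with the word embedding, dropout, BLSTM, dropout, CRF decoding) can be sketched as follows. This is a hedged TensorFlow/Keras illustration, not Ma and Hovy's implementation: all vocabulary sizes, layer sizes, kernel widths, and dropout rates are assumptions, and the final CRF layer is only indicated in a comment because it is not part of core Keras.

```python
# Hedged sketch of the BLSTM-CNNs-CRF pipeline described above; all sizes and
# rates are illustrative assumptions, not the paper's hyperparameters.
import tensorflow as tf

max_word_len, n_chars, n_words = 20, 100, 10000
char_dim, word_dim, n_filters, lstm_units, n_labels = 30, 100, 30, 200, 17

# Character-level CNN (Figure 1): embed characters, apply dropout before the
# CNN, convolve, and max-pool over each word's characters.
char_ids = tf.keras.Input(shape=(None, max_word_len), dtype="int32")  # (tokens, chars)
char_emb = tf.keras.layers.Embedding(n_chars, char_dim)(char_ids)
char_emb = tf.keras.layers.Dropout(0.5)(char_emb)
char_conv = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv1D(n_filters, kernel_size=3, padding="same", activation="relu")
)(char_emb)
char_repr = tf.keras.layers.TimeDistributed(tf.keras.layers.GlobalMaxPooling1D())(char_conv)

# Concatenate the character-level representation with the word embedding.
word_ids = tf.keras.Input(shape=(None,), dtype="int32")  # (tokens,)
word_emb = tf.keras.layers.Embedding(n_words, word_dim)(word_ids)
features = tf.keras.layers.Concatenate()([word_emb, char_repr])

# Dropout on both the input and the output vectors of the BLSTM (Figure 3).
features = tf.keras.layers.Dropout(0.5)(features)
blstm_out = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(lstm_units, return_sequences=True)
)(features)
blstm_out = tf.keras.layers.Dropout(0.5)(blstm_out)

# Per-token emission scores; in the paper these feed a CRF layer that jointly
# decodes the best label sequence (a CRF layer requires an add-on library).
emissions = tf.keras.layers.Dense(n_labels)(blstm_out)
model = tf.keras.Model([word_ids, char_ids], emissions)
```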
2015
- (Huang, Xu & Yu, 2015) ⇒ Zhiheng Huang, Wei Xu, and Kai Yu. (2015). "Bidirectional LSTM-CRF Models for Sequence Tagging". arXiv preprint arXiv:1508.01991.
- QUOTE: In sequence tagging task, we have access to both past and future input features for a given time, we can thus utilize a bidirectional LSTM network (Figure 4) as proposed in (Graves et al., 2013). In doing so, we can efficiently make use of past features (via forward states) and future features (via backward states) for a specific time frame. We train bidirectional LSTM networks using backpropagation through time (BPTT) (Boden, 2002). The forward and backward passes over the unfolded network over time are carried out in a similar way to regular network forward and backward passes, except that we need to unfold the hidden states for all time steps. We also need a special treatment at the beginning and the end of the data points. In our implementation, we do forward and backward for whole sentences and we only need to reset the hidden states to 0 at the beginning of each sentence. We have batch implementation which enables multiple sentences to be processed at the same time.
Figure 4: A bidirectional LSTM network.
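Two implementation points in this quote, processing whole sentences with hidden states reset to zero at the start of each sentence and batching several sentences together, can be illustrated with a short sketch. This is a hedged Keras illustration, not Huang et al.'s implementation: a non-stateful LSTM starts every sequence from zero states, and masking lets zero-padded sentences of different lengths share a batch; sizes and the padding convention are assumptions.

```python
# Hedged sketch: whole sentences are processed at once, states start from zero
# for each sentence (stateful=False is the Keras default), and zero-padding
# with masking allows several sentences per batch. Sizes are assumptions.
import tensorflow as tf

vocab_size, embedding_dim, lstm_units = 10000, 100, 100

word_ids = tf.keras.Input(shape=(None,), dtype="int32")  # batch of padded sentences
embedded = tf.keras.layers.Embedding(vocab_size, embedding_dim, mask_zero=True)(word_ids)
outputs = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(lstm_units, return_sequences=True)
)(embedded)
model = tf.keras.Model(word_ids, outputs)
```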