RNN/RNN-based Encoder-Decoder Neural Network
An RNN/RNN-based Encoder-Decoder Neural Network is an encoder-decoder neural network that consists of an encoder neural network and a decoder neural network, both of which are recurrent neural networks (a minimal code sketch is given after the See: list below).
- Context:
- It can be trained by an RNN-based Encoder-Decoder Model Training System (that implements an RNN-based encoder/decoder model training algorithm).
- It can be instantiated as a Trained RNN-based Encoder/Decoder Network.
- It can be used in Sequence-to-Sequence Learning Tasks, Neural Machine Translation Tasks, and Neural Conversational Modelling Tasks.
- …
- Example(s):
- Counter-Example(s):
- See: Neural seq2seq, Neural Encoder/Decoder Model Training System, Mixed-Style Encoder Decoder Network, Sequence-to-Sequence Learning Task, Artificial Neural Network, Bidirectional Neural Network, Convolutional Neural Network, Neural Machine Translation Task, Deep Learning, Natural Language Processing.
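The following is a minimal, illustrative sketch of such a network in PyTorch: an encoder RNN that compresses the input token sequence into a fixed-size hidden state, and a decoder RNN that generates output tokens from that state. The class names, sizes, and the choice of GRU units are assumptions of this sketch, not part of the definition above.
```python
# Minimal sketch of an RNN/RNN-based encoder-decoder network (illustrative only).
import torch
import torch.nn as nn


class EncoderRNN(nn.Module):
    """Encoder RNN: reads the source sequence and returns its final hidden state."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, src_tokens):             # src_tokens: (batch, src_len)
        embedded = self.embedding(src_tokens)   # (batch, src_len, hidden)
        outputs, hidden = self.gru(embedded)    # hidden: (1, batch, hidden)
        return outputs, hidden


class DecoderRNN(nn.Module):
    """Decoder RNN: predicts the next target token, seeded with the encoder's final state."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, prev_token, hidden):      # prev_token: (batch, 1)
        embedded = self.embedding(prev_token)    # (batch, 1, hidden)
        output, hidden = self.gru(embedded, hidden)
        logits = self.out(output)                # (batch, 1, vocab)
        return logits, hidden


# Tiny usage example with random token ids.
encoder = EncoderRNN(vocab_size=1000, hidden_size=64)
decoder = DecoderRNN(vocab_size=1000, hidden_size=64)
src = torch.randint(0, 1000, (2, 7))             # batch of 2 source sequences
_, context = encoder(src)                        # fixed-size summary of the input
step_logits, _ = decoder(torch.zeros(2, 1, dtype=torch.long), context)
```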
References
2018a
- (Brownlee, 2018) ⇒ Jason Brownlee. (2018). “Encoder-Decoder Recurrent Neural Network Models for Neural Machine Translation.”
- QUOTE: After reading this post, you will know:
- The encoder-decoder recurrent neural network architecture is the core technology inside Google’s translate service.
- The so-called “Sutskever model” for direct end-to-end machine translation.
- The so-called “Cho model” that extends the architecture with GRU units and an attention mechanism.
2018b
- (Saxena, 2018) ⇒ Rohan Saxena. (April 2018). “What is an Encoder/Decoder in Deep Learning?”.
- QUOTE: In an RNN, an encoder-decoder network typically looks like this (an RNN encoder and an RNN decoder):
This is a network to predict responses for incoming emails. The left half of the network encodes the email into a feature vector, and the right half of the network decodes the feature vector to produce word predictions.
2017a
- (Robertson, 2017) ⇒ Sean Robertson. (2017). “Translation with a Sequence to Sequence Network and Attention.” In: TensorFlow Tutorials.
- QUOTE: A basic sequence-to-sequence model, as introduced in Cho et al., 2014, consists of two recurrent neural networks (RNNs): an encoder that processes the input and a decoder that generates the output. This basic architecture is depicted below.
Each box in the picture above represents a cell of the RNN, most commonly a GRU cell or an LSTM cell (see the RNN Tutorial for an explanation of those). Encoder and decoder can share weights or, as is more common, use a different set of parameters. Multi-layer cells have been successfully used in sequence-to-sequence models too, e.g. for translation Sutskever et al., 2014.
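The hedged sketch below illustrates the flow described in the quote above: an encoder RNN cell processes the input one step at a time, and a separate decoder RNN cell then generates output tokens greedily from the encoder's final state. All names, sizes, and the use of GRU cells are assumptions made for this illustration.
```python
# Illustrative-only sketch of the basic encoder/decoder flow with GRU cells.
import torch
import torch.nn as nn

hidden_size, vocab_size, max_len = 32, 500, 10
embed = nn.Embedding(vocab_size, hidden_size)
encoder_cell = nn.GRUCell(hidden_size, hidden_size)   # one "box" per time step
decoder_cell = nn.GRUCell(hidden_size, hidden_size)   # separate parameters from the encoder
project = nn.Linear(hidden_size, vocab_size)

src = torch.randint(0, vocab_size, (1, 6))            # a single input sequence

# Encoder: fold the input sequence into a single hidden state.
h = torch.zeros(1, hidden_size)
for t in range(src.size(1)):
    h = encoder_cell(embed(src[:, t]), h)

# Decoder: greedily emit up to max_len output tokens, starting from a <sos> id of 0.
token = torch.zeros(1, dtype=torch.long)
generated = []
for _ in range(max_len):
    h = decoder_cell(embed(token), h)
    token = project(h).argmax(dim=-1)
    generated.append(token.item())
print(generated)
```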
2017b
- (Gupta et al., 2017) ⇒ Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. (2017). “DeepFix: Fixing Common C Language Errors by Deep Learning.” In: Proceedings of AAAI-2017.
- QUOTE: Bahdanau, Cho, and Bengio (2014) introduced an attention mechanism on top of sequence-to-sequence model of (Sutskever, Vinyals, and Le 2014). Their network consists of an encoder RNN to process the input sequence and a decoder RNN with attention to generate the output sequence. Our network is based on the multi-layered variant in (Vinyals et al. 2015). We briefly describe it below.
Both the encoder and decoder RNNs consist of N stacked gated recurrent units (GRUs) (Cho et al. 2014). The encoder maps each token in the input sequence to a real vector called the annotation.
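The sketch below is a hedged illustration of the two ingredients the excerpt mentions: a stack of N GRUs whose per-token outputs act as annotations, and a Bahdanau-style additive attention step that combines those annotations into a context vector. Layer counts, sizes, and parameter names are assumptions of this sketch, not taken from the cited paper.
```python
# Hedged sketch: stacked-GRU encoder annotations plus additive attention.
import torch
import torch.nn as nn

N, hidden_size, vocab_size = 2, 64, 1000
embed = nn.Embedding(vocab_size, hidden_size)
encoder = nn.GRU(hidden_size, hidden_size, num_layers=N, batch_first=True)

# Additive (Bahdanau-style) attention parameters.
W_a = nn.Linear(hidden_size, hidden_size, bias=False)   # transforms the decoder state
U_a = nn.Linear(hidden_size, hidden_size, bias=False)   # transforms each annotation
v_a = nn.Linear(hidden_size, 1, bias=False)             # scores each source position

src = torch.randint(0, vocab_size, (1, 8))
annotations, _ = encoder(embed(src))                     # (1, src_len, hidden): one annotation per token

decoder_state = torch.zeros(1, hidden_size)              # stand-in for the decoder's current hidden state
scores = v_a(torch.tanh(W_a(decoder_state).unsqueeze(1) + U_a(annotations)))  # (1, src_len, 1)
weights = torch.softmax(scores, dim=1)                   # attention weights over source positions
context = (weights * annotations).sum(dim=1)             # (1, hidden): weighted sum of annotations
```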
2017c
- (Ramachandran et al., 2017) ⇒ Prajit Ramachandran, Peter J. Liu, and Quoc V. Le. (2017). “Unsupervised Pretraining for Sequence to Sequence Learning.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). arXiv:1611.02683
- QUOTE: Therefore, the basic procedure of our approach is to pretrain both the seq2seq encoder and decoder networks with language models, which can be trained on large amounts of unlabeled text data. This can be seen in Figure 1, where the parameters in the shaded boxes are pretrained. In the following we will describe the method in detail using machine translation as an example application.
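As a hedged illustration of the pretraining idea described above, the sketch below initializes the recurrent parameters of a seq2seq encoder and decoder from two separately pretrained language models. The language-model training itself is elided, and the use of GRU modules and state_dict copying here is an assumption of this sketch, not the paper's own setup.
```python
# Hedged sketch: seed a seq2seq encoder/decoder with pretrained language-model weights.
import torch.nn as nn

hidden_size = 128

# Language models assumed to be pretrained (elsewhere) on unlabeled source/target text.
src_lm_gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
tgt_lm_gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

# Seq2seq encoder and decoder RNNs with matching shapes.
encoder_gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
decoder_gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

# Initialize the seq2seq recurrent parameters from the pretrained LMs,
# then fine-tune the whole network on parallel data (not shown).
encoder_gru.load_state_dict(src_lm_gru.state_dict())
decoder_gru.load_state_dict(tgt_lm_gru.state_dict())
```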
2015
- (Bahdanau et al., 2015) ⇒ Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. (2015). “Neural Machine Translation by Jointly Learning to Align and Translate.” In: Proceedings of the Third International Conference on Learning Representations, (ICLR-2015).
2014a
- (Sutskever et al., 2014) ⇒ Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. (2014). “Sequence to Sequence Learning with Neural Networks.” In: Advances in Neural Information Processing Systems.
2014b
- (Cho et al., 2014) ⇒ Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. (2014). “Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, (EMNLP-2014). arXiv:1406.1078