Neural Sequence-to-Sequence (seq2seq)-based Model Training Algorithm
A Neural Sequence-to-Sequence (seq2seq)-based Model Training Algorithm is a sequence-to-sequence model training algorithm that can train a neural seq2seq model (which maps a variable-length input sequence to a variable-length output sequence); a minimal training sketch is given below, just before the References section.
- Context:
- It can be implemented by a seq2seq System (that solves a seq2seq modeling task).
- It can range from being a One-Layer seq2seq Model Training Algorithm to being a Multi-Layer seq2seq Model Training Algorithm.
- It can range from being a RNN-based seq2seq Training Algorithm (such as LSTM-based seq2seq), to being a CNN-based seq2seq Training Algorithm, to being ... .
- It can range from being a Character-Level seq2seq Algorithm, to being a Subword-Level seq2seq Algorithm, to being a Word-Level seq2seq Algorithm.
- …
- Example(s):
- a Neural seq2seq with Attention Learning Algorithm (to train a neural seq2seq-with-attention model).
- a Neural seq2seq Translation Algorithm (implemented by a neural seq2seq translation system).
- a Neural seq2seq Text Error Correction Model Training Algorithm (implemented by a neural seq2seq TEC system).
- …
- Counter-Example(s):
- See: LSTM Algorithm.
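The following is a minimal, hedged sketch of one training step for a one-layer, word-level, RNN-based seq2seq model with teacher forcing, written in PyTorch. The vocabulary sizes, dimensions, and the toy batch are illustrative assumptions only, not the configuration of any referenced system.

```python
# A minimal sketch (not a canonical implementation) of one training step for a
# one-layer, word-level, RNN-based seq2seq model with teacher forcing.
# All sizes and the toy batch below are illustrative assumptions.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128
PAD = 0  # assumed padding index

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB, padding_idx=PAD)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB, padding_idx=PAD)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)    # encoder RNN
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)    # decoder RNN
        self.proj = nn.Linear(HID, TGT_VOCAB)                 # output projection

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.src_emb(src))              # encode the input sequence
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)  # decoder conditioned on encoder state
        return self.proj(dec_out)                               # logits over the target vocabulary

model = Seq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss(ignore_index=PAD)

# Toy batch: variable-length sequences assumed padded to a common length.
src = torch.randint(1, SRC_VOCAB, (4, 7))        # (batch, src_len)
tgt = torch.randint(1, TGT_VOCAB, (4, 9))        # (batch, tgt_len)
tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]        # teacher forcing: shift target by one

optimizer.zero_grad()
logits = model(src, tgt_in)
loss = criterion(logits.reshape(-1, TGT_VOCAB), tgt_out.reshape(-1))
loss.backward()
optimizer.step()
print(float(loss))
```

At inference time the decoder would instead be run autoregressively, feeding back its own predictions step by step; that loop is outside the scope of this sketch.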
References
2017a
- https://github.com/baidu-research/tensorflow-allreduce/blob/master/tensorflow/docs_src/tutorials/seq2seq.md
- QUOTE: ... A basic sequence-to-sequence model, as introduced in Cho et al., 2014, consists of two recurrent neural networks (RNNs): an encoder that processes the input and a decoder that generates the output. This basic architecture is depicted below.
Each box in the picture above represents a cell of the RNN, most commonly a GRU cell or an LSTM cell (see the RNN Tutorial for an explanation of those). Encoder and decoder can share weights or, as is more common, use a different set of parameters. Multi-layer cells have been successfully used in sequence-to-sequence models too, e.g. for translation Sutskever et al., 2014.
In the basic model depicted above, every input has to be encoded into a fixed-size state vector, as that is the only thing passed to the decoder. To allow the decoder more direct access to the input, an attention mechanism was introduced in Bahdanau et al., 2014. We will not go into the details of the attention mechanism (see the paper); suffice it to say that it allows the decoder to peek into the input at every decoding step. A multi-layer sequence-to-sequence network with LSTM cells and attention mechanism in the decoder looks like this.
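As a hedged illustration of the attention idea described in the quote above, the sketch below computes, for a single decoding step, a context vector that lets the decoder "peek" at every encoder position. It uses a simplified dot-product scoring, not the exact additive formulation of Bahdanau et al., 2014, and all tensor shapes are illustrative assumptions.

```python
# A minimal sketch of one attention step: score the decoder state against every
# encoder output, normalize to a distribution, and return a weighted "context"
# summary of the input. Simplified dot-product variant; shapes are illustrative.
import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_outputs):
    # decoder_state: (batch, hidden); encoder_outputs: (batch, src_len, hidden)
    scores = torch.bmm(encoder_outputs, decoder_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    weights = F.softmax(scores, dim=1)                                          # attention over input positions
    context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)       # (batch, hidden)
    return context, weights

# Toy example: batch of 2, source length 5, hidden size 8.
ctx, w = attend(torch.randn(2, 8), torch.randn(2, 5, 8))
print(ctx.shape, w.shape)  # torch.Size([2, 8]) torch.Size([2, 5])
```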
2017b
- (Gehring et al., 2017) ⇒ Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. (2017). “Convolutional Sequence to Sequence Learning.” In: Proceedings of the 34th International Conference on Machine Learning (ICML-2017).
- QUOTE: ... The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. ...
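As a hedged illustration of the CNN-based direction quoted above, the sketch below shows a single gated convolutional encoder block (1D convolution, gated linear unit, and residual connection) in the spirit of Gehring et al. (2017); the channel count and kernel size are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of a convolutional encoder block: a 1D convolution with a
# gated linear unit (GLU) and a residual connection processes the whole input
# sequence in parallel, with no recurrence. Sizes below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoderBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # 2*channels outputs so GLU can split them into value and gate halves.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                         # x: (batch, channels, src_len)
        return F.glu(self.conv(x), dim=1) + x     # gated convolution with residual connection

block = ConvEncoderBlock()
out = block(torch.randn(4, 64, 10))               # (batch, channels, src_len)
print(out.shape)                                  # torch.Size([4, 64, 10])
```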
2016
- (Luong et al., 2016) ⇒ Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. (2016). “Multi-task Sequence to Sequence Learning.” In: Proceedings of the 4th International Conference on Learning Representations (ICLR-2016).
- QUOTE: … Recently, sequence to sequence (seq2seq) learning, proposed by Kalchbrenner & Blunsom (2013), Sutskever et al. (2014), and Cho et al. (2014), emerges as an effective paradigm for dealing with variable-length inputs and outputs. seq2seq learning, at its core, uses recurrent neural networks to map variable-length input sequences to variable-length output sequences. While relatively new, the seq2seq approach has achieved state-of-the-art results in not only its original application – machine translation – (Luong et al., 2015b; Jean et al., 2015a; Luong et al., 2015a; Jean et al., 2015b; Luong & Manning, 2015), but also image caption generation (Vinyals et al., 2015b), and constituency parsing (Vinyals et al., 2015a).
2014
- (Sutskever et al., 2014) ⇒ Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. (2014). “Sequence to Sequence Learning with Neural Networks.” In: Advances in Neural Information Processing Systems (NIPS-2014).
2013a
- (Kalchbrenner & Blunsom, 2013) ⇒ Nal Kalchbrenner, and Phil Blunsom. (2013). “Recurrent Continuous Translation Models.” In: Proceedings of EMNLP 2013 (EMNLP-2013).
2013b
- (Auli et al., 2013) ⇒ Michael Auli, Michel Galley, Chris Quirk, and Geoffrey Zweig. (2013). “Joint Language and Translation Modeling with Recurrent Neural Networks.” In: Proceedings of EMNLP 2013 (EMNLP-2013).