Neural Sequence-to-Sequence (seq2seq)-based Model Training Algorithm
A Neural Sequence-to-Sequence (seq2seq)-based Model Training Algorithm is a sequence-to-sequence model training algorithm that can train a neural seq2seq model (which maps a variable-length input sequence to a variable-length output sequence); a minimal training sketch is given below, just before the References section.
- Context:
- It can be implemented by a seq2seq System (that solves a seq2seq modeling task).
- It can range from being a One-Layer seq2seq Model Training Algorithm to being a Multi-Layer seq2seq Model Training Algorithm.
- It can range from being a RNN-based seq2seq Training Algorithm (such as LSTM-based seq2seq), to being a CNN-based seq2seq Training Algorithm, to being ... .
- It can range from being a Character-Level seq2seq Algorithm, to being a Subword-Level seq2seq Algorithm, to being a Word-Level seq2seq Algorithm.
- …
- Example(s):
- a Neural seq2seq with Attention Learning Algorithm (to train a neural seq2seq-with-attention model).
- a Neural seq2seq Translation Algorithm (implemented by a neural seq2seq translation system).
- a Neural seq2seq Text Error Correction Model Training Algorithm (implemented by a neural seq2seq TEC system).
- …
- Counter-Example(s):
- See: LSTM Algorithm.
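The following is a minimal, hedged sketch of one training step for a one-layer, word-level, RNN-based seq2seq model with teacher forcing, written in PyTorch. The vocabulary sizes, dimensions, and the toy batch are illustrative assumptions only, not the configuration of any referenced system.

```python
# A minimal sketch (not a canonical implementation) of one training step for a
# one-layer, word-level, RNN-based seq2seq model with teacher forcing.
# All sizes and the toy batch below are illustrative assumptions.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128
PAD = 0  # assumed padding index

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB, padding_idx=PAD)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB, padding_idx=PAD)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)    # encoder RNN
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)    # decoder RNN
        self.proj = nn.Linear(HID, TGT_VOCAB)                 # output projection

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.src_emb(src))              # encode the input sequence
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)  # decoder conditioned on encoder state
        return self.proj(dec_out)                               # logits over the target vocabulary

model = Seq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss(ignore_index=PAD)

# Toy batch: variable-length sequences assumed padded to a common length.
src = torch.randint(1, SRC_VOCAB, (4, 7))        # (batch, src_len)
tgt = torch.randint(1, TGT_VOCAB, (4, 9))        # (batch, tgt_len)
tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]        # teacher forcing: shift target by one

optimizer.zero_grad()
logits = model(src, tgt_in)
loss = criterion(logits.reshape(-1, TGT_VOCAB), tgt_out.reshape(-1))
loss.backward()
optimizer.step()
print(float(loss))
```

At inference time the decoder would instead be run autoregressively, feeding back its own predictions step by step; that loop is outside the scope of this sketch.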
References
2017a
- https://github.com/baidu-research/tensorflow-allreduce/blob/master/tensorflow/docs_src/tutorials/seq2seq.md
- QUOTE: ... A basic sequence-to-sequence model, as introduced in Cho et al., 2014, consists of two recurrent neural networks (RNNs): an encoder that processes the input and a decoder that generates the output. This basic architecture is depicted below.
Each box in the picture above represents a cell of the RNN, most commonly a GRU cell or an LSTM cell (see the RNN Tutorial for an explanation of those). Encoder and decoder can share weights or, as is more common, use a different set of parameters. Multi-layer cells have been successfully used in sequence-to-sequence models too, e.g. for translation Sutskever et al., 2014.
In the basic model depicted above, every input has to be encoded into a fixed-size state vector, as that is the only thing passed to the decoder. To allow the decoder more direct access to the input, an attention mechanism was introduced in Bahdanau et al., 2014. We will not go into the details of the attention mechanism (see the paper); suffice it to say that it allows the decoder to peek into the input at every decoding step. A multi-layer sequence-to-sequence network with LSTM cells and attention mechanism in the decoder looks like this.
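As a hedged illustration of the attention idea described in the quote above, the sketch below computes, for a single decoding step, a context vector that lets the decoder "peek" at every encoder position. It uses a simplified dot-product scoring, not the exact additive formulation of Bahdanau et al., 2014, and all tensor shapes are illustrative assumptions.

```python
# A minimal sketch of one attention step: score the decoder state against every
# encoder output, normalize to a distribution, and return a weighted "context"
# summary of the input. Simplified dot-product variant; shapes are illustrative.
import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_outputs):
    # decoder_state: (batch, hidden); encoder_outputs: (batch, src_len, hidden)
    scores = torch.bmm(encoder_outputs, decoder_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    weights = F.softmax(scores, dim=1)                                          # attention over input positions
    context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)       # (batch, hidden)
    return context, weights

# Toy example: batch of 2, source length 5, hidden size 8.
ctx, w = attend(torch.randn(2, 8), torch.randn(2, 5, 8))
print(ctx.shape, w.shape)  # torch.Size([2, 8]) torch.Size([2, 5])
```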
2017b
- (Gehring et al., 2017) ⇒ Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. (2017). “Convolutional Sequence to Sequence Learning.” In: Proceedings of the 34th International Conference on Machine Learning (ICML-2017).
- QUOTE: ... The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. ...
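As a hedged illustration of the CNN-based direction quoted above, the sketch below shows a single gated convolutional encoder block (1D convolution, gated linear unit, and residual connection) in the spirit of Gehring et al. (2017); the channel count and kernel size are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of a convolutional encoder block: a 1D convolution with a
# gated linear unit (GLU) and a residual connection processes the whole input
# sequence in parallel, with no recurrence. Sizes below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoderBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # 2*channels outputs so GLU can split them into value and gate halves.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                         # x: (batch, channels, src_len)
        return F.glu(self.conv(x), dim=1) + x     # gated convolution with residual connection

block = ConvEncoderBlock()
out = block(torch.randn(4, 64, 10))               # (batch, channels, src_len)
print(out.shape)                                  # torch.Size([4, 64, 10])
```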
2016
- (Luong et al., 2016) ⇒ Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. (2016). “Multi-task Sequence to Sequence Learning.” In: Proceedings of the 4th International Conference on Learning Representations (ICLR-2016).
- QUOTE: … Recently, sequence to sequence (seq2seq) learning, proposed by Kalchbrenner & Blunsom (2013), Sutskever et al. (2014), and Cho et al. (2014), emerges as an effective paradigm for dealing with variable-length inputs and outputs. seq2seq learning, at its core, uses recurrent neural networks to map variable-length input sequences to variable-length output sequences. While relatively new, the seq2seq approach has achieved state-of-the-art results in not only its original application – machine translation – (Luong et al., 2015b; Jean et al., 2015a; Luong et al., 2015a; Jean et al., 2015b; Luong & Manning, 2015), but also image caption generation (Vinyals et al., 2015b), and constituency parsing (Vinyals et al., 2015a).
2014
- (Sutskever et al., 2014) ⇒ Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. (2014). “Sequence to Sequence Learning with Neural Networks.” In: Advances in Neural Information Processing Systems (NIPS-2014).
2013a
- (Kalchbrenner & Blunsom, 2013) ⇒ Nal Kalchbrenner, and Phil Blunsom. (2013). “Recurrent Continuous Translation Models.” In: Proceedings of EMNLP 2013 (EMNLP-2013).
2013b
- (Auli et al., 2013) ⇒ Michael Auli, Michel Galley, Chris Quirk, and Geoffrey Zweig. (2013). “Joint Language and Translation Modeling with Recurrent Neural Networks.” In: Proceedings of EMNLP 2013 (EMNLP-2013).