Neural Sequence-to-Sequence (seq2seq)-based Model Training System
A Neural Sequence-to-Sequence (seq2seq)-based Model Training System is an encoder-decoder network training system (a sequence-to-sequence model training system) that implements a neural seq2seq training algorithm to solve a seq2seq training task (producing a trained neural seq2seq-based model).
- Context:
- …
- Example(s):
- Counter-Example(s):
- See: LSTM System, Encoder-Decoder with Attention Model Training System.
References
2017a
- (Keras Blog, 2017) ⇒ https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
- QUOTE: ... Let's illustrate these ideas with actual code.
For our example implementation, we will use a dataset of pairs of English sentences and their French translations, which you can download from manythings.org/anki. The file to download is called fra-eng.zip. We will implement a character-level sequence-to-sequence model, processing the input character-by-character and generating the output character-by-character. Another option would be a word-level model, which tends to be more common for machine translation. At the end of this post, you will find some notes about turning our model into a word-level model using Embedding layers.
The full script for our example can be found on GitHub.
Here's a summary of our process:
- 1) Turn the sentences into 3 Numpy arrays, `encoder_input_data`, `decoder_input_data`, `decoder_target_data`:
 - `encoder_input_data` is a 3D array of shape (`num_pairs`, `max_english_sentence_length`, `num_english_characters`) containing a one-hot vectorization of the English sentences.
 - `decoder_input_data` is a 3D array of shape (`num_pairs`, `max_french_sentence_length`, `num_french_characters`) containing a one-hot vectorization of the French sentences.
 - `decoder_target_data` is the same as `decoder_input_data` but offset by one timestep. `decoder_target_data[:, t, :]` will be the same as `decoder_input_data[:, t + 1, :]`.
- 2) Train a basic LSTM-based Seq2Seq model to predict `decoder_target_data` given `encoder_input_data` and `decoder_input_data`. Our model uses teacher forcing.
- 3) Decode some sentences to check that the model is working (i.e. turn samples from `encoder_input_data` into corresponding samples from `decoder_target_data`).
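The steps above map onto a small amount of Keras code. The following condensed sketch closely follows the model from the post's accompanying script; the dataset sizes, `latent_dim`, and the `fit` hyperparameters are illustrative placeholders, and the random one-hot arrays stand in for the vectorized fra-eng pairs built in step 1.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense

latent_dim = 256                                   # illustrative LSTM state size
num_pairs, max_en_len, max_fr_len = 1000, 16, 20   # illustrative dataset sizes
num_english_characters, num_french_characters = 70, 90

# Random one-hot stand-ins for the three arrays built in step 1.
encoder_input_data = np.eye(num_english_characters)[
    np.random.randint(num_english_characters, size=(num_pairs, max_en_len))]
decoder_input_data = np.eye(num_french_characters)[
    np.random.randint(num_french_characters, size=(num_pairs, max_fr_len))]
decoder_target_data = np.roll(decoder_input_data, -1, axis=1)  # offset by one timestep

# Encoder: read the English characters and keep only the final LSTM states.
encoder_inputs = Input(shape=(None, num_english_characters))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: trained with teacher forcing to predict the next French character,
# starting from the encoder's final states.
decoder_inputs = Input(shape=(None, num_french_characters))
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(decoder_inputs,
                                                initial_state=encoder_states)
decoder_outputs = Dense(num_french_characters, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=64, epochs=1, validation_split=0.2)
```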
2017b
- (GitHub, 2017) ⇒ https://github.com/IBM/pytorch-seq2seq
- QUOTE: This is a framework for sequence-to-sequence (seq2seq) models implemented in PyTorch. The framework has modularized and extensible components for seq2seq models, training and inference, checkpoints, etc. This is an alpha release. We appreciate any kind of feedback or contribution.
Seq2seq is a fast-evolving field, with new techniques and architectures being published frequently. The goal of this library is to facilitate the development of such techniques and applications. While constantly improving the quality of code and documentation, we will focus on the following items:
- Evaluation with benchmarks such as WMT machine translation, COCO image captioning, conversational models, etc.;
- Provide more flexible model options, improving the usability of the library;
- Adding the latest architectures, such as the CNN-based model proposed in Convolutional Sequence to Sequence Learning and the Transformer model proposed in Attention Is All You Need;
- Support features in the new versions of PyTorch.
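As a concrete illustration of the kind of model such a framework trains, here is a minimal plain-PyTorch encoder-decoder sketch with teacher forcing. Note that this is generic PyTorch, not the pytorch-seq2seq library's own API; the class, vocabulary sizes, and tensor shapes are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Generic GRU encoder-decoder; NOT the pytorch-seq2seq library API."""
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.src_emb(src))              # final state summarizes src
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), h)  # teacher forcing
        return self.out(dec_out)                            # (batch, len, tgt_vocab) logits

model = Seq2Seq(src_vocab=5000, tgt_vocab=6000)
src = torch.randint(0, 5000, (8, 20))    # dummy batch of source token ids
tgt = torch.randint(0, 6000, (8, 22))    # dummy batch of target token ids
logits = model(src, tgt[:, :-1])         # predict each next target token
loss = nn.functional.cross_entropy(logits.reshape(-1, 6000), tgt[:, 1:].reshape(-1))
loss.backward()
```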
2017c
- (Jacobs, 2017) ⇒ Kevin Jacobs. (2017). “Create a Character-based Seq2Seq Model Using Python and Tensorflow.” Blog post, 2017-12-14.
- QUOTE: … I will share my findings on creating a character-based Sequence-to-Sequence model (Seq2Seq) and I will share some of the results I have found. …
… The Seq2Seq (sequence-to-sequence) model has the following architecture: [architecture figure: an encoder that compresses the input sequence into a state vector, followed by a decoder that generates the output sequence from that state]
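To make the character-by-character generation concrete, here is a minimal greedy decoding loop for such a character-level model. It is a generic sketch, not code from the blog post: `encode_fn` and `decode_step_fn` are hypothetical stand-ins for a trained encoder and a one-step decoder, and `char_index` is an assumed character-to-index mapping.

```python
import numpy as np

def greedy_decode(encode_fn, decode_step_fn, input_seq, char_index,
                  start_char='\t', stop_char='\n', max_len=100):
    """Generate output character-by-character from a trained encoder-decoder."""
    state = encode_fn(input_seq)                  # summarize the input sequence
    index_char = {i: c for c, i in char_index.items()}
    prev_char, decoded = start_char, ''
    for _ in range(max_len):
        # One-hot encode the character generated at the previous step.
        x = np.zeros((1, 1, len(char_index)))
        x[0, 0, char_index[prev_char]] = 1.0
        probs, state = decode_step_fn(x, state)   # run one decoder timestep
        prev_char = index_char[int(np.argmax(probs))]
        if prev_char == stop_char:                # stop at end-of-sequence character
            break
        decoded += prev_char
    return decoded
```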
2017d
- (GitHub, 2017) ⇒ https://google.github.io/seq2seq/
- QUOTE: ... tf-seq2seq is a general-purpose encoder-decoder framework for Tensorflow that can be used for Machine Translation, Text Summarization, Conversational Modeling, Image Captioning, and more.