2014 LearningPhraseRepresentationsUs
- (Cho et al., 2014a) ⇒ Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. (2014). “Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP-2014). arXiv:1406.1078
Subject Headings: Gated Recurrent Unit (GRU), Encoder-Decoder GRU+Attention-based RNN, Sequence-to-Sequence Learning Task.
Notes
Cited By
- Google Scholar: ~ 8,339 Citations.
- Semantic Scholar: ~ 6,609 Citations.
- MS Academic: ~ 7,586 Citations.
2017
- (Greff et al., 2017) ⇒ Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. (2017). “LSTM: A Search Space Odyssey.” In: IEEE Transactions on Neural Networks and Learning Systems 28, no. 10
2017b
- https://github.com/baidu-research/tensorflow-allreduce/blob/master/tensorflow/docs_src/tutorials/seq2seq.md
- QUOTE: ... A basic sequence-to-sequence model, as introduced in Cho et al., 2014, consists of two recurrent neural networks (RNNs): an encoder that processes the input and a decoder that generates the output. This basic architecture is depicted below.
Each box in the picture above represents a cell of the RNN, most commonly a GRU cell or an LSTM cell (see the RNN Tutorial for an explanation of those). Encoder and decoder can share weights or, as is more common, use a different set of parameters. Multi-layer cells have been successfully used in sequence-to-sequence models too, e.g. for translation Sutskever et al., 2014.
In the basic model depicted above, every input has to be encoded into a fixed-size state vector, as that is the only thing passed to the decoder. …
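Below is a minimal sketch of the basic encoder-decoder described in the quote above, assuming PyTorch as the framework; names such as Seq2Seq, VOCAB_SIZE, and HIDDEN_SIZE are illustrative, and attention, padding, and beam search are omitted.
```python
# Illustrative sketch only (PyTorch and all names here are assumptions,
# not the tutorial's or the paper's code): a minimal GRU encoder-decoder
# in which the source sequence is compressed into a single fixed-size
# state vector that conditions the decoder.
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_SIZE, HIDDEN_SIZE = 1000, 64, 128

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB_SIZE, EMB_SIZE)
        self.tgt_emb = nn.Embedding(VOCAB_SIZE, EMB_SIZE)
        self.encoder = nn.GRU(EMB_SIZE, HIDDEN_SIZE, batch_first=True)
        self.decoder = nn.GRU(EMB_SIZE, HIDDEN_SIZE, batch_first=True)
        self.out = nn.Linear(HIDDEN_SIZE, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        # Encoder: only the final hidden state is kept; it is the sole
        # summary of the source passed to the decoder.
        _, context = self.encoder(self.src_emb(src_ids))
        # Decoder: starts from that fixed-size context and emits one
        # distribution over the vocabulary per target position.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids), context)
        return self.out(dec_states)  # (batch, tgt_len, vocab) logits

model = Seq2Seq()
src = torch.randint(0, VOCAB_SIZE, (2, 7))   # toy batch of source token ids
tgt = torch.randint(0, VOCAB_SIZE, (2, 6))   # toy batch of target token ids
tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]    # teacher forcing: shift by one
logits = model(src, tgt_in)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, VOCAB_SIZE), tgt_out.reshape(-1))
```
The single `context` vector is the bottleneck the quote refers to: it is the only information about the source that reaches the decoder, which is what later motivated attention mechanisms.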
2015
- (Xu et al., 2015) ⇒ Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, and Yoshua Bengio. (2015). “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.” In: Proceedings of the International Conference on Machine Learning, (ICML-2015).
- (Luong, Pham et al., 2015) ⇒ Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. (2015). “Effective Approaches to Attention-based Neural Machine Translation.” arXiv preprint arXiv:1508.04025.
Quotes
Abstract
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
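A compact way to write the joint training criterion described in the abstract (notation added here for clarity, not quoted from the paper): given a parallel corpus of source/target pairs $(\mathbf{x}_n, \mathbf{y}_n)$, the encoder and decoder parameters $\theta$ are trained together to solve

$$\max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}(\mathbf{y}_n \mid \mathbf{x}_n),$$

where $p_{\theta}(\mathbf{y} \mid \mathbf{x})$ is the probability the decoder assigns to the target sequence given the encoder's fixed-length summary of the source sequence.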
1. Introduction
Deep neural networks have shown great success in various applications such as object recognition (see, e.g., (Krizhevsky et al., 2012)) and speech recognition (see, e.g., (Dahl et al., 2012)). Furthermore, many recent works showed that neural networks can be successfully used in a number of tasks in natural language processing (NLP). These include, but are not limited to, language modeling (Bengio et al., 2003), paraphrase detection (Socher et al., 2011) and word embedding extraction (Mikolov et al., 2013). In the field of statistical machine translation (SMT), deep neural networks have begun to show promising results. (Schwenk, 2012) summarizes a successful usage of feedforward neural networks in the framework of phrase-based SMT system.
Along this line of research on using neural networks for SMT, this paper focuses on a novel neural network architecture that can be used as a part of the conventional phrase-based SMT system. The proposed neural network architecture, which we will refer to as an RNN Encoder–Decoder, consists of two recurrent neural networks (RNN) that act as an encoder and a decoder pair. The encoder maps a variable-length source sequence to a fixed-length vector, and the decoder maps the vector representation back to a variable-length target sequence. The two networks are trained jointly to maximize the conditional probability of the target sequence given a source sequence. Additionally, we propose to use a rather sophisticated hidden unit in order to improve both the memory capacity and the ease of training.
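The "rather sophisticated hidden unit" referred to above is what is now called the Gated Recurrent Unit (GRU). In its commonly cited form (notation paraphrased here, not quoted from the paper), each unit computes a reset gate $r_t$, an update gate $z_t$, a candidate state $\tilde{h}_t$, and the new hidden state $h_t$:

$$
\begin{aligned}
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
\tilde{h}_t &= \tanh\big(W x_t + U (r_t \odot h_{t-1})\big) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
$$

The reset gate lets the unit ignore irrelevant history when forming the candidate state, while the update gate interpolates between the previous state and the candidate; this gating is what improves both the memory capacity and the ease of training mentioned above.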
The proposed RNN Encoder–Decoder with a novel hidden unit is empirically evaluated on the task of translating from English to French. We train the model to learn the translation probability of an English phrase to a corresponding French phrase. The model is then used as a part of a standard phrase-based SMT system by scoring each phrase pair in the phrase table. The empirical evaluation reveals that this approach of scoring phrase pairs with an RNN Encoder–Decoder improves the translation performance.
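As a sketch of how such a score enters a standard log-linear phrase-based SMT system (this is the conventional formulation, not a quotation from the paper), translation hypotheses are scored as a weighted sum of feature functions, and the RNN Encoder-Decoder phrase score is added as one more feature:

$$\log p(\mathbf{e} \mid \mathbf{f}) \;\propto\; \sum_{n} w_n f_n(\mathbf{f}, \mathbf{e}) \;+\; w_{\text{rnn}} \log p_{\text{RNN}}(\mathbf{e} \mid \mathbf{f}),$$

where $\log p_{\text{RNN}}$ denotes the RNN Encoder-Decoder score accumulated over the phrase pairs used in a hypothesis, and the feature weights $w$ are tuned on a development set in the usual way.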
We qualitatively analyze the trained RNN Encoder–Decoder by comparing its phrase scores with those given by the existing translation model. The qualitative analysis shows that the RNN Encoder–Decoder is better at capturing the linguistic regularities in the phrase table, indirectly explaining the quantitative improvements in the overall translation performance. The further analysis of the model reveals that the RNN Encoder–Decoder learns a continuous space representation of a phrase that preserves both the semantic and syntactic structure of the phrase.
…
References
BibTeX
@inproceedings{2014_LearningPhraseRepresentationsUs,
  author    = {Kyunghyun Cho and Bart van Merrienboer and {\c{C}}aglar G{\"{u}}l{\c{c}}ehre and Dzmitry Bahdanau and Fethi Bougares and Holger Schwenk and Yoshua Bengio},
  editor    = {Alessandro Moschitti and Bo Pang and Walter Daelemans},
  title     = {Learning Phrase Representations using {RNN} Encoder-Decoder for Statistical Machine Translation},
  booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL},
  pages     = {1724--1734},
  publisher = {ACL},
  year      = {2014},
  url       = {https://www.aclweb.org/anthology/D14-1179.pdf},
  doi       = {10.3115/v1/d14-1179},
}