Encoder-Decoder Sequence-to-Sequence Learning Task
An Encoder-Decoder Sequence-to-Sequence Learning Task is a Sequence-to-Sequence Learning Task that is based on an Encoder-Decoder Neural Network.
- Context:
  - Task Input Requirement: a Sequence Dataset.
  - Task Output Requirement: a Sequence Dataset.
  - Task Requirement(s): an Encoder-Decoder Neural Network Model (a minimal code sketch of such a setup follows the See list below).
  - …
- Example(s):
  - a Neural Machine Translation System that is based on an RNN Encoder-Decoder Network,
  - a Sequence-to-Sequence Learning System that is based on a CNN Encoder-Decoder Network.
  - …
- Counter-Example(s):
  - a Convolutional Sequence-to-Sequence Training Task,
  - a Connectionist Sequence Classification Task,
  - a Multi-modal Sequence to Sequence Training,
  - a Neural seq2seq Model Training,
  - a Sequence-to-Sequence Learning with Variational Auto-Encoder,
  - a Sequence-to-Sequence Learning via Shared Latent Representation,
  - a Sequence-to-Sequence Translation with Attention Mechanism.
- See: Natural Language Processing Task, Sequence Learning Task, Word Sense Disambiguation, LSTM, Deep Neural Network, Memory Augmented Neural Network Training System, Deep Sequence Learning Task, Bidirectional LSTM.
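Below is a minimal, illustrative sketch of such a task setup, assuming PyTorch; the model class, layer sizes, and toy data are invented for demonstration and are not taken from any of the referenced works. An encoder network reads the input sequence, its final hidden state conditions a decoder network, and the decoder is trained (with teacher forcing) to emit the output sequence.

<syntaxhighlight lang="python">
# Minimal sketch of an encoder-decoder sequence-to-sequence learning task (illustrative only).
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab=100, tgt_vocab=100, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        # Encode the input sequence; keep only the final hidden state.
        _, h = self.encoder(self.src_emb(src))          # h: (1, batch, hidden)
        # Decode, conditioning every step on the encoder's final state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), h)
        return self.out(dec_out)                        # (batch, decoder_steps, tgt_vocab)

# Task input/output requirement: pairs of (input sequence, output sequence).
src = torch.randint(0, 100, (8, 12))      # batch of 8 input sequences of length 12
tgt = torch.randint(0, 100, (8, 10))      # batch of 8 output sequences of length 10

model = EncoderDecoder()
logits = model(src, tgt[:, :-1])          # teacher forcing: decoder sees tgt shifted by one
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 100), tgt[:, 1:].reshape(-1))
loss.backward()                           # one training step of the learning task
</syntaxhighlight>

The same interface covers machine translation, summarization, and similar sequence-to-sequence learning tasks; only the sequence datasets and vocabularies change.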
References
2018
- (Liao et al., 2018) ⇒ Binbing Liao, Jingqing Zhang, Chao Wu, Douglas McIlwraith, Tong Chen, Shengwen Yang, Yike Guo, and Fei Wu. (2018). “Deep Sequence Learning with Auxiliary Information for Traffic Prediction.” In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ISBN:978-1-4503-5552-0 doi:10.1145/3219819.3219895
- QUOTE: In this paper, we effectively utilise three kinds of auxiliary information in an encoder-decoder sequence to sequence (Seq2Seq) [7, 32] learning manner as follows: a wide linear model is used to encode the interactions among geographical and social attributes, a graph convolution neural network is used to learn the spatial correlation of road segments, and the query impact is quantified and encoded to learn the potential influence of online crowd queries(...)
Figure 4 shows the architecture of the Seq2Seq model for traffic prediction. The encoder embeds the input traffic speed sequence [math]\displaystyle{ \{v_1,v_2, \cdots ,v_t \} }[/math] and the final hidden state of the encoder is fed into the decoder, which learns to predict the future traffic speed [math]\displaystyle{ \{\tilde{v}_{t+1},\tilde{v}_{t+2}, \cdots,\tilde{v}_{t+t'} \} }[/math]. Hybrid model that integrates the auxiliary information will be proposed based on the Seq2Seq model.
Figure 4: Seq2Seq: The Sequence to Sequence model predicts future traffic speed [math]\displaystyle{ \{\tilde{v}_{t+1},\tilde{v}_{t+2}, \cdots ,\tilde{v}_{t+t'} \} }[/math], given the previous traffic speed [math]\displaystyle{ \{v_1,v_2, \cdots ,v_t \} }[/math].
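As a rough illustration of the Seq2Seq architecture described in this quote (not the authors' code; the choice of a GRU, the layer sizes, and the toy data are assumptions), the encoder can summarize the observed speed sequence in its final hidden state, from which a decoder rolls out the future speeds:

<syntaxhighlight lang="python">
# Illustrative encoder-decoder for traffic-speed prediction (invented sizes and data).
import torch
import torch.nn as nn

class SpeedSeq2Seq(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(1, hidden, batch_first=True)   # embeds v_1 .. v_t
        self.decoder = nn.GRU(1, hidden, batch_first=True)   # predicts v_{t+1} .. v_{t+t'}
        self.proj = nn.Linear(hidden, 1)

    def forward(self, past, horizon):
        # The final encoder hidden state summarizes the observed speed sequence.
        _, h = self.encoder(past.unsqueeze(-1))               # past: (batch, t)
        preds, inp = [], past[:, -1:].unsqueeze(-1)           # seed decoder with last speed
        for _ in range(horizon):
            out, h = self.decoder(inp, h)
            inp = self.proj(out)                              # next predicted speed
            preds.append(inp)
        return torch.cat(preds, dim=1).squeeze(-1)            # (batch, horizon)

past = torch.rand(4, 12) * 60          # 4 road segments, 12 past speed readings (km/h)
future = torch.rand(4, 6) * 60         # 6 future readings to predict
loss = nn.MSELoss()(SpeedSeq2Seq()(past, 6), future)
</syntaxhighlight>

The paper's hybrid model then augments this backbone with the encoded auxiliary information; the sketch shows only the plain Seq2Seq part.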
2017
- (Ramachandran et al., 2017) ⇒ Prajit Ramachandran, Peter J. Liu, and Quoc V. Le. (2017). “Unsupervised Pretraining for Sequence to Sequence Learning.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). arXiv:1611.02683
- QUOTE: Therefore, the basic procedure of our approach is to pretrain both the seq2seq encoder and decoder networks with language models, which can be trained on large amounts of unlabeled text data. This can be seen in Figure 1, where the parameters in the shaded boxes are pretrained. In the following we will describe the method in detail using machine translation as an example application.
Figure 1: Pretrained sequence to sequence model. The red parameters are the encoder and the blue parameters are the decoder. All parameters in a shaded box are pretrained, either from the source side (light red) or target side (light blue) language model. Otherwise, they are randomly initialized.
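A rough sketch of this pretraining scheme, assuming PyTorch (module names and sizes are illustrative, and a GRU is used here purely for brevity): language models are first trained on unlabeled source-side and target-side text, and their parameters are then copied into the matching encoder and decoder modules, while everything else keeps its random initialization.

<syntaxhighlight lang="python">
# Illustrative weight transfer from pretrained language models into a seq2seq model.
import torch.nn as nn

vocab, emb, hidden = 1000, 64, 128

class GRULanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

# Pretend these were trained on large amounts of unlabeled source/target text.
src_lm, tgt_lm = GRULanguageModel(), GRULanguageModel()

# Seq2seq encoder and decoder modules with the same shapes as the language models.
encoder_emb, encoder_rnn = nn.Embedding(vocab, emb), nn.GRU(emb, hidden, batch_first=True)
decoder_emb, decoder_rnn = nn.Embedding(vocab, emb), nn.GRU(emb, hidden, batch_first=True)
decoder_softmax = nn.Linear(hidden, vocab)

# Copy pretrained LM parameters into the corresponding seq2seq modules (the shaded boxes
# in Figure 1); anything without a pretrained counterpart keeps its random initialization.
encoder_emb.load_state_dict(src_lm.emb.state_dict())
encoder_rnn.load_state_dict(src_lm.rnn.state_dict())
decoder_emb.load_state_dict(tgt_lm.emb.state_dict())
decoder_rnn.load_state_dict(tgt_lm.rnn.state_dict())
decoder_softmax.load_state_dict(tgt_lm.out.state_dict())
</syntaxhighlight>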
2016
- (Luong et al., 2016) ⇒ Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. (2016). “Multi-task Sequence to Sequence Learning.” In: Proceedings of the 4th International Conference on Learning Representations (ICLR-2016).
2014a
- (Sutskever et al., 2014) ⇒ Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. (2014). “Sequence to Sequence Learning with Neural Networks.” In: Advances in Neural Information Processing Systems. arXiv:1409.3215
2014b
- (Cho et al., 2014b) ⇒ Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. (2014). “Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, (EMNLP-2014). arXiv:1406.1078
- QUOTE: In this paper, we propose a novel neural network architecture that learns to encode a variable-length sequence into a fixed-length vector representation and to decode a given fixed-length vector representation back into a variable-length sequence.
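A compact illustration of this idea, assuming PyTorch (all sizes are arbitrary and a single toy vocabulary is shared by both sides): whatever the input length, the encoder's final hidden state c has a fixed size, and the decoder generates the output sequence conditioned only on c.

<syntaxhighlight lang="python">
# Variable-length sequence -> fixed-length vector -> variable-length sequence (illustrative).
import torch
import torch.nn as nn

emb = nn.Embedding(50, 16)                  # shared toy vocabulary of 50 symbols
encoder = nn.GRU(16, 32, batch_first=True)
decoder = nn.GRU(16, 32, batch_first=True)
readout = nn.Linear(32, 50)

x = torch.randint(0, 50, (1, 7))            # variable-length input (here length 7)
_, c = encoder(emb(x))                      # c: (1, 1, 32) -- fixed-length representation,
                                            # regardless of the input length

y_in = torch.randint(0, 50, (1, 11))        # decoder inputs for an 11-step output sequence
dec_out, _ = decoder(emb(y_in), c)          # decoding is conditioned only on c
logits = readout(dec_out)                   # (1, 11, 50): one distribution per output step
</syntaxhighlight>

This fixed-length bottleneck c is precisely what later attention-based variants (see the Counter-Example list above) relax.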