Pointer-Generator Sequence-to-Sequence Neural Network

A Pointer-Generator Sequence-to-Sequence Neural Network is a Sequence-to-Sequence Neural Network With Attention that is based on a Pointer Network Model and a Word Generation Probability Function.

AKA: Pointer-Generator Network.
Context:
- It was initially developed by See et al. (2017) to copy words from the source text by pointing, as in the Pointer Network of Vinyals et al. (2015).
- It can also be categorized as a Modular Neural Network.
- It can be trained using a See-Liu-Manning Text Summarization System.
Example(s):
- a Pointer-Generator Seq2Seq Neural Network with Coverage (See et al., 2017).
- …
Counter-Example(s):
See: Sequence-to-Sequence Model, Neural Machine Translation, Encoder-Decoder Neural Network, Artificial Neural Network, Natural Language Processing Task, Language Model, Summarization NLP Task.

References

2017

(See et al., 2017) ⇒ Abigail See, Peter J. Liu, and Christopher D. Manning. (2017). “Get To The Point: Summarization with Pointer-Generator Networks.” In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). DOI:10.18653/v1/P17-1099.
- QUOTE: Our hybrid pointer-generator network facilitates copying words from the source text via pointing (Vinyals et al., 2015), which improves accuracy and handling of OOV words, while retaining the ability to generate new words. The network, which can be viewed as a balance between extractive and abstractive approaches, is similar to Gu et al.’s (2016) CopyNet and Miao and Blunsom’s (2016) Forced-Attention Sentence Compression, that were applied to short-text summarization.
  
  (...)

**Figure 3:** Pointer-generator model. For each decoder timestep a generation probability $p_{gen} \in [0,1]$ is calculated, which weights the probability of generating words from the vocabulary, versus copying words from the source text. The vocabulary distribution and the attention distribution are weighted and summed to obtain the final distribution, from which we make our prediction. Note that out-of-vocabulary article words such as *2-0* are included in the final distribution. Best viewed in color.
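
The weighting described in the Figure 3 caption can be illustrated with a small sketch. It assumes the final distribution takes the mixture form $P(w) = p_{gen} P_{vocab}(w) + (1 - p_{gen}) \sum_{i: w_i = w} a_i$, where $a_i$ is the attention weight on source position $i$. The function name `final_distribution`, the dictionary-based representation, and all numeric values are hypothetical choices made for illustration; they are not taken from the authors' implementation.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a vocabulary distribution with a copy (attention) distribution.

    p_gen         -- generation probability in [0, 1]
    vocab_dist    -- dict mapping each in-vocabulary word to its probability
    attention     -- attention weights, one per source token (sum to 1)
    source_tokens -- source words aligned with `attention`
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(attention) != len(source_tokens):
        raise ValueError("need exactly one attention weight per source token")

    final = defaultdict(float)
    # Generation path: scale the vocabulary distribution by p_gen.
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob
    # Copy path: add (1 - p_gen) * attention mass to each source word.
    # A word occurring several times in the source accumulates all of its
    # attention weights; out-of-vocabulary words get an entry here only.
    for word, weight in zip(source_tokens, attention):
        final[word] += (1.0 - p_gen) * weight
    return dict(final)


# Toy example (all numbers invented for illustration).
vocab_dist = {"germany": 0.5, "beat": 0.3, "win": 0.2}
source_tokens = ["germany", "beat", "argentina", "2-0"]
attention = [0.1, 0.1, 0.2, 0.6]

dist = final_distribution(0.4, vocab_dist, attention, source_tokens)
for word, prob in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{word:10s} {prob:.2f}")
```

In this toy run the source-only tokens *argentina* and *2-0* receive probability through the copy path alone, which is how an out-of-vocabulary word can still appear in the final distribution.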

2015

(Vinyals et al., 2015) ⇒ Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. (2015). “Pointer Networks.” In: Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems (NIPS 2015).
- QUOTE: Nonetheless, these methods still require the size of the output dictionary to be fixed a priori. Because of this constraint we cannot directly apply this framework to combinatorial problems where the size of the output dictionary depends on the length of the input sequence. In this paper, we address this limitation by repurposing the attention mechanism of Bahdanau et al. (2014) to create pointers to input elements. We show that the resulting architecture, which we name Pointer Networks (Ptr-Nets), can be trained to output satisfactory solutions to three combinatorial optimization problems – computing planar convex hulls, Delaunay triangulations and the symmetric planar Travelling Salesman Problem (TSP). The resulting models produce approximate solutions to these problems in a purely data driven fashion (i.e., when we only have examples of inputs and desired outputs). The proposed approach is depicted in Figure 1.

**Figure 1:** (a) Sequence-to-Sequence - An RNN (blue) processes the input sequence to create a code vector that is used to generate the output sequence (purple) using the probability chain rule and another RNN. The output dimensionality is fixed by the dimensionality of the problem and it is the same during training and inference in Sutskever et al. (2014). (b) Ptr-Net - An encoding RNN converts the input sequence to a code (blue) that is fed to the generating network (purple). At each step, the generating network produces a vector that modulates a content-based attention mechanism over inputs (Bahdanau et al., 2015, Graves et al., 2014). The output of the attention mechanism is a softmax distribution with dictionary size equal to the length of the input.
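
The pointing step in panel (b) can be sketched as follows. The sketch assumes the additive attention scoring $u_j = v^\top \tanh(W_1 e_j + W_2 d)$ over encoder states $e_j$ and a decoder state $d$, followed by a softmax over input positions. The helper names (`matvec`, `softmax`, `pointer_distribution`) and all weight values are hypothetical, and plain Python lists stand in for trained parameters.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_distribution(encoder_states, decoder_state, W1, W2, v):
    """Return a distribution over input positions (one entry per input)."""
    projected_decoder = matvec(W2, decoder_state)
    scores = []
    for e_j in encoder_states:
        projected_encoder = matvec(W1, e_j)
        hidden = [math.tanh(a + b)
                  for a, b in zip(projected_encoder, projected_decoder)]
        # u_j = v . tanh(W1 e_j + W2 d)
        scores.append(sum(v_k * h_k for v_k, h_k in zip(v, hidden)))
    return softmax(scores)


# Made-up parameters; in a trained model these would be learned.
W1 = [[0.5, -0.2], [0.1, 0.3]]
W2 = [[0.4, 0.0], [-0.3, 0.2]]
v = [1.0, -1.0]
decoder_state = [0.2, 0.7]

short_input = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
long_input = short_input + [[-1.0, 0.5], [0.3, 0.3]]

p_short = pointer_distribution(short_input, decoder_state, W1, W2, v)
p_long = pointer_distribution(long_input, decoder_state, W1, W2, v)
print(len(p_short), len(p_long))
```

The same parameters yield a three-way distribution for the three-element input and a five-way distribution for the five-element input, so the output "dictionary" grows with the input length rather than being fixed in advance.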
