Pointer-Generator Seq2Seq Neural Network with Coverage
A Pointer-Generator Seq2Seq Neural Network with Coverage is a Pointer-Generator Sequence-to-Sequence Neural Network that includes a Coverage Mechanism.
- Example(s):
- the Pointer-Generator Network with a coverage mechanism described in See et al. (2017).
- …
- Counter-Example(s):
- See: Sequence-to-Sequence Model, Neural Machine Translation, Recurrent Encoder-Decoder Neural Network, Seq2Seq with Attention Training Algorithm, Artificial Neural Network, Natural Language Processing Task, Language Model.
References
2017
- (See et al., 2017) ⇒ Abigail See, Peter J. Liu, and Christopher D. Manning. (2017). “Get To The Point: Summarization with Pointer-Generator Networks.” In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). DOI:10.18653/v1/P17-1099.
- QUOTE: Repetition is a common problem for sequence-to-sequence models (Tu et al., 2016; Mi et al., 2016; Sankaran et al., 2016; Suzuki and Nagata, 2016), and is especially pronounced when generating multi-sentence text (see Figure 1). We adapt the coverage model of Tu et al. (2016) to solve the problem. In our coverage model, we maintain a coverage vector $c^t$, which is the sum of attention distributions over all previous decoder timesteps:
[math]\displaystyle{ c^t = \sum_{t'=0}^{t-1} a^{t'} \quad\quad }[/math] (10)
Intuitively, $c^t$ is a (unnormalized) distribution over the source document words that represents the degree of coverage that those words have received from the attention mechanism so far. Note that $c^0$ is a zero vector, because on the first timestep, none of the source document has been covered. The coverage vector is used as extra input to the attention mechanism, changing equation (1) to:
[math]\displaystyle{ e^t_i = \nu^T \tanh(W_h h_i + W_s s_t + w_c c^t_i + b_{attn}) \quad\quad }[/math] (11)
where $w_c$ is a learnable parameter vector of same length as $\nu$. This ensures that the attention mechanism’s current decision (choosing where to attend next) is informed by a reminder of its previous decisions (summarized in $c^t$). This should make it easier for the attention mechanism to avoid repeatedly attending to the same locations, and thus avoid generating repetitive text.
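The two equations quoted above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names, tensor shapes, random parameters, and the use of a softmax to turn the scores $e^t$ into the attention distribution $a^t$ are assumptions made for this example. The code keeps a running coverage vector as in equation (10) and feeds it into the attention score as in equation (11), with `v` standing in for $\nu$.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def coverage_attention(h, s_t, c_t, W_h, W_s, w_c, b_attn, v):
    """Attention distribution for one decoder step, following equation (11).

    Assumed shapes (hypothetical, chosen for this sketch):
      h: (n, d_enc) encoder states, s_t: (d_dec,) decoder state,
      c_t: (n,) coverage vector, W_h: (d_attn, d_enc), W_s: (d_attn, d_dec),
      w_c, b_attn, v: (d_attn,).
    """
    # One row of features per source position i. The scalar coverage c_t[i]
    # scales the vector w_c, which is why w_c has the same length as v.
    features = h @ W_h.T + s_t @ W_s.T + np.outer(c_t, w_c) + b_attn
    e_t = np.tanh(features) @ v  # scores e^t_i, shape (n,)
    return softmax(e_t)          # assumed normalisation into a^t


def run_decoder_attention(h, decoder_states, params):
    """Run attention over all decoder steps while accumulating coverage."""
    coverage = np.zeros(h.shape[0])  # c^0 is the zero vector
    attentions, coverages = [], []
    for s_t in decoder_states:
        coverages.append(coverage.copy())
        a_t = coverage_attention(h, s_t, coverage, **params)
        attentions.append(a_t)
        coverage = coverage + a_t  # running sum realises equation (10)
    return np.array(attentions), np.array(coverages)


# Toy example with random parameters (illustrative values only).
rng = np.random.default_rng(0)
n, d_enc, d_dec, d_attn, steps = 5, 4, 3, 6, 4
h = rng.normal(size=(n, d_enc))
decoder_states = rng.normal(size=(steps, d_dec))
params = dict(
    W_h=rng.normal(size=(d_attn, d_enc)),
    W_s=rng.normal(size=(d_attn, d_dec)),
    w_c=rng.normal(size=d_attn),
    b_attn=np.zeros(d_attn),
    v=rng.normal(size=d_attn),
)
attentions, coverages = run_decoder_attention(h, decoder_states, params)
```

Here `coverages[t]` holds $c^t$, the sum of the attention distributions from all earlier steps, so each entry grows as the corresponding source position receives attention. With trained rather than random parameters, a suitable $w_c$ would let the model lower the score of positions that are already heavily covered.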