See's Word Generation Probability Function
A See's Word Generation Probability Function is a Word Generation Probability Function that is defined as:
[math]\displaystyle{ p_{gen} = \sigma(w^T_{h^*} h^*_t + w^T_s s_t + w^T_x x_t + b_{ptr}) }[/math]
where vectors $w_{h^*}$, $w_s$, $w_x$ and scalar $b_{ptr}$ are learnable parameters and $\sigma$ is the sigmoid function.
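The following is a minimal PyTorch sketch (not the authors' released code) that computes $p_{gen}$ as in the formula above; the tensor names and the hidden/embedding sizes are illustrative assumptions, not values taken from See et al. (2017).
```python
# Illustrative sketch of p_gen = sigma(w_h^T h*_t + w_s^T s_t + w_x^T x_t + b_ptr).
# All names and dimensions below are assumptions chosen for the example.
import torch

def generation_probability(h_star_t, s_t, x_t, w_h, w_s, w_x, b_ptr):
    """Return p_gen in (0, 1) for one decoder timestep."""
    logit = (torch.dot(w_h, h_star_t)   # w_{h*}^T h*_t
             + torch.dot(w_s, s_t)      # w_s^T s_t
             + torch.dot(w_x, x_t)      # w_x^T x_t
             + b_ptr)                   # scalar bias b_ptr
    return torch.sigmoid(logit)

# Example usage with arbitrary sizes:
hidden_size, emb_size = 512, 128
h_star_t = torch.randn(hidden_size)   # context vector h*_t
s_t      = torch.randn(hidden_size)   # decoder state s_t
x_t      = torch.randn(emb_size)      # decoder input (embedding) x_t
w_h      = torch.randn(hidden_size)   # learnable vector w_{h*}
w_s      = torch.randn(hidden_size)   # learnable vector w_s
w_x      = torch.randn(emb_size)      # learnable vector w_x
b_ptr    = torch.zeros(())            # learnable scalar b_ptr

p_gen = generation_probability(h_star_t, s_t, x_t, w_h, w_s, w_x, b_ptr)
print(float(p_gen))  # a scalar generation probability in (0, 1)
```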
- Example(s):
- …
- Counter-Example(s):
- See: Pointer-Generator Network, Neural Machine Translation, Encoder-Decoder Neural Network, Artificial Neural Network, Natural Language Processing Task, Language Model, Summarization NLP Task.
References
2017
- (See et al., 2017) ⇒ Abigail See, Peter J. Liu, and Christopher D. Manning. (2017). “Get To The Point: Summarization with Pointer-Generator Networks.” In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). DOI:10.18653/v1/P17-1099.
- QUOTE: Our pointer-generator network is a hybrid between our baseline and a pointer network (Vinyals et al., 2015), as it allows both copying words via pointing, and generating words from a fixed vocabulary. In the pointer-generator model (depicted in Figure 3) the attention distribution $a^t$ and context vector $h^*_t$ are calculated as in section 2.1. In addition, the generation probability $p_{gen} \in [0,1]$ for timestep $t$ is calculated from the context vector $h^*_t$, the decoder state $s_t$ and the decoder input $x_t$:
[math]\displaystyle{ p_{gen} = \sigma(w^T_{h^*} h^*_t + w^T_s s_t + w^T_x x_t + b_{ptr}) \quad\quad (8) }[/math]
where vectors $w_{h^*}$, $w_s$, $w_x$ and scalar $b_{ptr}$ are learnable parameters and $\sigma$ is the sigmoid function (...)