Position Embedding
A Position Embedding is an embedding function that maps the position index of a token within a sequence to a continuous vector, which is typically summed with the token's other embeddings (such as its word embedding) so that the model receives information about token order (a minimal sketch is given below).
- Example(s):
  - a BERT Position Embedding, which is summed with token and segment embeddings to form BERT's input representation (Devlin et al., 2019).
  - a Convolutional Seq2Seq Position Embedding (Gehring et al., 2017).
- Counter-Example(s):
  - a Token Embedding.
  - a Segment Embedding.
- See: Position Embedding Matrix, Neural Transformer Model.
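The following is a minimal illustrative sketch (not taken from any cited system) of a learned position embedding: a trainable matrix whose [math]\displaystyle{ i }[/math]-th row is the vector assigned to sequence position [math]\displaystyle{ i }[/math]. The matrix size and random values below are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

max_len, d = 512, 8                              # assumed maximum sequence length and embedding dimension
P = rng.normal(scale=0.02, size=(max_len, d))    # position embedding matrix (trained in a real model)

def position_embedding(i: int) -> np.ndarray:
    """Look up the vector assigned to sequence position i (0-indexed)."""
    return P[i]

# Each token in a length-5 sequence receives the vector for its position.
positions = np.stack([position_embedding(i) for i in range(5)])
print(positions.shape)  # (5, 8)
```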
References
2019
- (Devlin et al., 2019) ⇒ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. (2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Volume 1 (Long and Short Papers). DOI:10.18653/v1/N19-1423. arXiv:1810.04805
- QUOTE: Pre-trained text encoders (Peters et al., 2018b; Devlin et al., 2018; Radford et al., 2018, 2019; Yang et al., 2019) have drawn much attention in natural language processing (NLP), because state-of-the-art performance can be obtained for many NLP tasks using such encoders. In general, these encoders are implemented by training a deep neural model on large unlabeled corpora.
- QUOTE: ... For a given token, its input representation is constructed by summing the corresponding token, [[segment embedding|segment]], and position embeddings.
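A hedged sketch of the input construction described in this quote, in which each token's input representation is the element-wise sum of its token, segment, and position embeddings; the toy vocabulary size, segment count, maximum length, and dimension are illustrative assumptions, not BERT's actual hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_segments, max_len, d = 100, 2, 16, 8   # toy sizes (assumed)

tok_emb = rng.normal(size=(vocab_size, d))   # token embedding matrix
seg_emb = rng.normal(size=(n_segments, d))   # segment embedding matrix
pos_emb = rng.normal(size=(max_len, d))      # position embedding matrix

def input_representation(token_ids, segment_ids):
    """Per position, sum the token, segment, and position embeddings."""
    positions = np.arange(len(token_ids))
    return tok_emb[token_ids] + seg_emb[segment_ids] + pos_emb[positions]

x = input_representation(np.array([5, 7, 9]), np.array([0, 0, 1]))
print(x.shape)  # (3, 8)
```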
2018
- (Chollampatt & Ng, 2018) ⇒ Shamil Chollampatt, and Hwee Tou Ng. (2018). “A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction.” In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-2018).
- QUOTE: Consider an input source sentence [math]\displaystyle{ S }[/math] given as a sequence of [math]\displaystyle{ m }[/math] source tokens [math]\displaystyle{ s_1, \cdots, s_m }[/math] with [math]\displaystyle{ s_i \in V_s }[/math], where [math]\displaystyle{ V_s }[/math] is the source vocabulary. The last source token, [math]\displaystyle{ s_m }[/math], is a special end-of-sentence marker token. The source tokens are embedded in continuous space as [math]\displaystyle{ \mathbf{s}_1, \cdots, \mathbf{s}_m }[/math]. The embedding [math]\displaystyle{ \mathbf{s}_i \in \mathbb{R}^d }[/math] is given by [math]\displaystyle{ \mathbf{s}_i = w(s_i) + p(i) }[/math], where [math]\displaystyle{ w(s_i) }[/math] is the word embedding and [math]\displaystyle{ p(i) }[/math] is the position embedding corresponding to the position [math]\displaystyle{ i }[/math] of token [math]\displaystyle{ s_i }[/math] in the source sentence. Both embeddings are obtained from embedding matrices that are trained along with other parameters of the network.
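A minimal sketch of the embedding step quoted above, [math]\displaystyle{ \mathbf{s}_i = w(s_i) + p(i) }[/math], assuming small illustrative matrix sizes; in the cited model both embedding matrices would be trained together with the rest of the network rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, d = 50, 20, 4     # illustrative sizes (assumed)

W = rng.normal(size=(vocab_size, d))   # word embedding matrix, w(.)
P = rng.normal(size=(max_len, d))      # position embedding matrix, p(.)

def embed_source(token_ids):
    """Embed each source token as s_i = w(s_i) + p(i)."""
    return np.array([W[t] + P[i] for i, t in enumerate(token_ids)])

print(embed_source([3, 1, 4, 1]).shape)  # (4, 4)
```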
2017
- (Gehring et al., 2017) ⇒ Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. (2017). “Convolutional Sequence to Sequence Learning.” In: International Conference on Machine Learning, pp. 1243-1252. PMLR.
- QUOTE: ... Position embeddings are useful in our architecture since they give our model a sense of which portion of the sequence in the input or output it is currently dealing with (§5.4). ...