Position Embedding
A Position Embedding is an embedding function that maps a token's position in a sequence to a continuous vector, which is typically summed with the token's word embedding so that the model receives word-order information (a minimal code sketch follows the lists below).
- Example(s): a trained Position Embedding Matrix lookup, such as the position embeddings learned in BERT (Devlin et al., 2019) or in the convolutional sequence-to-sequence models quoted below.
- Counter-Example(s): a Token Embedding (Word Embedding), a Segment Embedding.
- See: Position Embedding Matrix, Neural Transformer Model.
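The following is a minimal illustrative sketch (not taken from any of the cited papers) of a learned position embedding implemented as a lookup into a randomly initialized Position Embedding Matrix; all sizes and names are assumptions made for the example.
<syntaxhighlight lang="python">
import numpy as np

# Illustrative sizes (assumptions, not values from the cited papers).
max_positions = 512   # longest sequence position covered by the table
embedding_dim = 8     # dimensionality d of each embedding vector

# A learned position embedding is a trainable matrix whose row i is the
# vector assigned to position i; here it is randomly initialized.
position_embedding_matrix = np.random.default_rng(0).normal(
    scale=0.02, size=(max_positions, embedding_dim))

def position_embedding(position):
    """Look up the embedding vector for a 0-based sequence position."""
    return position_embedding_matrix[position]

# Embeddings for the first three positions of a sequence.
print(np.stack([position_embedding(i) for i in range(3)]).shape)  # (3, 8)
</syntaxhighlight>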
References
2019
- (Devlin et al., 2019) ⇒ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. (2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Volume 1 (Long and Short Papers). DOI:10.18653/v1/N19-1423. arXiv:1810.04805
- QUOTE: Pre-trained text encoders (Peters et al., 2018b; Devlin et al., 2018; Radford et al., 2018, 2019; Yang et al., 2019) have drawn much attention in natural language processing (NLP), because state-of-the-art performance can be obtained for many NLP tasks using such encoders. In general, these encoders are implemented by training a deep neural model on large unlabeled corpora.
- QUOTE: ... For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings.
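The input-representation construction quoted above (summing token, segment, and position embeddings) can be sketched as follows; the vocabulary size, segment count, sequence length, and dimensionality are illustrative assumptions, and the randomly initialized tables stand in for trained parameters.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
vocab_size, num_segments, max_positions, d = 100, 2, 16, 8  # illustrative sizes

# Three trainable lookup tables; randomly initialized stand-ins here.
token_emb = rng.normal(scale=0.02, size=(vocab_size, d))
segment_emb = rng.normal(scale=0.02, size=(num_segments, d))
position_emb = rng.normal(scale=0.02, size=(max_positions, d))

token_ids = np.array([5, 17, 42, 7])      # a toy token-id sequence
segment_ids = np.array([0, 0, 1, 1])      # sentence A vs. sentence B
position_ids = np.arange(len(token_ids))  # positions 0, 1, 2, 3

# Input representation: element-wise sum of the three embeddings per token.
input_repr = (token_emb[token_ids]
              + segment_emb[segment_ids]
              + position_emb[position_ids])
print(input_repr.shape)  # (4, 8)
</syntaxhighlight>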
2018
- (Chollampatt & Ng, 2018) ⇒ Shamil Chollampatt, and Hwee Tou Ng. (2018). “A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction.” In: Proceedings of the Thirty-Second Conference on Artificial Intelligence (AAAI-2018).
- QUOTE: Consider an input source sentence [math]\displaystyle{ S }[/math] given as a sequence of [math]\displaystyle{ m }[/math] source tokens [math]\displaystyle{ s_1, \cdots, s_m }[/math] with [math]\displaystyle{ s_i \in V_s }[/math], where [math]\displaystyle{ V_s }[/math] is the source vocabulary. The last source token, [math]\displaystyle{ s_m }[/math], is a special end-of-sentence marker token. The source tokens are embedded in continuous space as [math]\displaystyle{ \mathbf{s}_1, \cdots, \mathbf{s}_m }[/math]. The embedding [math]\displaystyle{ \mathbf{s}_i \in \mathbb{R}^d }[/math] is given by [math]\displaystyle{ \mathbf{s}_i = w(s_i) + p(i) }[/math], where [math]\displaystyle{ w(s_i) }[/math] is the word embedding and [math]\displaystyle{ p(i) }[/math] is the position embedding corresponding to the position [math]\displaystyle{ i }[/math] of token [math]\displaystyle{ s_i }[/math] in the source sentence. Both embeddings are obtained from embedding matrices that are trained along with other parameters of the network.
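A minimal sketch of the embedding step [math]\displaystyle{ \mathbf{s}_i = w(s_i) + p(i) }[/math] described in this quote, with small illustrative matrix sizes standing in for the word and position embedding matrices that the paper trains jointly with the network.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
vocab_size, max_len, d = 50, 20, 8  # illustrative sizes, not the paper's

# Stand-ins for the embedding matrices trained with the network.
W = rng.normal(scale=0.1, size=(vocab_size, d))  # word embedding matrix, w(.)
P = rng.normal(scale=0.1, size=(max_len, d))     # position embedding matrix, p(.)

def embed_source(token_ids):
    """Return s_i = w(s_i) + p(i) for each source token id."""
    token_ids = np.asarray(token_ids)
    positions = np.arange(len(token_ids))
    return W[token_ids] + P[positions]

S = embed_source([3, 14, 1, 49])  # the last id plays the role of the end-of-sentence marker
print(S.shape)  # (4, 8)
</syntaxhighlight>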
2017
- (Gehring et al., 2017) ⇒ Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. (2017). “Convolutional Sequence to Sequence Learning.” In: Proceedings of the 34th International Conference on Machine Learning (ICML 2017), pp. 1243-1252. PMLR.
- QUOTE: ... Position embeddings are useful in our architecture since they give our model a sense of which portion of the sequence in the input or output it is currently dealing with (§5.4). ...