Soft-Attention Mechanism
A Soft-Attention Mechanism is an Attention Mechanism in which attention weights are distributed “softly” over all input elements, rather than concentrated on a single selected element.
- Context:
- It can mean that weights are placed “softly” over all patches in the source image (Luong, Pham et al., 2015).
- It can mean that weights are distributed “softly” over all path-contexts in a code snippet (Alon et al., 2019).
- Example(s):
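- Counter-Example(s):
- a Hard-Attention Mechanism, which selects a single element to attend to at a time (Xu et al., 2015).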
- See: Neural Network with Attention Mechanism, Self-Attention Mechanism, Deterministic Attention Mechanism, Attention-Encoder-Decoder Neural Network, Hierarchical Attention Network.
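The contrast between “soft” and “hard” weighting described above can be illustrated with a minimal sketch. It is not taken from any of the cited papers: the function names (softmax, soft_attention, hard_attention), the toy vectors, and the scores are all hypothetical, and the scores are assumed to be given rather than learned. Soft attention returns a weighted average over every element, while the hard variant here samples a single element according to the same weights.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(scores, vectors):
    """Weighted average of all vectors; every element receives a nonzero weight."""
    weights = softmax(scores)
    dim = len(vectors[0])
    context = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return weights, context

def hard_attention(scores, vectors, rng=random):
    """Sample one vector, with probabilities given by the softmax weights."""
    weights = softmax(scores)
    index = rng.choices(range(len(vectors)), weights=weights, k=1)[0]
    return index, vectors[index]

# Hypothetical toy input: three 2-dimensional element vectors and their scores.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 1.0, 0.1]

weights, context = soft_attention(scores, vectors)
index, selected = hard_attention(scores, vectors, random.Random(0))
```

Because the soft variant is a smooth function of the scores, it can be trained with ordinary back-propagation; the sampling step in the hard variant is what makes it non-differentiable, as the quoted references note.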
References
2019
- (Alon et al., 2019) ⇒ Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. (2019). “code2vec: Learning Distributed Representations of Code.” In: Proceedings of the ACM on Programming Languages (POPL), Volume 3.
- QUOTE: The terms “soft” and “hard” attention were proposed for the task of image caption generation by Xu et al. (2015). Applied in our setting, soft-attention means that weights are distributed “softly” over all path-contexts in a code snippet, while hard-attention refers to selection of a single path-context to focus on at a time. The use of soft-attention over syntactic paths is the main understanding that provides this work much better results than previous works. We compare our model with an equivalent model that uses hard-attention in Section 6.2, and show that soft-attention is more efficient for modeling code.
2015a
- (Luong, Pham et al., 2015) ⇒ Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. (2015). “Effective Approaches to Attention-based Neural Machine Translation.” In: Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-2015). DOI:10.18653/v1/D15-1166.
- QUOTE: This model takes inspiration from the tradeoff between the soft and hard attentional models proposed by Xu et al. (2015) to tackle the image caption generation task. In their work, soft attention refers to the global attention approach in which weights are placed “softly” over all patches in the source image. The hard attention, on the other hand, selects one patch of the image to attend to at a time. While less expensive at inference time, the hard attention model is non-differentiable and requires more complicated techniques such as variance reduction or reinforcement learning to train.
2015b
- (Xu et al., 2015) ⇒ Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. (2015). “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.” In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Volume 37.
- QUOTE: The contributions of this paper are the following:
- We introduce two attention-based image caption generators under a common framework (Sec. 3.1): 1) a “soft” deterministic attention mechanism trainable by standard back-propagation methods and 2) a “hard” stochastic attention mechanism trainable by maximizing an approximate variational lower bound or equivalently by REINFORCE (Williams, 1992).
- We show how we can gain insight and interpret the results of this framework by visualizing “where” and “what” the attention focused on (see Sec. 5.4.)
- Finally, we quantitatively validate the usefulness of attention in caption generation with state-of-the-art performance (Sec. 5.3) on three benchmark datasets: Flickr8k (Hodosh et al., 2013), Flickr30k (Young et al., 2014) and the MS COCO dataset (Lin et al., 2014).