Neural Network with Self-Attention Mechanism
A Neural Network with Self-Attention Mechanism is a Neural Network with Attention Mechanism that includes a self-attention mechanism.
- Example(s):
- Counter-Example(s):
- See: Attention Mechanism, Coverage Mechanism, Gating Mechanism, Neural Network with Attention Mechanism, Memory-Augmented Neural Network, Hierarchical Attention Network, Stack Memory.
References
2019
- (Zhang, Goodfellow et al., 2019) ⇒ Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. (2019). “Self-attention Generative Adversarial Networks.” In: International Conference on Machine Learning, pp. 7354-7363. PMLR.
2017
- (Lin et al., 2017) ⇒ Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. (2017). “A Structured Self-attentive Sentence Embedding.” In: Proceedings of the 5th International Conference on Learning Representations (ICLR-2017).
- QUOTE: Computing the linear combination requires the self-attention mechanism. The attention mechanism takes the whole LSTM hidden states $H$ as input, and outputs a vector of weights $a$:
$\mathbf{a} = softmax\left(\mathbf{w_{s2}}\tanh\left(W_{s1}H^T\right)\right)$ (5)
- Here $W_{s1}$ is a weight matrix with a shape of $d_a$-by-$2u$, and $\mathbf{w_{s2}}$ is a vector of parameters with size $d_a$, where $d_a$ is a hyperparameter we can set arbitrarily. Since $H$ is sized $n$-by-$2u$, the annotation vector $\mathbf{a}$ will have a size $n$. The $softmax(\cdot)$ ensures all the computed weights sum up to 1. Then we sum up the LSTM hidden states $H$ according to the weights provided by $\mathbf{a}$ to get a vector representation $m$ of the input sentence.
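- The sketch below is a minimal NumPy illustration of Eq. (5) and the subsequent weighted sum, not the authors' implementation; the shapes of `H`, `W_s1`, and `w_s2` follow the quoted text ($n$-by-$2u$, $d_a$-by-$2u$, and $d_a$, respectively), while the toy sizes and the function name `structured_self_attention` are assumptions made for illustration.
```python
import numpy as np

def structured_self_attention(H, W_s1, w_s2):
    """Sketch of Eq. (5): a = softmax(w_s2 · tanh(W_s1 · H^T)).

    H    : (n, 2u) matrix of LSTM hidden states
    W_s1 : (d_a, 2u) weight matrix
    w_s2 : (d_a,) parameter vector
    Returns the attention weights a (size n) and the
    sentence representation m = a · H (size 2u).
    """
    scores = w_s2 @ np.tanh(W_s1 @ H.T)   # shape (n,)
    a = np.exp(scores - scores.max())
    a = a / a.sum()                       # softmax over the n timesteps
    m = a @ H                             # weighted sum of LSTM hidden states
    return a, m

# Toy usage with arbitrary sizes n=6, u=4, d_a=8 (assumed for the example)
rng = np.random.default_rng(0)
n, u, d_a = 6, 4, 8
H = rng.standard_normal((n, 2 * u))
W_s1 = rng.standard_normal((d_a, 2 * u))
w_s2 = rng.standard_normal(d_a)
a, m = structured_self_attention(H, W_s1, w_s2)
assert np.isclose(a.sum(), 1.0) and m.shape == (2 * u,)
```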