Neural Network with Self-Attention Mechanism
A Neural Network with Self-Attention Mechanism is a Neural Network with Attention Mechanism that includes a self-attention mechanism.
- Example(s):
- Counter-Example(s):
- See: Attention Mechanism, Coverage Mechanism, Gating Mechanism, Neural Network with Attention Mechanism, Memory-Augmented Neural Network, Hierarchical Attention Network, Stack Memory.
References
2019
- (Zhang, Goodfellow et al., 2019) ⇒ Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. (2019). “Self-attention Generative Adversarial Networks.” In: International Conference on Machine Learning, pp. 7354-7363. PMLR.
2017
- (Lin et al., 2017) ⇒ Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. (2017). “A Structured Self-attentive Sentence Embedding.” In: Proceedings of the 5th International Conference on Learning Representations (ICLR-2017).
- QUOTE: Computing the linear combination requires the self-attention mechanism. The attention mechanism takes the whole LSTM hidden states $H$ as input, and outputs a vector of weights $a$:
$\mathbf{a} = softmax\left(\mathbf{w_{s2}}\tanh\left(W_{s1}H^T\right)\right)$ (5)
- Here $W_{s1}$ is a weight matrix with a shape of $d_a$-by-$2u$, and $\mathbf{w_{s2}}$ is a vector of parameters with size $d_a$, where $d_a$ is a hyperparameter we can set arbitrarily. Since $H$ is sized $n$-by-$2u$, the annotation vector $\mathbf{a}$ will have a size $n$. The $softmax(\cdot)$ ensures all the computed weights sum up to 1. Then we sum up the LSTM hidden states $H$ according to the weights provided by $\mathbf{a}$ to get a vector representation $m$ of the input sentence.
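- The sketch below is a minimal NumPy illustration of Eq. (5) and the subsequent weighted sum, not the authors' implementation; the shapes of `H`, `W_s1`, and `w_s2` follow the quoted text ($n$-by-$2u$, $d_a$-by-$2u$, and $d_a$, respectively), while the toy sizes and the function name `structured_self_attention` are assumptions made for illustration.
```python
import numpy as np

def structured_self_attention(H, W_s1, w_s2):
    """Sketch of Eq. (5): a = softmax(w_s2 · tanh(W_s1 · H^T)).

    H    : (n, 2u) matrix of LSTM hidden states
    W_s1 : (d_a, 2u) weight matrix
    w_s2 : (d_a,) parameter vector
    Returns the attention weights a (size n) and the
    sentence representation m = a · H (size 2u).
    """
    scores = w_s2 @ np.tanh(W_s1 @ H.T)   # shape (n,)
    a = np.exp(scores - scores.max())
    a = a / a.sum()                       # softmax over the n timesteps
    m = a @ H                             # weighted sum of LSTM hidden states
    return a, m

# Toy usage with arbitrary sizes n=6, u=4, d_a=8 (assumed for the example)
rng = np.random.default_rng(0)
n, u, d_a = 6, 4, 8
H = rng.standard_normal((n, 2 * u))
W_s1 = rng.standard_normal((d_a, 2 * u))
w_s2 = rng.standard_normal(d_a)
a, m = structured_self_attention(H, W_s1, w_s2)
assert np.isclose(a.sum(), 1.0) and m.shape == (2 * u,)
```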