Content-Based Attention Network
A Content-Based Attention Network is an Artificial Neural Network that includes an attention mechanism which computes a context vector as a weighted sum of the encoder's hidden states.
- Context:
- …
- Example(s):
- Counter-Example(s):
- See: Attention Mechanism, Neural Machine Translation Algorithm, Statistical Machine Translation Task, Encoder-Decoder Neural Network, Attention-Encoder-Decoder Neural Network, Bahdanau-Cho-Bengio Neural Machine Translation Task.
References
2020
- (Liu et al., 2020) ⇒ Xiongfei Liu, Bengao Li, Xin Chen, Haiyan Zhang, and Shu Zhan (2020). "Content-Based Attention Network for Person Image Generation". In: Journal of Circuits, Systems and Computers.
2018
- (Shan et al., 2018) ⇒ Changhao Shan, Junbo Zhang, Yujun Wang, and Lei Xie (2018, April). "Attention-Based End-to-End Speech Recognition on Voice Search". In: Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018). DOI:10.1109/ICASSP.2018.8462492.
- QUOTE: Content-based attention: Borrowed from neural machine translation (Bahdanau et al., 2015), content-based attention can be directly used in speech recognition. Here, the context vector $c_i$ is computed as a weighted sum of $h_j$:
[math]\displaystyle{ c_{i}=\sum_{j=1}^{T} \alpha_{i, j} h_{j} }[/math]   (6)
:: The weight $\alpha_{i,j}$ of each $h_j$ is computed by
[math]\displaystyle{ \alpha_{i, j}=\exp \left(e_{i, j}\right) / \sum_{j=1}^{T} \exp \left(e_{i, j}\right) }[/math]   (7)
:: where
[math]\displaystyle{ e_{i, j}=\operatorname{Score}\left(s_{i-1}, h_{j}\right) }[/math]   (8)
:: Here the Score is an MLP network which measures how well the inputs around position $j$ and the output at position $i$ match. It is based on the LSTM hidden state $s_{i-1}$ and $h_j$ of the input sentence. Specifically, it can be further described by
[math]\displaystyle{ e_{i, j}=\mathbf{w}^{\top} \tanh \left(\mathbf{W} \mathbf{s}_{i-1}+\mathbf{V h}_{j}+\mathbf{b}\right) }[/math]   (9)
:: where $w$ and $b$ are vectors, and $W$ and $V$ are matrices.
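The following is a minimal NumPy sketch of the content-based (additive) attention described by Eqs. (6)-(9) above. The function name content_based_attention, the dimension names, and the random toy parameters are illustrative assumptions, not part of the quoted paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def content_based_attention(s_prev, h, W, V, w, b):
    """Content-based (additive) attention, following Eqs. (6)-(9).

    s_prev : (d_s,)   previous decoder state s_{i-1}
    h      : (T, d_h) encoder hidden states h_1 .. h_T
    W      : (d_a, d_s), V : (d_a, d_h), w : (d_a,), b : (d_a,)
    Returns the context vector c_i (d_h,) and weights alpha_i (T,).
    """
    # e_{i,j} = w^T tanh(W s_{i-1} + V h_j + b), computed for all j at once (Eqs. 8-9)
    e = np.tanh(s_prev @ W.T + h @ V.T + b) @ w   # (T,)
    alpha = softmax(e)                            # (T,)   Eq. (7)
    c = alpha @ h                                 # (d_h,) Eq. (6)
    return c, alpha

# Toy usage with random (hypothetical) parameters
rng = np.random.default_rng(0)
T, d_h, d_s, d_a = 5, 8, 6, 4
h = rng.normal(size=(T, d_h))
s_prev = rng.normal(size=d_s)
W, V = rng.normal(size=(d_a, d_s)), rng.normal(size=(d_a, d_h))
w, b = rng.normal(size=d_a), rng.normal(size=d_a)
c_i, alpha_i = content_based_attention(s_prev, h, W, V, w, b)
print(c_i.shape, alpha_i.sum())  # (8,) 1.0
```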
2015
- (Bahdanau et al., 2015) ⇒ Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio (2015). "Neural Machine Translation by Jointly Learning to Align and Translate". In: Proceedings of the Third International Conference on Learning Representations (ICLR 2015).