Content-Based Attention Network
A Content-Based Attention Network is an Artificial Neural Network that includes an attention mechanism which computes a context vector as a weighted sum of the encoder's hidden states.
- Context:
- …
- Example(s):
- Counter-Example(s):
- See: Attention Mechanism, Neural Machine Translation Algorithm, Statistical Machine Translation Task, Encoder-Decoder Neural Network, Attention-Encoder-Decoder Neural Network, Bahdanau-Cho-Bengio Neural Machine Translation Task.
References
2020
- (Liu et al., 2020) ⇒ Xiongfei Liu, Bengao Li, Xin Chen, Haiyan Zhang, and Shu Zhan (2020). "Content-Based Attention Network for Person Image Generation". In: Journal of Circuits, Systems and Computers.
2018
- (Shan et al., 2018) ⇒ Changhao Shan, Junbo Zhang, Yujun Wang, and Lei Xie (2018, April). "Attention-Based End-to-End Speech Recognition on Voice Search". In: Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018). DOI:10.1109/ICASSP.2018.8462492.
- QUOTE: Content-based attention: Borrowed from neural machine translation (Bahdanau et al., 2015), content-based attention can be directly used in speech recognition. Here, the context vector $c_i$ is computed as a weighted sum of $h_j$:
[math]\displaystyle{ c_{i}=\sum_{j=1}^{T} \alpha_{i, j} h_{j} }[/math]   (6)
:: The weight $\alpha_{i,j}$ of each $h_j$ is computed by
[math]\displaystyle{ \alpha_{i, j}=\exp \left(e_{i, j}\right) / \sum_{j=1}^{T} \exp \left(e_{i, j}\right) }[/math]   (7)
:: where
[math]\displaystyle{ e_{i, j}=\operatorname{Score}\left(s_{i-1}, h_{j}\right) }[/math]   (8)
:: Here the Score is an MLP network which measures how well the inputs around position $j$ and the output at position $i$ match. It is based on the LSTM hidden state $s_{i-1}$ and $h_j$ of the input sentence. Specifically, it can be further described by
[math]\displaystyle{ e_{i, j}=\mathbf{w}^{\top} \tanh \left(\mathbf{W} \mathbf{s}_{i-1}+\mathbf{V h}_{j}+\mathbf{b}\right) }[/math]   (9)
:: where $w$ and $b$ are vectors, and $W$ and $V$ are matrices.
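The following is a minimal NumPy sketch of the content-based (additive) attention described by Eqs. (6)-(9) above. The function name content_based_attention, the dimension names, and the random toy parameters are illustrative assumptions, not part of the quoted paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def content_based_attention(s_prev, h, W, V, w, b):
    """Content-based (additive) attention, following Eqs. (6)-(9).

    s_prev : (d_s,)   previous decoder state s_{i-1}
    h      : (T, d_h) encoder hidden states h_1 .. h_T
    W      : (d_a, d_s), V : (d_a, d_h), w : (d_a,), b : (d_a,)
    Returns the context vector c_i (d_h,) and weights alpha_i (T,).
    """
    # e_{i,j} = w^T tanh(W s_{i-1} + V h_j + b), computed for all j at once (Eqs. 8-9)
    e = np.tanh(s_prev @ W.T + h @ V.T + b) @ w   # (T,)
    alpha = softmax(e)                            # (T,)   Eq. (7)
    c = alpha @ h                                 # (d_h,) Eq. (6)
    return c, alpha

# Toy usage with random (hypothetical) parameters
rng = np.random.default_rng(0)
T, d_h, d_s, d_a = 5, 8, 6, 4
h = rng.normal(size=(T, d_h))
s_prev = rng.normal(size=d_s)
W, V = rng.normal(size=(d_a, d_s)), rng.normal(size=(d_a, d_h))
w, b = rng.normal(size=d_a), rng.normal(size=d_a)
c_i, alpha_i = content_based_attention(s_prev, h, W, V, w, b)
print(c_i.shape, alpha_i.sum())  # (8,) 1.0
```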
2015
- (Bahdanau et al., 2015) ⇒ Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio (2015). "Neural Machine Translation by Jointly Learning to Align and Translate". In: Proceedings of the Third International Conference on Learning Representations (ICLR 2015).