Self-Attention Building Block

A Self-Attention Building Block is a Neural Network Component that enables a model to weigh the significance of different parts of an input sequence in relation to each other for generating a representation of the sequence.

  • Context:
    • It can (typically) calculate attention scores based on the input sequence itself, using Query (Q), Key (K), and Value (V) vectors derived from the input.
    • It can (often) be used to capture dependencies and relationships within the input data, regardless of the distance between the related positions in the sequence.
    • It can employ a mechanism that dynamically adjusts the focus on different parts of the input data as required by the task at hand.
    • It can be a fundamental part of more complex architectures like the Transformer model, where it is combined with other layers to process sequential data.
    • It can utilize a softmax function to normalize attention scores, ensuring they sum to one and can be interpreted as probabilities (see the sketch after this list).
    • ...
  • Example(s):
    • In natural language processing, using a self-attention building block within a Transformer model to analyze a sentence and determine the context around each word.
    • In a machine translation task, using self-attention to identify relevant parts of a sentence when translating it from one language to another.
    • ...
  • Counter-Example(s):
    • A Convolutional Neural Network (CNN) layer, which processes input data through filters focusing on local regions without dynamic weighting of input parts based on their global context.
    • A Recurrent Neural Network (RNN) unit, which processes sequence data one element at a time in a sequential manner rather than attending to all parts of the sequence simultaneously.
  • See: Transformer Building Block, Transformer Architecture, Attention Mechanism.
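
Below is a minimal, self-contained NumPy sketch of single-head scaled dot-product self-attention, illustrating how Query (Q), Key (K), and Value (V) vectors are derived from the input itself and how softmax-normalized attention scores weight the values. The function and variable names (self_attention, W_q, W_k, W_v, d_model, d_k) are illustrative assumptions for this sketch, not taken from any specific library or reference implementation.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: each row of scores is normalized to sum to one,
    # so the weights can be interpreted as probabilities.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # X            : (seq_len, d_model) input sequence representations
    # W_q, W_k, W_v: (d_model, d_k) projection matrices (assumed learned)
    # Returns      : (seq_len, d_k) context-aware representations
    Q = X @ W_q                      # queries derived from the input sequence itself
    K = X @ W_k                      # keys
    V = X @ W_v                      # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise attention scores over all positions at once
    weights = softmax(scores)        # normalize each row of scores
    return weights @ V               # weighted combination of values

# Example usage with random data (shapes are illustrative).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (5, 8)

In a full Transformer Building Block, this computation is typically extended with multiple heads, masking, an output projection, and surrounding layers; the sketch above keeps only the core step of scoring, normalizing, and combining.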

