Decoder-only Transformer-based Model
A Decoder-only Transformer-based Model is a transformer model that is a decoder-only neural network (one that relies solely on the decoder component of the transformer architecture).
- Context:
- It can (typically) be employed in natural language generation tasks.
- It can generate text in an autoregressive manner, where each new token is conditioned on the previously generated tokens (see the first sketch after this list).
- It can (often) use causal masked self-attention, in which each position attends only to earlier positions, so that the model predicts the next element of a sequence without seeing future elements (see the second sketch after this list).
- It can be less suited to tasks that require bidirectional encoding of an input, as it lacks a dedicated encoder component.
- ...
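The following is a minimal sketch of greedy autoregressive decoding. The function `next_token_logits` is a hypothetical stand-in for a trained decoder-only model's forward pass, not an API from any particular library; greedy argmax selection is used here for simplicity, and sampling strategies are a common alternative.

```python
import numpy as np

def generate(next_token_logits, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly feed the growing sequence back in and append the chosen token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)   # scores over the vocabulary for the next position
        next_id = int(np.argmax(logits))  # greedy choice of the next token
        ids.append(next_id)               # condition the next step on everything so far
        if eos_id is not None and next_id == eos_id:
            break
    return ids

# Toy usage with a dummy "model" that always prefers (last token + 1) mod vocab_size:
vocab_size = 50
def toy_logits(ids):
    logits = np.zeros(vocab_size)
    logits[(ids[-1] + 1) % vocab_size] = 1.0
    return logits

print(generate(toy_logits, [3], max_new_tokens=5))  # -> [3, 4, 5, 6, 7, 8]
```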
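The following is a minimal NumPy sketch of causal (masked) self-attention. The single-head form, the toy dimensions, and the name `causal_self_attention` are illustrative assumptions; the key step is masking the score matrix so that each position cannot attend to positions after it.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention in which position i attends only to positions <= i."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # project inputs to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])           # scaled dot-product attention scores
    t = x.shape[0]
    future = np.triu(np.ones((t, t), dtype=bool), 1)  # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)        # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                           # 5 tokens, model dimension 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)         # out.shape == (5, 8)
```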
- Example(s):
- GPT (Generative Pre-trained Transformer) models, such as GPT-3 and GPT-4.
- Decoder-only Language Models used in creative writing applications.
- ...
- Counter-Example(s):
- an Encoder-only Transformer-based Model, such as BERT.
- an Encoder-Decoder Transformer-based Model, such as T5.
- ...
- See: Autoregressive Model, Natural Language Generation, Masked Self-Attention, Transformer Architecture.