Generative Pre-trained Transformer (GPT) Language Model


A Generative Pre-trained Transformer (GPT) Language Model is a decoder-only transformer-based language model trained to predict the next token in a sequence.



References

2023

  • (ChatGPT, 2023) ⇒ https://chat.openai.com
    • The general GPT (Generative Pre-trained Transformer) language model architecture is based on the Transformer architecture, which was introduced by Vaswani et al. in their 2017 paper "Attention is All You Need." ...
    • The main components of the GPT architecture are as follows (a minimal code sketch of these components appears after this reference entry):
      • Tokenization and positional encoding: Input text is first tokenized into subwords or tokens, which are then embedded into continuous vectors. Positional encodings are added to these token embeddings to retain information about the position of each token within the input sequence.
      • Stacked Transformer blocks: The GPT architecture consists of multiple Transformer blocks stacked on top of each other. Each block contains two primary components:
      • a. Multi-head self-attention mechanism: This mechanism allows the model to weigh and consider different parts of the input sequence when making predictions. It can capture various relationships and dependencies within the input data.
      • b. Position-wise feed-forward networks: These are fully connected feed-forward networks that process the output from the multi-head self-attention mechanism at each position in the input sequence.
      • Layer normalization and residual connections: Each Transformer block also includes layer normalization and residual connections, which help stabilize training and mitigate the vanishing gradient problem.
      • Output layer: The output from the final Transformer block is passed through a linear layer followed by a softmax activation function to generate probabilities for the next token in the sequence. The token with the highest probability is typically selected as the model's prediction.
    • The GPT models are pre-trained on large text corpora using unsupervised learning, which involves training the model to predict the next token in a sequence given its context (see the training-objective sketch below). Once pre-trained, the model can be fine-tuned on specific tasks using supervised learning, where it is provided with labeled examples to learn from.

      The GPT architecture has evolved over multiple iterations, with newer models such as GPT-3 having more parameters and more stacked Transformer blocks, allowing for better performance on a wide range of natural language processing tasks.
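
To make the components above concrete, the following is a minimal sketch of a GPT-style decoder, assuming PyTorch. The class names (GPTBlock, MiniGPT) and all dimensions (vocabulary size, model width, number of heads and blocks) are illustrative placeholders and do not correspond to any released GPT model; a real system would also use a subword tokenizer rather than random token ids.

```python
import torch
import torch.nn as nn


class GPTBlock(nn.Module):
    """One Transformer block: masked multi-head self-attention plus a
    position-wise feed-forward network, each wrapped in a residual
    connection followed by layer normalization."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, d_ff: int = 1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: each position may attend only to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.ln1(x + attn_out)      # residual connection + layer norm
        x = self.ln2(x + self.ff(x))    # residual connection + layer norm
        return x


class MiniGPT(nn.Module):
    """Token embeddings + positional embeddings, a stack of Transformer
    blocks, and a final linear layer producing logits over the vocabulary."""

    def __init__(self, vocab_size: int = 50257, d_model: int = 256,
                 n_layers: int = 4, max_len: int = 512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList([GPTBlock(d_model) for _ in range(n_layers)])
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(positions)  # add positional info
        for block in self.blocks:
            x = block(x)
        return self.lm_head(x)


# Predicting the next token: softmax over the logits at the last position.
model = MiniGPT()
tokens = torch.randint(0, 50257, (1, 16))   # stand-in for a tokenized input
probs = model(tokens)[0, -1].softmax(dim=-1)
next_token_id = probs.argmax()              # highest-probability token
```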

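The next-token pre-training objective described in this reference can likewise be sketched. The snippet below assumes the hypothetical MiniGPT model from the previous sketch and uses random token ids in place of a real tokenized corpus; the loss is the cross-entropy between the model's predictions and the same sequence shifted by one position.

```python
import torch
import torch.nn.functional as F

# MiniGPT is the hypothetical model defined in the previous sketch.
model = MiniGPT()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

batch = torch.randint(0, 50257, (8, 65))       # stand-in for tokenized training text
inputs, targets = batch[:, :-1], batch[:, 1:]  # target = input shifted by one token

logits = model(inputs)                         # (batch, seq_len, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

loss.backward()
optimizer.step()
optimizer.zero_grad()
```
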
2023

  • (Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Generative_pre-trained_transformer Retrieved:2023-4-22.
    • Generative pre-trained transformers (GPT) are a family of large language models (LLMs)[1][2] which was introduced in 2018 by the American artificial intelligence organization OpenAI.[3] GPT models are artificial neural networks that are based on the transformer architecture, pre-trained on large datasets of unlabelled text, and able to generate novel human-like text.[2] As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.[4]

      Between 2018 and 2023, OpenAI released four major numbered GPT foundational models, with each being significantly more capable than the previous due to increased size (number of trainable parameters) and training. The GPT-3 model (2020) has 175 billion parameters and was trained on 400 billion tokens of text.[5] OpenAI declined to publish the size or training details of its GPT-4 model (2023), citing "the competitive landscape and the safety implications of large-scale models". These "GPT-n" models have been the basis for various other products and technologies, including models fine-tuned for instruction following which in turn power the ChatGPT chatbot service.

      The term "GPT" is also used in the names of some generative LLMs developed by others, such as a series of GPT-3-inspired models created by EleutherAI, and more recently a series of seven models created by Cerebras. Major companies in other industries (e.g., sales, finance) also use the term "GPT" in the names of services that involve or utilize GPT technology, such as "EinsteinGPT" and "BloombergGPT".