OpenAI GPT-2 Large Language Model (LLM)

An OpenAI GPT-2 Large Language Model (LLM) is an OpenAI transformer-based autoregressive language model, released in 2019 in sizes ranging from 117 million to 1.5 billion parameters.



References

2024

  • (Karpathy, 2024a) ⇒ Andrej Karpathy. (2024). “Let's Reproduce GPT-2 (124M).” YouTube.
    • NOTES:
      • It covers the entire process of reproducing the GPT-2 (124M) model from scratch, from implementing the architecture to setting up the training run and finally sampling text generations from the trained model.
      • It walks through a PyTorch implementation of the GPT-2 architecture, highlighting its differences from the original Transformer, such as moving layer normalization to the input of each sub-block and adding a final layer normalization after the last block (a minimal sketch follows this entry).
      • It shows how to load the pre-trained GPT-2 model weights via the Hugging Face library, including how the token and positional embeddings are handled, so that the reimplementation can be initialized to match the original GPT-2's behavior.
    • QUOTE: We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations. ...
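    • ILLUSTRATIVE SKETCH (not from the video): a minimal PyTorch rendering of a GPT-2-style pre-layer-norm transformer block, to make the architectural notes above concrete. The default sizes (n_embd=768, n_head=12) are those of the 124M model; everything else is an assumed simplification.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class CausalSelfAttention(nn.Module):
          def __init__(self, n_embd=768, n_head=12):
              super().__init__()
              self.n_head = n_head
              self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused Q, K, V projection
              self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

          def forward(self, x):
              B, T, C = x.shape
              q, k, v = self.c_attn(x).split(C, dim=2)
              q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
              k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
              v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
              # causal masking: each position attends only to earlier positions
              y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
              y = y.transpose(1, 2).contiguous().view(B, T, C)
              return self.c_proj(y)

      class Block(nn.Module):
          # GPT-2 places LayerNorm *before* each sub-block (pre-LN), unlike the
          # post-LN ordering of the original Transformer.
          def __init__(self, n_embd=768, n_head=12):
              super().__init__()
              self.ln_1 = nn.LayerNorm(n_embd)
              self.attn = CausalSelfAttention(n_embd, n_head)
              self.ln_2 = nn.LayerNorm(n_embd)
              self.mlp = nn.Sequential(
                  nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
              )

          def forward(self, x):
              x = x + self.attn(self.ln_1(x))  # residual around pre-normed attention
              x = x + self.mlp(self.ln_2(x))   # residual around pre-normed MLP
              return x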

2023

  • chat
    • GPT-2, or Generative Pre-trained Transformer 2, is an autoregressive language model developed by OpenAI. It is based on the Transformer architecture introduced by Vaswani et al. in 2017. Like GPT-3, GPT-2 also employs a single stack of Transformer layers without separate encoder and decoder components. The architecture mainly consists of self-attention mechanisms and feed-forward layers.

      The full GPT-2 model has 1.5 billion parameters. However, OpenAI released several smaller versions of GPT-2 with fewer parameters, allowing users to choose a model that best fits their computational resources and performance requirements. Here's a list of the published GPT-2 model versions along with their number of parameters (a loading sketch follows the list):

      • GPT-2 Small (also known as "117M"): 117 million parameters, the smallest released GPT-2 model, designed for lower-resource tasks and faster response times. (DistilGPT-2 is a separate, distilled derivative of GPT-2, not the same model as GPT-2 Small.)
      • GPT-2 Medium (also known as "345M"): 345 million parameters, offering a balance between performance and computational requirements.
      • GPT-2 Large (also known as "774M"): 774 million parameters, a larger model with improved performance compared to the smaller variants.
      • GPT-2 Extra Large (also known as "1.5B"): 1.5 billion parameters, the largest and most powerful GPT-2 model, delivering the highest-quality results for various NLP tasks.
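    • ILLUSTRATIVE SKETCH (not part of the chat answer): loading the four published GPT-2 checkpoints with the Hugging Face transformers library and printing their parameter counts. The model IDs ("gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl") are the Hugging Face Hub names for these checkpoints.

      from transformers import GPT2LMHeadModel

      for model_id in ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]:
          model = GPT2LMHeadModel.from_pretrained(model_id)
          n_params = sum(p.numel() for p in model.parameters())
          print(f"{model_id}: {n_params / 1e6:.0f}M parameters")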

2019b

  • (OpenAI, 2019) ⇒ https://openai.com/blog/better-language-models/
    • QUOTE: Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.

       GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages (the dataset which emphasizes diversity of content, by scraping content from the Internet. In order to preserve document quality, we used only pages which have been curated/filtered by humans — specifically, we used outbound links from Reddit which received at least 3 karma. This can be thought of as a heuristic indicator for whether other users found the link interesting (whether educational or funny), leading to higher data quality than other similar datasets, such as CommonCrawl.). ... GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.
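    • ILLUSTRATIVE SKETCH (not from the OpenAI post): what "trained simply to predict the next word" looks like at inference time, sampling a continuation from the small released GPT-2 checkpoint via the Hugging Face transformers pipeline API (an assumed tooling choice; the prompt and generation settings are arbitrary).

      from transformers import pipeline, set_seed

      set_seed(42)  # make the sampled continuation reproducible
      generator = pipeline("text-generation", model="gpt2")
      out = generator(
          "GPT-2 is a large transformer-based language model that",
          max_new_tokens=40, do_sample=True, top_k=50,
      )
      print(out[0]["generated_text"])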

2019c

  • (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/OpenAI#GPT2 Retrieved:2019-9-8.
    • GPT2 (2019) is an AI system that generates text matching its input in subject and tone. For example, when fed the first sentence of George Orwell's novel Nineteen Eighty-Four it produces plausible futuristic fiction set in China. Unlike previous OpenAI products, GPT2 has not been released to the public out of concerns of potential misuse, including applications for writing fake news. Much of the academic community is skeptical that GPT2 poses a significant threat. The Allen Institute for Artificial Intelligence followed up with a tool to detect "neural fake news". Other researchers, like Jeremy Howard, warn of "the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter".
