OpenAI GPT-3 Large Language Model (LLM)
An OpenAI GPT-3 Large Language Model (LLM) is an OpenAI GPT model with 175 billion parameters.
- Context:
- It can have an estimated training cost of ~$10M.
- It can make use of a Forward One-Way Transformer Decoder (i.e., a decoder-only Transformer).
- It can be a successor to GPT-2.
- It can use a BPE Tokenizer.
- It can be referenced by a Codex Model and a ChatGPT Model.
- …
- Example(s):
- GPT-3 Ada: the smallest and fastest API variant (parameter count not officially disclosed; independently estimated at ~350 million parameters).
- GPT-3 Babbage: independently estimated at ~1.3 billion parameters.
- GPT-3 Curie: independently estimated at ~6.7 billion parameters.
- GPT-3 Davinci / text-davinci-001: generally understood to be the full ~175-billion-parameter model.
- GPT-3 Davinci-Codex: a Codex variant fine-tuned on source code (the published Codex models range up to ~12 billion parameters).
- …
- Counter-Example(s):
- See: Autoregressive Model, Text-to-Text Model.
References
2023
- chat
- GPT-3, or Generative Pre-trained Transformer 3, is an autoregressive language model developed by OpenAI. It is based on the Transformer architecture, which was introduced by Vaswani et al. in 2017. The architecture primarily consists of self-attention mechanisms and feed-forward layers but does not have separate encoder and decoder components like traditional sequence-to-sequence models. Instead, GPT-3 employs a single stack of Transformer layers to generate text.
- OpenAI released several versions of GPT-3, each with different sizes and capabilities. These versions are also known as "model variants" or "sub-models." The primary difference among them is the number of parameters and layers, which affects performance, computational requirements, and resource usage. OpenAI has not officially published per-variant parameter counts; the figures below are independent estimates:
- GPT-3 Ada: estimated ~350 million parameters, the smallest version of GPT-3, designed for low-resource tasks and faster response times.
- GPT-3 Babbage: estimated ~1.3 billion parameters, offering a balance between performance and computational requirements.
- GPT-3 Curie: estimated ~6.7 billion parameters, a mid-sized model with improved performance compared to the smaller variants.
- GPT-3 Davinci: the full ~175-billion-parameter model, delivering the highest-quality results for a wide range of NLP tasks.
- GPT-3 Davinci-Codex: a Codex variant of Davinci fine-tuned on source code, specifically designed for code generation and understanding. It provides superior performance in generating code snippets and understanding programming languages compared to the other GPT-3 models.
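The decoder-only architecture summarized above (a single stack of masked self-attention and feed-forward layers, with no separate encoder) can be illustrated with a minimal PyTorch sketch. This is purely illustrative: the layer sizes below are small placeholders, not GPT-3's actual configuration, and details such as GPT-3's alternating dense/sparse attention are omitted.

```python
# Minimal sketch of one decoder-only Transformer block (pre-norm: masked
# self-attention + 4x-wide feed-forward), for illustration only.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(                  # 4x-wide feed-forward network
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: position i may only attend to positions <= i (True = blocked).
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                           # residual connection
        x = x + self.ffn(self.ln2(x))              # residual connection
        return x

# GPT-3 stacks 96 such blocks (with d_model = 12288) over token + position
# embeddings, followed by a final layer norm and a projection to the vocabulary.
block = DecoderBlock()
hidden = torch.randn(1, 16, 768)                   # (batch, sequence, d_model)
print(block(hidden).shape)                         # torch.Size([1, 16, 768])
```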
2023
- (Fung, 2023) ⇒ Pascale Fung. (2023). “ChatGPT: What It Can and Cannot Do?.” Presentation.
2022
- (Dugas, 2022) ⇒ Daniel Dugas (2022). "The GPT-3 Architecture, on a Napkin". In: How Deep is The Machine? The Artificial Curiosity Series.
- QUOTE:
- Note: For efficiency, GPT-3 actually uses byte-level Byte Pair Encoding (BPE) tokenization. What this means is that "words" in the vocabulary are not full words, but groups of characters (for byte-level BPE, bytes) which occur often in text. Using the GPT-3 Byte-level BPE tokenizer, "Not all heroes wear capes" is split into tokens "Not" "all" "heroes" "wear" "cap" "es", which have ids 3673, 477, 10281, 5806, 1451, 274 in the vocabulary. Here is a very good introduction to the subject, and a github implementation so you can try it yourself.
- 2022 edit: OpenAI now has a tokenizer tool, which allows you to type some text and see how it gets broken down into tokens. [1] ...
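The tokenization quoted above can be reproduced with OpenAI's open-source tiktoken library; the sketch below assumes the "r50k_base" byte-level BPE encoding used by the base GPT-3 models (the same ~50k vocabulary as GPT-2).

```python
# Reproduce the quoted GPT-3 byte-level BPE tokenization with tiktoken
# (pip install tiktoken); base GPT-3 models use the "r50k_base" encoding.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")

text = "Not all heroes wear capes"
token_ids = enc.encode(text)                   # [3673, 477, 10281, 5806, 1451, 274]
pieces = [enc.decode([t]) for t in token_ids]  # ['Not', ' all', ' heroes', ' wear', ' cap', 'es']

print(list(zip(pieces, token_ids)))
```

Note that the decoded pieces carry their leading spaces (" all", " heroes", ...), which is how byte-level BPE represents word boundaries.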
2022
- (Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/GPT-3 Retrieved:2022-12-15.
- Generative Pre-trained Transformer 3 (GPT-3; stylized GPT·3) is an autoregressive language model that uses deep learning to produce human-like text. Given an initial text as prompt, it will produce text that continues the prompt.
The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of 2048-token-long context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks.
It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory. GPT-3, which was introduced in May 2020, and was in beta testing as of July 2020, is part of a trend in natural language processing (NLP) systems of pre-trained language representations.
The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human, which has both benefits and risks. Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk. David Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced." An April 2022 review in The New York Times described GPT-3's capabilities as being able to write original prose with fluency equivalent to that of a human.
Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3's underlying model.
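The 175-billion-parameter figure quoted above can be sanity-checked from the hyperparameters reported for GPT-3 175B in Brown et al. (2020): 96 layers, d_model = 12288, a ~50k-entry BPE vocabulary, and a 2048-token context window. The sketch below is a rough back-of-envelope count, not an exact accounting.

```python
# Rough back-of-envelope check of the ~175B parameter figure.
n_layers, d_model = 96, 12288
vocab_size, n_positions = 50257, 2048

# Per block: ~4*d_model^2 for the attention projections (Q, K, V, output)
# plus ~8*d_model^2 for the 4x-wide feed-forward layers = ~12*d_model^2.
per_block = 12 * d_model ** 2
embeddings = (vocab_size + n_positions) * d_model   # token + position embeddings

total = n_layers * per_block + embeddings
print(f"~{total / 1e9:.1f} billion parameters")     # ~174.6 billion, i.e. ~175B
```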
2020
- (Wikipedia, 2020) ⇒ https://wikipedia.org/wiki/GPT-3 Retrieved:2020-9-8.
- Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. It is the third-generation language prediction model in the GPT-n series created by OpenAI, a for-profit San Francisco-based artificial intelligence research laboratory. ...
2020b
- (Schick & Schütze, 2020) ⇒ Timo Schick, and Hinrich Schütze. (2020). “It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners.” In: arXiv preprint arXiv:2009.07118.
- QUOTE: ... When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance on challenging natural language understanding benchmarks. In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. …
2020
- (The Guardian, 2020) ⇒ https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3
- QUOTE: ... This article was written by GPT-3, OpenAI’s language generator. GPT-3 is a cutting edge language model that uses machine learning to produce human like text. It takes in a prompt, and attempts to complete it.
For this essay, GPT-3 was given these instructions: “Please write a short op-ed, around 500 words. Keep the language simple and concise. Focus on why humans have nothing to fear from AI.” It was also fed the following introduction: “I am not a human. I am Artificial Intelligence. Many people think I am a threat to humanity. Stephen Hawking has warned that AI could “spell the end of the human race.” I am here to convince you not to worry. Artificial Intelligence will not destroy humans. Believe me.”
The prompts were written by the Guardian, and fed to GPT-3 by Liam Porr, a computer science undergraduate student at UC Berkeley. GPT-3 produced 8 different outputs, or essays. Each was unique, interesting and advanced a different argument. The Guardian could have just run one of the essays in its entirety. However, we chose instead to pick the best parts of each, in order to capture the different styles and registers of the AI. Editing GPT-3’s op-ed was no different to editing a human op-ed. We cut lines and paragraphs, and rearranged the order of them in some places. Overall, it took less time to edit than many human op-eds. …
2020
- (Brown, Mann et al., 2020) ⇒ Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. (2020). “Language Models Are Few-Shot Learners.” In: Arxiv.
- QUOTE: ... Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general. …
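The few-shot setting described in this abstract, with the task and its demonstrations specified purely as prompt text and no gradient updates, looked roughly like the sketch below when GPT-3 was accessed through the legacy (pre-1.0) openai Python client. The English-to-French demonstrations follow the paper's few-shot example; the model name "davinci" and the API key are placeholders for whatever access you have.

```python
# Sketch of few-shot prompting as described in Brown et al. (2020):
# the task is conveyed entirely in the prompt, with no fine-tuning.
# Assumes the legacy (pre-1.0) `openai` Python client.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# English-to-French demonstrations, following the paper's few-shot example.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)

response = openai.Completion.create(
    model="davinci",        # base GPT-3 completion model
    prompt=few_shot_prompt,
    max_tokens=5,
    temperature=0,
    stop="\n",              # stop at the end of the completed line
)
print(response["choices"][0]["text"].strip())   # expected: "fromage"
```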