Pretrained Large Language Model (LLM)

A Pretrained Large Language Model (LLM) is a pretrained language model that is a large language model.

Context:
- It can be an input to an In-Context Learning System.
- It can be an input to a LLM Fine-Tuning System.
- ...
- It can range from being a Pure Pretrained LLM to being a Finetuned LLM (such as an instruction-tuned LLM).
- ...
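When a pretrained LLM is used as an input to an In-Context Learning System, the task is specified entirely in the prompt: an instruction, a few labeled demonstrations, and the new query. The model's weights are not updated, which is the contrast with an LLM Fine-Tuning System. The sketch below assembles such a few-shot prompt. The function name `build_few_shot_prompt` and the `Input:`/`Output:` template are hypothetical choices. The call that would send the prompt to an actual model is deliberately omitted.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    instruction: str,
    demonstrations: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Concatenate an instruction, labeled demonstrations, and an unlabeled query.

    Hypothetical template: the resulting string would be passed unchanged to a
    pretrained LLM, which is expected to continue the text after the final
    "Output:". No weights change; the task is conveyed only through the context.
    """
    parts = [instruction.strip(), ""]
    for source, target in demonstrations:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")  # blank line between demonstrations
    parts.append(f"Input: {query}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Classify the sentiment of each review as positive or negative.",
        [("Great battery life.", "positive"), ("Screen cracked in a week.", "negative")],
        "Fast shipping and works perfectly.",
    )
    print(prompt)
```

Adding or removing demonstrations changes the model's behavior without any training step. This is what makes a pure pretrained LLM usable for new tasks at inference time.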
Example(s):
- a General Purpose Pretrained LLM, such as GPT-4.
- a Domain-Specific Pretrained LLM, such as:
  - a Pretrained Biomedical LLM (e.g. BioGPT) or a Pretrained Protein LLM.
  - a Pretrained Software LLM, such as Codex LLM.
  - a Pretrained Finance LLM, such as Bloomberg LLM.
  - a Pretrained Legal LLM.
- a Proprietary Pretrained LLM, such as:
  - a Google Pretrained LLM, an Azure Pretrained LLM, ...
- a Base LLM, such as llama31-405b-base-bf-16.
- …
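As a rough illustration of what "large" means in practice, the weights-only memory of a pretrained LLM can be estimated as parameter count × bytes per parameter. The sketch below makes two assumptions about the `llama31-405b-base-bf-16` identifier. It reads `405b` as about 405 billion parameters and `bf-16` as bfloat16 weights, which take 2 bytes each. Both readings are inferred from the name only. The estimate also ignores activations, KV cache, and any optimizer state.

```python
# Storage size, in bytes, of one parameter for a few common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_bytes(num_params: int, dtype: str = "bf16") -> int:
    """Weights-only estimate; excludes activations, KV cache, and optimizer state."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Assumption: "405b" ~ 405 billion parameters, "bf-16" ~ bfloat16 storage.
    n_params = 405 * 10**9
    total = weight_memory_bytes(n_params, "bf16")
    print(f"{total / 1e9:.0f} GB ({to_gib(total):.1f} GiB)")
```

Under those assumptions the script prints `810 GB (754.4 GiB)`. This is far more than a single accelerator's memory, so models of this size are usually served by sharding the weights across several devices.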
Counter-Example(s):
- a Pre-Trained Small Language Model.
- a Pre-Trained Image Generation Model.
See: Language Model Metamodel, LLM Architecture, ULMFiT.

References

2023

(Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Large_language_model#List_of_large_language_models Retrieved:2023-3-19.

List of large language models
Name	Release date	Developer	Number of parameters	Corpus size	License	Notes
BERT	2018	Google	340 million^[1]	3.3 billion words^[1]	Apache 2.0^[2]	early and influential language model^[3]
GPT-2	2019	OpenAI	1.5 billion^[4]	40GB^[5] (~10 billion tokens)^[6]	MIT^[7]	general-purpose model based on transformer architecture
GPT-3	2020	OpenAI	175 billion	499 billion tokens^[6]	Public web API	A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.^[8]
GPT-Neo	March 2021	EleutherAI	2.7 billion^[9]	825 GiB^[10]	MIT^[11]	The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.^[11]
GPT-J	June 2021	EleutherAI	6 billion^[12]	825 GiB^[10]	Apache 2.0	GPT-3-style language model
Ernie 3.0 Titan	December 2021	Baidu	260 billion^[13]^[14]	4 Tb	Proprietary	Chinese-language LLM. Ernie Bot is based on this model.
Claude^[15]	December 2021	Anthropic	52 billion^[16]	400 billion tokens^[16]	Closed beta	fine-tuned for desirable behavior in conversations^[17]
GLaM (Generalist Language Model)	December 2021	Google	1.2 trillion^[18]	1.6 trillion tokens^[18]	Proprietary	sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3
LaMDA (Language Models for Dialog Applications)	January 2022	Google	137 billion^[19]	1.56T words^[19]	Proprietary	specialized for response generation in conversations
Megatron-Turing NLG	October 2021^[20]	Microsoft and Nvidia	530 billion^[21]	338.6 billion tokens^[21]	Restricted web access	standard architecture but trained on a supercomputing cluster
GPT-NeoX	February 2022	EleutherAI	20 billion^[22]	825 GiB^[10]	Apache 2.0	based on the Megatron architecture
Chinchilla	March 2022	DeepMind	70 billion^[23]	1.3 trillion tokens^[23]^[24]	Proprietary	reduced-parameter model trained on more data
PaLM (Pathways Language Model)	April 2022	Google	540 billion^[25]	768 billion tokens^[23]	Proprietary	aimed to reach the practical limits of model scale
OPT (Open Pretrained Transformer)	May 2022	Meta	175 billion^[26]	180 billion tokens^[27]	Non-commercial research	GPT-3 architecture with some adaptations from Megatron
YaLM 100B	June 2022	Yandex	100 billion^[28]	1.7TB^[28]	Apache 2.0	English-Russian model
BLOOM	July 2022	Large collaboration led by Hugging Face	175 billion^[29]	350 billion tokens (1.6TB)^[30]	Responsible AI	Essentially GPT-3 but trained on a multi-lingual corpus (30% English excluding programming languages)
AlexaTM (Teacher Models)	November 2022	Amazon	20 billion^[31]	1.3 trillion^[32]	Public web API^[33]	bidirectional sequence-to-sequence architecture
LLaMA (Large Language Model Meta AI)	February 2023	Meta	65 billion^[34]	1.4 trillion^[34]	Non-commercial research	trained on a large 20-language corpus to aim for better performance with fewer parameters.^[34]
GPT-4	March 2023	OpenAI	Unknown	Unknown	Public web API	Available for ChatGPT Plus users. Microsoft confirmed that the GPT-4 model is used in Bing Chat.^[35]
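The GLaM row notes that a sparse mixture-of-experts (MoE) model can be cheaper to run at inference than a dense model such as GPT-3. One common way to obtain that sparsity is top-k gating. A router scores every expert for each token, and only the k highest-scoring experts are evaluated and mixed. The sketch below is a generic, plain-Python illustration of top-k gating and is not GLaM's implementation. It rests on three assumptions: a dot-product router, a softmax renormalized over the chosen experts, and toy experts that just scale their input. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_moe(x: Vector, gate_weights: Sequence[Vector],
              experts: Sequence[Expert], k: int = 2) -> Vector:
    """Route input x to the k experts with the highest gating scores.

    Assumes every expert maps a vector to a vector of the same length.
    """
    # Router: one logit per expert, here a simple dot product with a gate vector.
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    # Sparsity: keep only the k best-scoring experts; the others are never run.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the chosen experts only (subtract the max for numerical stability).
    m = max(logits[i] for i in chosen)
    weights = {i: math.exp(logits[i] - m) for i in chosen}
    z = sum(weights.values())
    # Output = gate-weighted sum of the chosen experts' outputs.
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        g = weights[i] / z
        out = [o + g * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    calls: List[int] = []

    def make_expert(idx: int, scale: float) -> Expert:
        def expert(v: Vector) -> Vector:
            calls.append(idx)  # record which experts actually ran
            return [scale * vi for vi in v]
        return expert

    experts = [make_expert(i, float(i + 1)) for i in range(4)]
    gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    y = top_k_moe([2.0, 1.0], gates, experts, k=2)
    print("experts evaluated:", sorted(calls))
    print("output:", [round(v, 4) for v in y])
```

In the usage example only experts 0 and 1 run (`experts evaluated: [0, 1]`). Per-token expert compute therefore grows with k rather than with the total number of experts, while the total parameter count still grows with the number of experts.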

2023

(Zhao, Zhou et al., 2023) ⇒ Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. (2023). “A Survey of Large Language Models.” In: arXiv preprint arXiv:2303.18223. doi:10.48550/arXiv.2303.18223

2022

(Li, Tang et al., 2021) ⇒ Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. (2021). “Pretrained Language Models for Text Generation: A Survey.” arXiv:2105.10311 https://doi.org/10.48550/arXiv.2201.05273
- ABSTRACT: Text Generation aims to produce plausible and readable text in a human language from input data. The resurgence of deep learning has greatly advanced this field, in particular, with the help of neural generation models based on pre-trained language models (PLMs). Text generation based on PLMs is viewed as a promising approach in both academia and industry. In this paper, we provide a survey on the utilization of PLMs in text generation. We begin with introducing three key aspects of applying PLMs to text generation: 1) how to encode the input into representations preserving input semantics which can be fused into PLMs; 2) how to design an effective PLM to serve as the generation model; and 3) how to effectively optimize PLMs given the reference text and to ensure that the generated texts satisfy special text properties. Then, we show the major challenges arisen in these aspects, as well as possible solutions for them. We also include a summary of various useful resources and typical text generation applications based on PLMs. Finally, we highlight the future research directions which will further improve these PLMs for text generation. This comprehensive survey is intended to help researchers interested in text generation problems to learn the core concepts, the main techniques and the latest developments in this area based on PLMs.
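Generation with a PLM is commonly autoregressive. The model assigns a probability to each possible next token given the text so far, and a decoding rule chooses one. This repeats until an end-of-sequence token appears. The sketch below uses greedy decoding, which always picks the most probable token. A hand-written bigram table, `BIGRAM_PROBS`, stands in for a real pretrained model's next-token distribution and is purely hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a pretrained LM: P(next token | previous token).
BIGRAM_PROBS: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.7, "text": 0.3},
    "a": {"model": 0.5, "text": 0.5},
    "model": {"generates": 0.8, "</s>": 0.2},
    "generates": {"text": 0.9, "</s>": 0.1},
    "text": {"</s>": 1.0},
}


def greedy_decode(start: str = "<s>", max_new_tokens: int = 10) -> List[str]:
    """Repeatedly append the most probable next token until </s> or the length limit."""
    tokens = [start]
    for _ in range(max_new_tokens):
        # Unknown contexts fall back to ending the sequence.
        dist = BIGRAM_PROBS.get(tokens[-1], {"</s>": 1.0})
        next_token = max(dist, key=dist.get)  # greedy: highest-probability token
        if next_token == "</s>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start symbol


if __name__ == "__main__":
    print(" ".join(greedy_decode()))
```

Running it prints `the model generates text`. Replacing the `max` call with sampling from `dist` gives sampling-based decoding. Keeping several candidate sequences per step gives beam search.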

[1] Citation text missing in source (ref name: bert-paper).
[2] "BERT". March 13, 2023. https://github.com/google-research/bert.
[3] Citation text missing in source (ref name: Manning-2022).
[4] Citation text missing in source (ref name: 15Brelease).
[5] "Better language models and their implications". https://openai.com/research/better-language-models.
[6] "OpenAI's GPT-3 Language Model: A Technical Overview". https://lambdalabs.com/blog/demystifying-gpt-3.
[7] "gpt-2". GitHub. https://github.com/openai/gpt-2. Retrieved 13 March 2023.
[8] Citation text missing in source (ref name: chatgpt-blog).
[9] "GPT Neo". March 15, 2023. https://github.com/EleutherAI/gpt-neo.
[10] arXiv citation not rendered in source (ref name: Pile).
[11] Citation text missing in source (ref name: vb-gpt-neo).
[12] "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model. Retrieved 28 February 2023.
[13] Nast, Condé. "China's ChatGPT Black Market Is Thriving". https://www.wired.co.uk/article/chinas-chatgpt-black-market-baidu.
[14] Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan et al. (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731. http://arxiv.org/abs/2112.12731.
[15] "Product". https://www.anthropic.com/product. Retrieved 14 March 2023.
[16] arXiv citation not rendered in source (ref name: AnthroArch).
[17] arXiv citation not rendered in source.
[18] Citation text missing in source (ref name: glam-blog).
[19] Citation text missing in source (ref name: lamda-blog).
[20] Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/.
[21] Citation text missing in source (ref name: mtnlg-preprint).
[22] Conference citation not rendered in source (ref name: gpt-neox-20b).
[23] Citation text missing in source (ref name: chinchilla-blog).
[24] arXiv citation not rendered in source.
[25] Citation text missing in source (ref name: palm-blog).
[26] "Democratizing access to large-scale language models with OPT-175B". https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/.
[27] arXiv citation not rendered in source.
[28] Citation not rendered in source.
[29] Citation text missing in source (ref name: bigger-better).
[30] "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom.
[31] "20B-parameter Alexa model sets new marks in few-shot learning". 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning.
[32] arXiv citation not rendered in source.
[33] "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/. Retrieved 13 March 2023.
[34] Citation text missing in source (ref name: llama-blog).
[35] Lardinois, Frederic (March 14, 2023). "Microsoft’s new Bing was using GPT-4 all along". https://techcrunch.com/2023/03/14/microsofts-new-bing-was-using-gpt-4-all-along/. Retrieved 14 March 2023.
