Pretrained Large Neural Language Model (Pretrained LLM)
A Pretrained Large Neural Language Model (Pretrained LLM) is a pretrained language model that is a large language model.
- Context:
- It can be an input to an In-Context Learning System.
- It can be an input to an LLM Fine-Tuning System.
- ...
- It can range from being a Pure Pretrained LLM to being a Finetuned LLM (such as an instruction-tuned LLM).
- ...
- Example(s):
- a General Purpose Pretrained LLM, such as GPT-4.
- a Domain-Specific Pretrained LLM, such as:
- a Pretrained Biomedical LLM (e.g. BioGPT) or a Pretrained Protein LLM.
- a Pretrained Software LLM, such as Codex LLM.
- a Pretrained Finance LLM, such as Bloomberg LLM.
- a Pretrained Legal LLM, such as [[]].
- a Proprietary Pretrained LLM, such as:
- …
- Counter-Example(s):
- See: Language Model Metamodel, LLM Architecture, ULMFiT.
References
2023
- (Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Large_language_model#List_of_large_language_models Retrieved:2023-3-19.
Name | Release date | Developer | Number of parameters | Corpus size | License | Notes |
---|---|---|---|---|---|---|
BERT | 2018 | Google | 340 million[1] | 3.3 billion words[1] | Apache 2.0[2] | early and influential language model[3] |
GPT-2 | 2019 | OpenAI | 1.5 billion[4] | 40GB[5] (~10 billion tokens)[6] | MIT[7] | general-purpose model based on transformer architecture |
GPT-3 | 2020 | OpenAI | 175 billion | 499 billion tokens[6] | Public web API | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[8] |
GPT-Neo | March 2021 | EleutherAI | 2.7 billion[9] | 825 GiB[10] | MIT[11] | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[11] |
GPT-J | June 2021 | EleutherAI | 6 billion[12] | 825 GiB[10] | Apache 2.0 | GPT-3-style language model |
Ernie 3.0 Titan | December 2021 | Baidu | 260 billion[13][14] | 4 TB | Proprietary | Chinese-language LLM. Ernie Bot is based on this model. |
Claude[15] | December 2021 | Anthropic | 52 billion[16] | 400 billion tokens[16] | Closed beta | fine-tuned for desirable behavior in conversations[17] |
GLaM (Generalist Language Model) | December 2021 | Google | 1.2 trillion[18] | 1.6 trillion tokens[18] | Proprietary | sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3 |
LaMDA (Language Models for Dialog Applications) | January 2022 | Google | 137 billion[19] | 1.56T words[19] | Proprietary | specialized for response generation in conversations |
Megatron-Turing NLG | October 2021[20] | Microsoft and Nvidia | 530 billion[21] | 338.6 billion tokens[21] | Restricted web access | standard architecture but trained on a supercomputing cluster |
GPT-NeoX | February 2022 | EleutherAI | 20 billion[22] | 825 GiB[10] | Apache 2.0 | based on the Megatron architecture |
Chinchilla | March 2022 | DeepMind | 70 billion[23] | 1.3 trillion tokens[23][24] | Proprietary | reduced-parameter model trained on more data |
PaLM (Pathways Language Model) | April 2022 | Google | 540 billion[25] | 768 billion tokens[23] | Proprietary | aimed to reach the practical limits of model scale |
OPT (Open Pretrained Transformer) | May 2022 | Meta | 175 billion[26] | 180 billion tokens[27] | Non-commercial research | GPT-3 architecture with some adaptations from Megatron |
YaLM 100B | June 2022 | Yandex | 100 billion[28] | 1.7TB[28] | Apache 2.0 | English-Russian model |
BLOOM | July 2022 | Large collaboration led by Hugging Face | 175 billion[29] | 350 billion tokens (1.6TB)[30] | Responsible AI | Essentially GPT-3 but trained on a multi-lingual corpus (30% English excluding programming languages) |
AlexaTM (Teacher Models) | November 2022 | Amazon | 20 billion[31] | 1.3 trillion[32] | Public web API[33] | bidirectional sequence-to-sequence architecture |
LLaMA (Large Language Model Meta AI) | February 2023 | Meta | 65 billion[34] | 1.4 trillion[34] | Non-commercial research | trained on a large 20-language corpus to aim for better performance with fewer parameters.[34] |
GPT-4 | March 2023 | OpenAI | Unknown | Unknown | Public web API | Available for ChatGPT Plus users. Microsoft confirmed that the GPT-4 model is used in Bing Chat.[35] |
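The notes on Chinchilla ("reduced-parameter model trained on more data") and LLaMA ("better performance with fewer parameters") can be made concrete with a little arithmetic on the table itself. The following is a minimal sketch, not part of the cited source: it divides corpus size by parameter count for the rows that report corpus size in tokens. The function names are hypothetical, and reading LLaMA's "1.4 trillion" as tokens is an assumption, since that cell carries no unit.

```python
# Illustrative sketch: tokens-per-parameter ratios for rows of the table above.
# Values are (number of parameters, training tokens), copied from the table.
# Rows that give corpus size in bytes or words are left out.
MODELS = {
    "GPT-3": (175e9, 499e9),
    "Megatron-Turing NLG": (530e9, 338.6e9),
    "Chinchilla": (70e9, 1.3e12),
    "PaLM": (540e9, 768e9),
    # Assumption: the table's unitless "1.4 trillion" is read as tokens.
    "LLaMA": (65e9, 1.4e12),
}


def tokens_per_parameter(params: float, tokens: float) -> float:
    """Return how many training tokens were used per model parameter."""
    return tokens / params


def ranked_by_ratio(models: dict) -> list:
    """Return (name, ratio) pairs sorted from most to least tokens per parameter."""
    rows = [(name, tokens_per_parameter(p, t)) for name, (p, t) in models.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in ranked_by_ratio(MODELS):
        print(f"{name:<22}{ratio:6.1f} tokens per parameter")
```

Under these figures, LLaMA and Chinchilla come out at roughly 20 tokens per parameter, while GPT-3, PaLM, and Megatron-Turing NLG stay below 3, which is the contrast the table notes describe in words.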
2023
- (Zhao, Zhou et al., 2023) ⇒ Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. (2023). “A Survey of Large Language Models.” In: arXiv preprint arXiv:2303.18223. doi:10.48550/arXiv.2303.18223
2022
- (Li, Tang et al., 2022) ⇒ Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. (2022). “Pretrained Language Models for Text Generation: A Survey.” In: arXiv preprint arXiv:2201.05273. doi:10.48550/arXiv.2201.05273
- ABSTRACT: Text Generation aims to produce plausible and readable text in a human language from input data. The resurgence of deep learning has greatly advanced this field, in particular, with the help of neural generation models based on pre-trained language models (PLMs). Text generation based on PLMs is viewed as a promising approach in both academia and industry. In this paper, we provide a survey on the utilization of PLMs in text generation. We begin with introducing three key aspects of applying PLMs to text generation: 1) how to encode the input into representations preserving input semantics which can be fused into PLMs; 2) how to design an effective PLM to serve as the generation model; and 3) how to effectively optimize PLMs given the reference text and to ensure that the generated texts satisfy special text properties. Then, we show the major challenges arisen in these aspects, as well as possible solutions for them. We also include a summary of various useful resources and typical text generation applications based on PLMs. Finally, we highlight the future research directions which will further improve these PLMs for text generation. This comprehensive survey is intended to help researchers interested in text generation problems to learn the core concepts, the main techniques and the latest developments in this area based on PLMs.
- ↑ 1.0 1.1 Cite error: Invalid <ref> tag; no text was provided for refs named bert-paper
- ↑ "BERT". March 13, 2023. https://github.com/google-research/bert.
- ↑ Cite error: Invalid <ref> tag; no text was provided for refs named Manning-2022
- ↑ Cite error: Invalid <ref> tag; no text was provided for refs named 15Brelease
- ↑ "Better language models and their implications". https://openai.com/research/better-language-models.
- ↑ 6.0 6.1 "OpenAI's GPT-3 Language Model: A Technical Overview" (in en). https://lambdalabs.com/blog/demystifying-gpt-3.
- ↑ "gpt-2". GitHub. https://github.com/openai/gpt-2. Retrieved 13 March 2023.
- ↑ Cite error: Invalid <ref> tag; no text was provided for refs named chatgpt-blog
- ↑ "GPT Neo". March 15, 2023. https://github.com/EleutherAI/gpt-neo.
- ↑ 10.0 10.1 10.2 Template:Cite arxiv
- ↑ 11.0 11.1 Cite error: Invalid <ref> tag; no text was provided for refs named vb-gpt-neo
- ↑ "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront" (in en). https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model. Retrieved 2023-02-28.
- ↑ Nast, Condé. "China's ChatGPT Black Market Is Thriving". https://www.wired.co.uk/article/chinas-chatgpt-black-market-baidu.
- ↑ Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan et al. (December 23, 2021). ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation. arXiv:2112.12731. http://arxiv.org/abs/2112.12731.
- ↑ "Product" (in en). https://www.anthropic.com/product. Retrieved 14 March 2023.
- ↑ 16.0 16.1 Template:Cite arxiv
- ↑ Template:Cite arxiv
- ↑ 18.0 18.1 Cite error: Invalid <ref> tag; no text was provided for refs named glam-blog
- ↑ 19.0 19.1 Cite error: Invalid <ref> tag; no text was provided for refs named lamda-blog
- ↑ Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/.
- ↑ 21.0 21.1 Cite error: Invalid <ref> tag; no text was provided for refs named mtnlg-preprint
- ↑ Template:Cite conference
- ↑ 23.0 23.1 23.2 Cite error: Invalid <ref> tag; no text was provided for refs named chinchilla-blog
- ↑ Template:Cite arxiv
- ↑ Cite error: Invalid <ref> tag; no text was provided for refs named palm-blog
- ↑ "Democratizing access to large-scale language models with OPT-175B" (in en). https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/.
- ↑ Template:Cite arxiv
- ↑ 28.0 28.1 Template:Citation
- ↑ Cite error: Invalid <ref> tag; no text was provided for refs named bigger-better
- ↑ "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom.
- ↑ "20B-parameter Alexa model sets new marks in few-shot learning" (in en). 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning.
- ↑ Template:Cite arxiv
- ↑ "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/. Retrieved 13 March 2023.
- ↑ 34.0 34.1 34.2 Cite error: Invalid <ref> tag; no text was provided for refs named llama-blog
- ↑ Lardinois, Frederic (March 14, 2023). "Microsoft’s new Bing was using GPT-4 all along". https://techcrunch.com/2023/03/14/microsofts-new-bing-was-using-gpt-4-all-along/. Retrieved March 14, 2023.