Pretrained Large Neural Language Model (Pretrained LLM)

From GM-RKB

A Pretrained Large Neural Language Model (Pretrained LLM) is a pretrained language model that is a large language model.



References

2023

List of large language models
| Name | Release date | Developer | Number of parameters | Corpus size | License | Notes |
|---|---|---|---|---|---|---|
| BERT | 2018 | Google | 340 million[1] | 3.3 billion words[1] | Apache 2.0[2] | early and influential language model[3] |
| GPT-2 | 2019 | OpenAI | 1.5 billion[4] | 40 GB[5] (~10 billion tokens)[6] | MIT[7] | general-purpose model based on transformer architecture |
| GPT-3 | 2020 | OpenAI | 175 billion | 499 billion tokens[6] | Public web API | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[8] |
| GPT-Neo | March 2021 | EleutherAI | 2.7 billion[9] | 825 GiB[10] | MIT[11] | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[11] |
| GPT-J | June 2021 | EleutherAI | 6 billion[12] | 825 GiB[10] | Apache 2.0 | GPT-3-style language model |
| Ernie 3.0 Titan | December 2021 | Baidu | 260 billion[13][14] | 4 TB | Proprietary | Chinese-language LLM. Ernie Bot is based on this model. |
| Claude[15] | December 2021 | Anthropic | 52 billion[16] | 400 billion tokens[16] | Closed beta | fine-tuned for desirable behavior in conversations[17] |
| GLaM (Generalist Language Model) | December 2021 | Google | 1.2 trillion[18] | 1.6 trillion tokens[18] | Proprietary | sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3 |
| LaMDA (Language Models for Dialog Applications) | January 2022 | Google | 137 billion[19] | 1.56T words[19] | Proprietary | specialized for response generation in conversations |
| Megatron-Turing NLG | October 2021[20] | Microsoft and Nvidia | 530 billion[21] | 338.6 billion tokens[21] | Restricted web access | standard architecture but trained on a supercomputing cluster |
| GPT-NeoX | February 2022 | EleutherAI | 20 billion[22] | 825 GiB[10] | Apache 2.0 | based on the Megatron architecture |
| Chinchilla | March 2022 | DeepMind | 70 billion[23] | 1.3 trillion tokens[23][24] | Proprietary | reduced-parameter model trained on more data |
| PaLM (Pathways Language Model) | April 2022 | Google | 540 billion[25] | 768 billion tokens[23] | Proprietary | aimed to reach the practical limits of model scale |
| OPT (Open Pretrained Transformer) | May 2022 | Meta | 175 billion[26] | 180 billion tokens[27] | Non-commercial research | GPT-3 architecture with some adaptations from Megatron |
| YaLM 100B | June 2022 | Yandex | 100 billion[28] | 1.7 TB[28] | Apache 2.0 | English-Russian model |
| BLOOM | July 2022 | Large collaboration led by Hugging Face | 175 billion[29] | 350 billion tokens (1.6 TB)[30] | Responsible AI | Essentially GPT-3 but trained on a multi-lingual corpus (30% English excluding programming languages) |
| AlexaTM (Teacher Models) | November 2022 | Amazon | 20 billion[31] | 1.3 trillion[32] | Public web API[33] | bidirectional sequence-to-sequence architecture |
| LLaMA (Large Language Model Meta AI) | February 2023 | Meta | 65 billion[34] | 1.4 trillion[34] | Non-commercial research | trained on a large 20-language corpus to aim for better performance with fewer parameters.[34] |
| GPT-4 | March 2023 | OpenAI | Unknown | Unknown | Public web API | Available for ChatGPT Plus users. Microsoft confirmed that GPT-4 model is used in Bing Chat.[35] |
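
As a rough illustration of the table's note that Chinchilla is a "reduced-parameter model trained on more data", the following sketch computes training tokens per parameter for the rows that report corpus size in tokens. The figures are copied from the table as listed; the names `MODELS`, `tokens_per_parameter`, and `ranked_by_ratio` are hypothetical helpers introduced here, and the ratio is only a crude comparison, not a metric defined by the source.

```python
# (parameters, training tokens) copied from the table above.
# Only rows whose corpus size is explicitly given in tokens are included.
MODELS = {
    "GPT-3": (175e9, 499e9),
    "Megatron-Turing NLG": (530e9, 338.6e9),
    "Chinchilla": (70e9, 1.3e12),
    "PaLM": (540e9, 768e9),
    "OPT": (175e9, 180e9),
    "BLOOM": (175e9, 350e9),
}


def tokens_per_parameter(params: float, tokens: float) -> float:
    """Return the number of training tokens seen per model parameter."""
    return tokens / params


def ranked_by_ratio(models: dict) -> list:
    """Return (name, ratio) pairs sorted from highest to lowest ratio."""
    ratios = {
        name: tokens_per_parameter(params, tokens)
        for name, (params, tokens) in models.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in ranked_by_ratio(MODELS):
        print(f"{name:22s} {ratio:6.2f} tokens per parameter")
```

With the listed figures, Chinchilla has by far the highest ratio of these rows and Megatron-Turing NLG the lowest, which is consistent with the table's description of Chinchilla.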

2022

  • (Li, Tang et al., 2022) ⇒ Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. (2022). "Pretrained Language Models for Text Generation: A Survey." arXiv:2201.05273 https://doi.org/10.48550/arXiv.2201.05273
    • ABSTRACT: Text Generation aims to produce plausible and readable text in a human language from input data. The resurgence of deep learning has greatly advanced this field, in particular, with the help of neural generation models based on pre-trained language models (PLMs). Text generation based on PLMs is viewed as a promising approach in both academia and industry. In this paper, we provide a survey on the utilization of PLMs in text generation. We begin with introducing three key aspects of applying PLMs to text generation: 1) how to encode the input into representations preserving input semantics which can be fused into PLMs; 2) how to design an effective PLM to serve as the generation model; and 3) how to effectively optimize PLMs given the reference text and to ensure that the generated texts satisfy special text properties. Then, we show the major challenges arisen in these aspects, as well as possible solutions for them. We also include a summary of various useful resources and typical text generation applications based on PLMs. Finally, we highlight the future research directions which will further improve these PLMs for text generation. This comprehensive survey is intended to help researchers interested in text generation problems to learn the core concepts, the main techniques and the latest developments in this area based on PLMs.

  1. [citation text missing in source; ref name: bert-paper]
  2. "BERT". 13 March 2023. https://github.com/google-research/bert.
  3. [citation text missing in source; ref name: Manning-2022]
  4. [citation text missing in source; ref name: 15Brelease]
  5. "Better language models and their implications". https://openai.com/research/better-language-models.
  6. "OpenAI's GPT-3 Language Model: A Technical Overview". https://lambdalabs.com/blog/demystifying-gpt-3.
  7. "gpt-2". GitHub. https://github.com/openai/gpt-2. Retrieved 13 March 2023.
  8. [citation text missing in source; ref name: chatgpt-blog]
  9. "GPT Neo". 15 March 2023. https://github.com/EleutherAI/gpt-neo.
  10. [citation text missing in source; unexpanded arXiv citation template]
  11. [citation text missing in source; ref name: vb-gpt-neo]
  12. "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model. Retrieved 28 February 2023.
  13. "China's ChatGPT Black Market Is Thriving". Wired UK. https://www.wired.co.uk/article/chinas-chatgpt-black-market-baidu.
  14. Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan et al. (23 December 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731. http://arxiv.org/abs/2112.12731.
  15. "Product". https://www.anthropic.com/product. Retrieved 14 March 2023.
  16. [citation text missing in source; unexpanded arXiv citation template]
  17. [citation text missing in source; unexpanded arXiv citation template]
  18. [citation text missing in source; ref name: glam-blog]
  19. [citation text missing in source; ref name: lamda-blog]
  20. Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/.
  21. [citation text missing in source; ref name: mtnlg-preprint]
  22. [citation text missing in source; unexpanded conference citation template]
  23. [citation text missing in source; ref name: chinchilla-blog]
  24. [citation text missing in source; unexpanded arXiv citation template]
  25. [citation text missing in source; ref name: palm-blog]
  26. "Democratizing access to large-scale language models with OPT-175B". https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/.
  27. [citation text missing in source; unexpanded arXiv citation template]
  28. [citation text missing in source; unexpanded citation template]
  29. [citation text missing in source; ref name: bigger-better]
  30. "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom.
  31. "20B-parameter Alexa model sets new marks in few-shot learning". 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning.
  32. [citation text missing in source; unexpanded arXiv citation template]
  33. "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/. Retrieved 13 March 2023.
  34. [citation text missing in source; ref name: llama-blog]
  35. Lardinois, Frederic (14 March 2023). "Microsoft's new Bing was using GPT-4 all along". https://techcrunch.com/2023/03/14/microsofts-new-bing-was-using-gpt-4-all-along/. Retrieved 14 March 2023.