Large Language Model (LLM)

From GM-RKB

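A Large Language Model (LLM) is a neural language model with a very large number of parameters (typically billions, usually in a transformer architecture) that is trained on a very large text corpus.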
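To give a rough sense of scale, the sketch below estimates the weight count of a GPT-style decoder-only transformer from its depth and width. It uses an approximation common in scaling-law work: about 12 * d_model^2 weights per layer. Of these, 4 * d_model^2 come from the query, key, value and output projections, and 8 * d_model^2 from a feed-forward block whose hidden size is 4 * d_model. This is an estimate, not an exact count, because it ignores biases, layer-norm weights and position embeddings. The function name `approx_transformer_params` is hypothetical. The example configuration (96 layers, width 12288, a 50257-token vocabulary) is the published GPT-3 175B setting, taken from the GPT-3 paper rather than from this page.

```python
# Rough parameter count for a GPT-style decoder-only transformer.
# Assumption: the ~12 * d_model^2 weights-per-layer approximation; biases,
# layer norms and position embeddings are deliberately ignored.

def approx_transformer_params(n_layers: int, d_model: int, vocab_size: int = 0) -> int:
    """Approximate weight count of a decoder-only transformer.

    n_layers   -- number of transformer blocks
    d_model    -- hidden (embedding) width
    vocab_size -- if non-zero, adds a token-embedding matrix of vocab_size x d_model
    """
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    feed_forward = 8 * d_model * d_model   # d_model -> 4*d_model -> d_model
    embeddings = vocab_size * d_model      # token embedding table
    return n_layers * (attention + feed_forward) + embeddings


if __name__ == "__main__":
    # Published GPT-3 175B configuration (from the GPT-3 paper, not this page).
    n = approx_transformer_params(n_layers=96, d_model=12288, vocab_size=50257)
    print(f"{n / 1e9:.1f}B parameters")
```

The script prints an estimate of about 174.6B. This is close to the nominal 175B, and the gap is mostly the parameters the approximation leaves out.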



References

2023

List

Name | Release date | Developer | Number of parameters | Corpus size | Training cost (petaFLOP-day) | License | Notes
BERT | ? | Google | ?[1] | ? words[1] | ?[2] | Apache 2.0[3] | An early and influential language model,[4] but encoder-only and thus not built to be prompted or generative[5]
XLNet | ? | Google | ?[6] | ? words |  |  | An alternative to BERT; designed as encoder-only[7][8]
GPT-2 | ? | OpenAI | ?[9] | 40GB[10] (~? tokens)[11] |  | MIT[12] | General-purpose model based on the transformer architecture
GPT-3 | ? | OpenAI | ?[13] | ? tokens[11] | 3640[14] | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[15]
GPT-Neo | ? | EleutherAI | ?[16] | 825 GiB[17] |  | MIT[18] | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[18]
GPT-J | ? | EleutherAI | ?[19] | 825 GiB[17] | 200[20] | Apache 2.0 | GPT-3-style language model
Megatron-Turing NLG | ?[21] | Microsoft and Nvidia | ?[22] | ? tokens[22] |  | Restricted web access | Standard architecture but trained on a supercomputing cluster.
Ernie 3.0 Titan | ? | Baidu | ?[23] | 4 Tb |  | Proprietary | Chinese-language LLM. Ernie Bot is based on this model.
Claude[24] | ? | Anthropic | ?[25] | ? tokens[25] |  | ? | Fine-tuned for desirable behavior in conversations.[26]
GLaM (Generalist Language Model) | ? | Google | ?[27] | ? tokens[27] | 5600[27] | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3.
Gopher | ? | DeepMind | ?[28] | ? tokens[29] | 5833[30] | Proprietary |
LaMDA (Language Models for Dialog Applications) | ? | Google | ?[31] | 1.56T words,[31] ? tokens[29] | 4110[32] | Proprietary | Specialized for response generation in conversations.
GPT-NeoX | ? | EleutherAI | ?[33] | 825 GiB[17] | 740[20] | Apache 2.0 | Based on the Megatron architecture
Chinchilla | ? | DeepMind | ?[34] | ? tokens[34][29] | 6805[30] | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot.
PaLM (Pathways Language Model) | ? | Google | ?[35] | ? tokens[34] | 29250[30] | Proprietary | Aimed to reach the practical limits of model scale
OPT (Open Pretrained Transformer) | ? | Meta | ?[36] | ? tokens[37] | 310[20] | ? | GPT-3 architecture with some adaptations from Megatron
YaLM 100B | ? | Yandex | ?[38] | 1.7TB[38] |  | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM.
Minerva | ? | Google | ?[39] | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server[39] |  | Proprietary | LLM trained for solving "mathematical and scientific questions using step-by-step reasoning".[40] Minerva is based on the PaLM model, further trained on mathematical and scientific data.
BLOOM | ? | Large collaboration led by Hugging Face | ?[41] | ? tokens (1.6TB)[42] |  | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English excluding programming languages)
Galactica | ? | Meta | ? | ? tokens[43] | unknown | ? | Trained on scientific text and modalities.
AlexaTM (Teacher Models) | ? | Amazon | ?[44] | ?[45] |  | Proprietary[46] | Bidirectional sequence-to-sequence architecture
LLaMA (Large Language Model Meta AI) | ? | Meta | ?[47] | ?[47] | 6300[48] | ? | Trained on a large 20-language corpus to aim for better performance with fewer parameters.[47] Researchers from Stanford University trained a fine-tuned model based on LLaMA weights, called Alpaca.[49]
GPT-4 | ? | OpenAI | Exact number unknown | Unknown | Unknown | Proprietary | Available for ChatGPT Plus users and used in several products.
Cerebras-GPT | ? | Cerebras | ?[50] |  | 270[20] | Apache 2.0 | Trained with Chinchilla formula.
Falcon | ? | Technology Innovation Institute | ?[51] | 1 trillion tokens, from RefinedWeb (filtered web text corpus)[52] plus some "curated corpora".[53] | 2800[48] | Apache 2.0[54] | Training cost around 2700 petaFLOP-days, 75% that of GPT-3.
BloombergGPT | ? | Bloomberg L.P. | ? | 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets[55] |  | Proprietary | LLM trained on financial data from proprietary sources that "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks"
PanGu-Σ | ? | Huawei | ? | 329 billion tokens[56] |  | Proprietary |
OpenAssistant[57] | ? | LAION | ? | 1.5 trillion tokens |  | Apache 2.0 | Trained on crowdsourced open data
Jurassic-2[58] | ? | AI21 Labs | Exact size unknown | Unknown |  | Proprietary | Multilingual[59]
PaLM 2 (Pathways Language Model 2) | ? | Google | ?[60] | ? tokens[60] | 85000[48] | Proprietary | Used in the Bard chatbot.[61]
Llama 2 | ? | Meta | ?[62] | ? tokens[62] |  | ? | Successor of LLaMA.
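The "Training cost" column is expressed in petaFLOP-days. One petaFLOP-day is 10^15 floating-point operations per second sustained for one day, or 8.64 × 10^19 operations. The sketch below converts that unit to raw FLOPs and compares the result with the widely used rule of thumb C ≈ 6 * N * D, where N is the number of parameters and D the number of training tokens.

It also splits a compute budget in the "Chinchilla" style mentioned in the Chinchilla and Cerebras-GPT rows. For that step it assumes the frequently quoted heuristic of about 20 training tokens per parameter. This heuristic simplifies the Chinchilla paper's fitted scaling laws and is not an exact formula. The GPT-3 figures of 175 billion parameters and 300 billion training tokens come from the GPT-3 paper, not from this table, and all function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400
PETAFLOP_PER_SECOND = 1e15  # 10^15 floating-point operations per second


def pfdays_to_flops(pf_days: float) -> float:
    """Convert a training cost in petaFLOP-days to total floating-point operations."""
    return pf_days * PETAFLOP_PER_SECOND * SECONDS_PER_DAY


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule of thumb C ~= 6*N*D: roughly 2 FLOPs per parameter per token for
    the forward pass and 4 for the backward pass."""
    return 6.0 * n_params * n_tokens


def chinchilla_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Split a compute budget into (parameters, tokens), assuming C = 6*N*D
    and the ~20 tokens-per-parameter heuristic (an assumption, see above)."""
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    return n_params, tokens_per_param * n_params


if __name__ == "__main__":
    gpt3_table = pfdays_to_flops(3640)  # GPT-3 row of the table
    # 175e9 parameters and 300e9 tokens are assumed from the GPT-3 paper.
    gpt3_rule = approx_training_flops(175e9, 300e9)
    print(f"table value: {gpt3_table:.2e} FLOPs, 6*N*D: {gpt3_rule:.2e} FLOPs")
    n, d = chinchilla_split(gpt3_table)
    print(f"Chinchilla-style split: {n / 1e9:.0f}B params, {d / 1e12:.2f}T tokens")
```

For the GPT-3 row, the 3640 petaFLOP-day entry works out to about 3.14 × 10^23 FLOPs, which is in line with the 6 * N * D estimate of about 3.15 × 10^23. Under the 20-tokens-per-parameter heuristic, the same budget would instead go to a model of roughly 51B parameters trained on about 1.02T tokens. That is the "fewer parameters, more data" trade-off that the Chinchilla row describes.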


  1. Cite error: Invalid <ref> tag; no text was provided for refs named bert-paper
  2. Prickett, Nicole Hemsoth (2021-08-24). "Cerebras Shifts Architecture To Meet Massive AI/ML Models". https://www.nextplatform.com/2021/08/24/cerebras-shifts-architecture-to-meet-massive-ai-ml-models/. Retrieved 2023-06-20.
  3. "BERT". March 13, 2023. https://github.com/google-research/bert. 
  4. Cite error: Invalid <ref> tag; no text was provided for refs named Manning-2022
  5. Template:Cite arXiv
  6. "BERT, RoBERTa, DistilBERT, XLNet: Which one to use?". https://www.kdnuggets.com/bert-roberta-distilbert-xlnet-which-one-to-use.html. 
  7. Naik, Amit Raja (September 23, 2021). "Google Introduces New Architecture To Reduce Cost Of Transformers". https://analyticsindiamag.com/google-introduces-new-architecture-to-reduce-cost-of-transformers/. 
  8. Template:Cite arXiv
  9. Cite error: Invalid <ref> tag; no text was provided for refs named 15Brelease
  10. "Better language models and their implications". https://openai.com/research/better-language-models. 
  11. "OpenAI's GPT-3 Language Model: A Technical Overview". 3 June 2020. https://lambdalabs.com/blog/demystifying-gpt-3.
  12. "gpt-2". GitHub. https://github.com/openai/gpt-2. Retrieved 13 March 2023. 
  13. Cite error: Invalid <ref> tag; no text was provided for refs named Wiggers
  14. Table D.1 in Template:Cite arXiv
  15. Cite error: Invalid <ref> tag; no text was provided for refs named chatgpt-blog
  16. "GPT Neo". March 15, 2023. https://github.com/EleutherAI/gpt-neo. 
  17. Template:Cite arXiv
  18. Cite error: Invalid <ref> tag; no text was provided for refs named vb-gpt-neo
  19. "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model. Retrieved 2023-02-28.
  20. Template:Cite arXiv
  21. Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/. 
  22. Cite error: Invalid <ref> tag; no text was provided for refs named mtnlg-preprint
  23. Template:Cite arXiv
  24. "Product". https://www.anthropic.com/product. Retrieved 14 March 2023.
  25. Template:Cite arXiv
  26. Template:Cite arXiv
  27. Cite error: Invalid <ref> tag; no text was provided for refs named glam-blog
  28. "Language modelling at scale: Gopher, ethical considerations, and retrieval". https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval. Retrieved 20 March 2023.
  29. Template:Cite arXiv
  30. Table 20 of PaLM: Scaling Language Modeling with Pathways
  31. Cite error: Invalid <ref> tag; no text was provided for refs named lamda-blog
  32. Template:Cite arXiv
  33. Template:Cite conference
  34. Cite error: Invalid <ref> tag; no text was provided for refs named chinchilla-blog
  35. Cite error: Invalid <ref> tag; no text was provided for refs named palm-blog
  36. "Democratizing access to large-scale language models with OPT-175B". https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/.
  37. Template:Cite arXiv
  38. Template:Citation
  39. Template:Cite arXiv
  40. "Minerva: Solving Quantitative Reasoning Problems with Language Models". 30 June 2022. https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html. Retrieved 20 March 2023.
  41. Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature 615 (7951): 202–205. Bibcode 2023Natur.615..202A. doi:10.1038/d41586-023-00641-w. PMID 36890378. https://www.nature.com/articles/d41586-023-00641-w. 
  42. "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom. 
  43. Template:Cite arXiv
  44. "20B-parameter Alexa model sets new marks in few-shot learning". 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning.
  45. Template:Cite arXiv
  46. "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/. Retrieved 13 March 2023. 
  47. Cite error: Invalid <ref> tag; no text was provided for refs named llama-blog
  48. "The Falcon has landed in the Hugging Face ecosystem". https://huggingface.co/blog/falcon. Retrieved 2023-06-20.
  49. "Stanford CRFM". https://crfm.stanford.edu/2023/03/13/alpaca.html. 
  50. Dey, Nolan (March 28, 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/. 
  51. "Abu Dhabi-based TII launches its own version of ChatGPT". https://fastcompanyme.com/news/abu-dhabi-based-tii-launches-its-own-version-of-chatgpt/. 
  52. Template:Cite arXiv
  53. "tiiuae/falcon-40b · Hugging Face". 2023-06-09. https://huggingface.co/tiiuae/falcon-40b. Retrieved 2023-06-20. 
  54. "UAE’s Falcon 40B, World’s Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free". 31 May 2023.
  55. Template:Cite arXiv
  56. Template:Cite arXiv
  57. Template:Cite arXiv
  58. Wrobel, Sharon. "Tel Aviv startup rolls out new advanced AI language model to rival OpenAI". https://www.timesofisrael.com/ai21-labs-rolls-out-new-advanced-ai-language-model-to-rival-openai/. Retrieved 2023-07-24.
  59. Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". https://techcrunch.com/2023/04/13/with-bedrock-amazon-enters-the-generative-ai-race/. Retrieved 2023-07-24.
  60. Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html. Retrieved 18 May 2023.
  61. "Introducing PaLM 2". May 10, 2023. https://blog.google/technology/ai/google-palm-2-ai-large-language-model/.
  62. "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". 2023. https://ai.meta.com/llama/. Retrieved 2023-07-19.