Large Neural Language Model (LLM)
A Large Neural Language Model (LLM) is a neural language model that is a large neural model.
- Context:
- Model Input: LLM Input.
- Model Output: LLM Output.
- It can (typically) have over 10M LLM parameters.
- It can (typically) reference an LLM Training Dataset.
- It can (typically) reference an LLM Architecture, such as a GPT architecture.
- It can (often) have LLM Features.
- ...
- It can range from being a Base Pretrained LLM to being a Finetuned LLM to being a Reasoning LLM.
- It can range from being an All-Domain LLM to being a Domain-Specific LLM.
- It can range from being a Short-Context LLM (<=16K) to being a Long-Context LLM (>16K), depending on its LM context length.
- It can range from being a Closed-Source LLM to being an Open-Source LLM.
- It can range from being a Unilingual LLM to being a Multilingual LLM.
- It can range from being a Decoder-based LLM to being an Encoder-based LLM to being a Decoder-Encoder-based LLM.
- It can range from being a Historical LLM (such as GPT-2) to being a Current LLM (such as GPT-4) to being a Future LLM.
- ...
- It can belong to an LLM Model Family.
- It can be an input to an LLM Inference Task.
- It can be used by an LLM-based System (solving an LLM-based task).
- ...
- Example(s):
- Transformer-based LLMs (such as transformer-based decoder-only LLMs), such as: ...
- a Made-in-USA LLM, such as:
- an OpenAI LLM, such as: GPT-2, GPT-3, GPT-3.5, GPT-4.
- an Anthropic Claude LLM (2022): Made-in-USA LLM by Anthropic. Understands conversation context.
- a Google LLM, such as: PaLM LLM, LaMDA LLM, Gemini LLM.
- GPT-J LLM (2021): Made-in-UK LLM by EleutherAI with 6 billion parameters. For fine-tuning.
- Wu Dao 2.0 LLM (2021): Made-in-China LLM by the Beijing Academy of Artificial Intelligence (BAAI). 1.75 trillion parameter multimodal model.
- BLOOM LLM (????): LLM by a large collaboration led by Hugging Face. Trained on a multilingual corpus.
- HyperClova LLM (????): Made-in-South Korea LLM by Naver Corp. Understands conversation.
- Fugaku LLM (~2023): Made-in-Japan LLM by RIKEN. 1.4 trillion parameters. For fine-tuning.
- PaLM-E.
- HuggingFace LLM, such as: BLOOM, T5 LLM, GPT-Neo, BART, LLaMA.
- Google LLM, such as: LaMDA, PaLM, Gemini LLM.
- Meta LLM, such as: LLaMA, Galactica LLM.
- Gopher LM.
- RoBERTa.
- GPT-J: GPT-J-6B.
- BERT Model.
- a Software Code LLM, such as Codex LLM.
- ...
- Counter-Example(s):
- See: Multi-Lingual Neural Network-based Language Model (NLM), LLM-based Task.
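Two of the ranges in the Context list above are stated as numeric thresholds (over 10M parameters; context length <=16K versus >16K). A minimal Python sketch of how those two rules could be applied to a model record follows. The `ModelCard` type, its field names, and the reading of "16K" as 16 * 1024 tokens are assumptions made for illustration; they are not part of any library or of the definition itself.

```python
from dataclasses import dataclass

# Thresholds taken from the Context list; reading "16K" as
# 16 * 1024 tokens is an assumption of this sketch.
MIN_LLM_PARAMETERS = 10_000_000
SHORT_CONTEXT_MAX_TOKENS = 16 * 1024


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical record; the field names are illustrative only."""
    name: str
    parameter_count: int
    context_length: int  # maximum context window, in tokens


def is_large(card: ModelCard) -> bool:
    """True when the model exceeds the 10M-parameter rule of thumb."""
    return card.parameter_count > MIN_LLM_PARAMETERS


def context_class(card: ModelCard) -> str:
    """Label the model by its context length (<=16K vs >16K)."""
    if card.context_length <= SHORT_CONTEXT_MAX_TOKENS:
        return "Short-Context LLM"
    return "Long-Context LLM"


# Made-up example values, not the specification of any real model.
example = ModelCard(name="toy-model", parameter_count=6_000_000_000, context_length=2048)
print(is_large(example), context_class(example))
```

The other ranges in the list (for example Closed-Source versus Open-Source) are categorical rather than numeric, so they would need an explicit label on the record rather than a threshold.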
References
2023
- (Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Large_language_model#List Retrieved:2023-3-19.
List
Name | Release date | Developer | Number of parameters | Corpus size | Training cost (petaFLOP-day) | License | Notes |
---|---|---|---|---|---|---|---|
BERT | ???? | Google | ????[1] | ???? words[1] | ????[2] | Apache 2.0[3] | An early and influential language model,[4] but encoder-only and thus not built to be prompted or generative[5] |
XLNet | ???? | Google | ????[6] | ???? words | | | An alternative to BERT; designed as encoder-only[7][8] |
GPT-2 | ???? | OpenAI | ????[9] | 40GB[10] (~???? tokens)[11] | | MIT[12] | General-purpose model based on transformer architecture |
GPT-3 | ???? | OpenAI | ????[13] | ???? tokens[11] | 3640[14] | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[15] |
GPT-Neo | ???? | EleutherAI | ????[16] | 825 GiB[17] | | MIT[18] | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[18] |
GPT-J | ???? | EleutherAI | 6 billion[19] | 825 GiB[17] | 200[20] | Apache 2.0 | GPT-3-style language model |
Megatron-Turing NLG | ????[21] | Microsoft and Nvidia | 530 billion[22] | ???? tokens[22] | | Restricted web access | Standard architecture but trained on a supercomputing cluster. |
Ernie 3.0 Titan | ???? | Baidu | ????[23] | 4 Tb | | Proprietary | Chinese-language LLM. Ernie Bot is based on this model. |
Claude[24] | ???? | Anthropic | ????[25] | ???? tokens[25] | | ???? | Fine-tuned for desirable behavior in conversations.[26] |
GLaM (Generalist Language Model) | ???? | Google | ????[27] | ???? tokens[27] | 5600[27] | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3. |
Gopher | ???? | DeepMind | ????[28] | ???? tokens[29] | 5833[30] | Proprietary | |
LaMDA (Language Models for Dialog Applications) | ???? | Google | ????[31] | 1.56T words,[31] ???? tokens[29] | 4110[32] | Proprietary | Specialized for response generation in conversations. |
GPT-NeoX | ???? | EleutherAI | ????[33] | 825 GiB[17] | 740[20] | Apache 2.0 | Based on the Megatron architecture |
Chinchilla | ???? | DeepMind | ????[34] | ???? tokens[34][29] | 6805[30] | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. |
PaLM (Pathways Language Model) | ???? | Google | ????[35] | ???? tokens[34] | 29250[30] | Proprietary | Aimed to reach the practical limits of model scale |
OPT (Open Pretrained Transformer) | ???? | Meta | 175 billion[36] | ???? tokens[37] | 310[20] | ???? | GPT-3 architecture with some adaptations from Megatron |
YaLM 100B | ???? | Yandex | 100 billion[38] | 1.7TB[38] | | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM. |
Minerva | ???? | Google | ????[39] | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server[39] | | Proprietary | LLM trained for solving "mathematical and scientific questions using step-by-step reasoning".[40] Minerva is based on the PaLM model, further trained on mathematical and scientific data. |
BLOOM | ???? | Large collaboration led by Hugging Face | ????[41] | ???? tokens (1.6TB)[42] | | Responsible AI | Essentially GPT-3 but trained on a multi-lingual corpus (30% English excluding programming languages) |
Galactica | ???? | Meta | ???? | ???? tokens[43] | Unknown | ???? | Trained on scientific text and modalities. |
AlexaTM (Teacher Models) | ???? | Amazon | 20 billion[44] | ????[45] | | Proprietary[46] | Bidirectional sequence-to-sequence architecture |
LLaMA (Large Language Model Meta AI) | ???? | Meta | ????[47] | ????[47] | 6300[48] | ???? | Trained on a large 20-language corpus to aim for better performance with fewer parameters.[47] Researchers from Stanford University trained a fine-tuned model based on LLaMA weights, called Alpaca.[49] |
GPT-4 | ???? | OpenAI | Exact number unknown | Unknown | Unknown | Proprietary | Available for ChatGPT Plus users and used in several products. |
Cerebras-GPT | ???? | Cerebras | ????[50] | | 270[20] | Apache 2.0 | Trained with Chinchilla formula. |
Falcon | ???? | Technology Innovation Institute | 40 billion[51] | 1 trillion tokens, from RefinedWeb (filtered web text corpus)[52] plus some "curated corpora".[53] | 2800[48] | Apache 2.0[54] | Training cost around 2700 petaFLOP-days, 75% that of GPT-3. |
BloombergGPT | ???? | Bloomberg L.P. | ???? | 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general purpose datasets[55] | | Proprietary | LLM trained on financial data from proprietary sources, that "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks" |
PanGu-Σ | ???? | Huawei | ???? | 329 billion tokens[56] | | Proprietary | |
OpenAssistant[57] | ???? | LAION | ???? | 1.5 trillion tokens | | Apache 2.0 | Trained on crowdsourced open data |
Jurassic-2[58] | ???? | AI21 Labs | Exact size unknown | Unknown | | Proprietary | Multilingual[59] |
PaLM 2 (Pathways Language Model 2) | ???? | Google | ????[60] | ???? tokens[60] | 85000[48] | Proprietary | Used in Bard chatbot.[61] |
Llama 2 | ???? | Meta | ????[62] | ???? tokens[62] | | ???? | Successor of LLaMA. |
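The training-cost column is given in petaFLOP-days, and the Falcon notes relate its cost to GPT-3's. As a rough worked check, the following Python sketch converts a petaFLOP-day figure to total floating-point operations and recomputes the 75% ratio from the GPT-3 figure in the table. The function and constant names are illustrative, and the sketch assumes the usual reading of one petaFLOP-day as 10^15 operations per second sustained for 24 hours.

```python
# One petaFLOP-day: 10**15 floating-point operations per second,
# sustained for one day (assumed definition for this sketch).
SECONDS_PER_DAY = 24 * 60 * 60
FLOP_PER_PETAFLOP_DAY = 10**15 * SECONDS_PER_DAY


def petaflop_days_to_flop(petaflop_days: float) -> float:
    """Convert a training cost in petaFLOP-days to total operations."""
    return petaflop_days * FLOP_PER_PETAFLOP_DAY


# Training costs as listed in the table (petaFLOP-days).
GPT3_COST = 3640
FALCON_COST_LISTED = 2800

# The Falcon notes describe its cost as 75% of GPT-3's.
falcon_cost_from_ratio = 0.75 * GPT3_COST

print(f"GPT-3 total operations: {petaflop_days_to_flop(GPT3_COST):.3e}")
print(f"75% of GPT-3 cost: {falcon_cost_from_ratio} petaFLOP-days")
```

Under these assumptions the ratio gives 2730 petaFLOP-days, which sits between the "around 2700" in the Falcon notes and the 2800 listed in the cost column.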
2022
- (Zhou et al., 2022) ⇒ Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. (2022). “Large Language Models Are Human-level Prompt Engineers.” In: arXiv preprint arXiv:2211.01910.
- QUOTE: ... By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. ...
2020
- (Liu et al., 2020) ⇒ Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, and Jianfeng Gao. (2020). “Adversarial Training for Large Neural Language Models.” In: arXiv preprint arXiv:2004.08994.
- QUOTE: … Pre-training a large neural language model such as BERT has proven effective to improve generalization performance in task-specific fine-tuning (Devlin et al.…
- [1] Reference text missing (ref name: bert-paper).
- [2] Prickett, Nicole Hemsoth (2021-08-24). "Cerebras Shifts Architecture To Meet Massive AI/ML Models". https://www.nextplatform.com/2021/08/24/cerebras-shifts-architecture-to-meet-massive-ai-ml-models/. Retrieved 2023-06-20.
- [3] "BERT". March 13, 2023. https://github.com/google-research/bert.
- [4] Reference text missing (ref name: Manning-2022).
- [5] arXiv preprint (citation details missing).
- [6] "BERT, RoBERTa, DistilBERT, XLNet: Which one to use?". https://www.kdnuggets.com/bert-roberta-distilbert-xlnet-which-one-to-use.html.
- [7] Naik, Amit Raja (September 23, 2021). "Google Introduces New Architecture To Reduce Cost Of Transformers". https://analyticsindiamag.com/google-introduces-new-architecture-to-reduce-cost-of-transformers/.
- [8] arXiv preprint (citation details missing).
- [9] Reference text missing (ref name: 15Brelease).
- [10] "Better language models and their implications". https://openai.com/research/better-language-models.
- [11] "OpenAI's GPT-3 Language Model: A Technical Overview". 3 June 2020. https://lambdalabs.com/blog/demystifying-gpt-3.
- [12] "gpt-2". GitHub. https://github.com/openai/gpt-2. Retrieved 13 March 2023.
- [13] Reference text missing (ref name: Wiggers).
- [14] Table D.1 in an arXiv preprint (citation details missing).
- [15] Reference text missing (ref name: chatgpt-blog).
- [16] "GPT Neo". March 15, 2023. https://github.com/EleutherAI/gpt-neo.
- [17] arXiv preprint (citation details missing).
- [18] Reference text missing (ref name: vb-gpt-neo).
- [19] "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model. Retrieved 2023-02-28.
- [20] arXiv preprint (citation details missing).
- [21] Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/.
- [22] Reference text missing (ref name: mtnlg-preprint).
- [23] arXiv preprint (citation details missing).
- [24] "Product". https://www.anthropic.com/product. Retrieved 14 March 2023.
- [25] arXiv preprint (citation details missing).
- [26] arXiv preprint (citation details missing).
- [27] Reference text missing (ref name: glam-blog).
- [28] "Language modelling at scale: Gopher, ethical considerations, and retrieval". https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval. Retrieved 20 March 2023.
- [29] arXiv preprint (citation details missing).
- [30] Table 20 of PaLM: Scaling Language Modeling with Pathways.
- [31] Reference text missing (ref name: lamda-blog).
- [32] arXiv preprint (citation details missing).
- [33] Conference paper (citation details missing).
- [34] Reference text missing (ref name: chinchilla-blog).
- [35] Reference text missing (ref name: palm-blog).
- [36] "Democratizing access to large-scale language models with OPT-175B". https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/.
- [37] arXiv preprint (citation details missing).
- [38] Citation details missing.
- [39] arXiv preprint (citation details missing).
- [40] "Minerva: Solving Quantitative Reasoning Problems with Language Models". 30 June 2022. https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html. Retrieved 20 March 2023.
- [41] Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature 615 (7951): 202–205. Bibcode 2023Natur.615..202A. doi:10.1038/d41586-023-00641-w. PMID 36890378. https://www.nature.com/articles/d41586-023-00641-w.
- [42] "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom.
- [43] arXiv preprint (citation details missing).
- [44] "20B-parameter Alexa model sets new marks in few-shot learning". 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning.
- [45] arXiv preprint (citation details missing).
- [46] "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/. Retrieved 13 March 2023.
- [47] Reference text missing (ref name: llama-blog).
- [48] "The Falcon has landed in the Hugging Face ecosystem". https://huggingface.co/blog/falcon. Retrieved 2023-06-20.
- [49] "Stanford CRFM". https://crfm.stanford.edu/2023/03/13/alpaca.html.
- [50] Dey, Nolan (March 28, 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/.
- [51] "Abu Dhabi-based TII launches its own version of ChatGPT". https://fastcompanyme.com/news/abu-dhabi-based-tii-launches-its-own-version-of-chatgpt/.
- [52] arXiv preprint (citation details missing).
- [53] "tiiuae/falcon-40b · Hugging Face". 2023-06-09. https://huggingface.co/tiiuae/falcon-40b. Retrieved 2023-06-20.
- [54] "UAE’s Falcon 40B, World’s Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free". 31 May 2023.
- [55] arXiv preprint (citation details missing).
- [56] arXiv preprint (citation details missing).
- [57] arXiv preprint (citation details missing).
- [58] Wrobel, Sharon. "Tel Aviv startup rolls out new advanced AI language model to rival OpenAI". https://www.timesofisrael.com/ai21-labs-rolls-out-new-advanced-ai-language-model-to-rival-openai/. Retrieved 2023-07-24.
- [59] Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". https://techcrunch.com/2023/04/13/with-bedrock-amazon-enters-the-generative-ai-race/. Retrieved 2023-07-24.
- [60] Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html. Retrieved 18 May 2023.
- [61] "Introducing PaLM 2". May 10, 2023. https://blog.google/technology/ai/google-palm-2-ai-large-language-model/.
- [62] "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". 2023. https://ai.meta.com/llama/. Retrieved 2023-07-19.