Large Language Model (LLM)

A Large Language Model (LLM) is a neural language model that is a deep neural model with a large number of model parameters.

Context:
- Model Input: LLM Input.
- Model Output: LLM Output.
- ...
- It can (typically) have over 10M LLM parameters.
- It can (typically) reference an LLM Training Dataset.
- It can (typically) reference an LLM Architecture, such as a GPT architecture or a diffusion architecture.
- It can (typically) have LLM Capabilities, such as:
  - Natural Language Understanding through contextual processing.
  - Human-Like Text Generation through language modeling.
  - Few-Shot Learning for task adaptation.
  - Contextual Memory across conversation turns.
- ...
- It can (often) have LLM Features.
- It can (often) generate LLM Output through either sequential token prediction (in autoregressive LLMs) or through iterative refinement (in diffusion-based LLMs).
- It can (often) utilize either causal attention (in autoregressive LLMs) or bidirectional attention (in diffusion-based LLMs) for context processing.
- It can (often) trade off between inference speed and output coherence depending on its generation mechanism.
- ...
- It can range from being a Base Pretrained LLM to being a Finetuned LLM to being a Reasoning LLM.
- It can range from being an All-Domain LLM to being a Domain-Specific LLM.
- It can range from being a Short-Context LLM (<=16K tokens) to being a Long-Context LLM (>16K tokens), depending on its LM context length.
- It can range from being a Closed-Source LLM to being an Open-Source LLM.
- It can range from being a Unilingual LLM to being a Multilingual LLM.
- It can range from being an Autoregressive LLM to being a Diffusion-based LLM, depending on its text generation approach.
- It can range from being a Decoder-based LLM to being an Encoder-based LLM to being an Encoder-Decoder-based LLM.
- It can range from being a Historical LLM (such as GPT-2) to being a Current LLM (such as GPT-4) to being a Future LLM.
- ...
- It can have Generation Architectures including autoregressive generation for sequential token prediction or diffusion-based generation for iterative token refinement.
- It can belong to an LLM Model Family.
- It can be an input to an LLM Inference Task.
- It can be used by an LLM-based System (solving an LLM-based task).
- ...
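
The contrast drawn above between causal and bidirectional attention, and between sequential token prediction and iterative refinement, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mechanism of any particular LLM: the function names, the `MASK` placeholder, and the stand-in `next_token` and `fill_in` callables are all hypothetical, and the refinement loop simply unmasks positions left to right in fixed-size batches, whereas real diffusion-based LLMs typically choose which positions to commit using model confidence.

```python
MASK = "<mask>"  # hypothetical placeholder for a not-yet-generated position


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def autoregressive_generate(next_token, prompt, num_new_tokens):
    """Sequential token prediction: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        tokens.append(next_token(tokens))  # sees only the tokens to its left
    return tokens


def iterative_refine(fill_in, prompt, num_new_tokens, num_steps):
    """Iterative refinement: start fully masked, unmask one batch per step.

    Assumes num_steps >= 1.
    """
    tokens = list(prompt) + [MASK] * num_new_tokens
    masked = list(range(len(prompt), len(tokens)))
    per_step = -(-num_new_tokens // num_steps)  # ceiling division
    while masked:
        batch, masked = masked[:per_step], masked[per_step:]
        snapshot = list(tokens)  # all positions in a batch see the same state
        for i in batch:
            tokens[i] = fill_in(snapshot, i)  # may use context on both sides
    return tokens


if __name__ == "__main__":
    # Stand-in "models" that only label positions; a real LLM would go here.
    print(causal_mask(3))
    print(autoregressive_generate(lambda toks: f"t{len(toks)}", ["a"], 3))
    print(iterative_refine(lambda toks, i: f"t{i}", ["a"], 3, num_steps=2))
```

In this toy setup, generating three tokens takes three sequential model calls in the autoregressive loop but only two refinement rounds when `num_steps=2`, which is the kind of speed-versus-coherence trade-off the list above refers to.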
Example(s):
- LLM Architecture Types, such as:
  - Autoregressive LLMs, such as:
    - GPT-4 Model for sequential text generation with causal attention.
    - LLaMA 3 Model for unidirectional context processing in next-token prediction.
    - Claude 3 Model for left-to-right text completion with decoder-only architecture.
  - Diffusion-based LLMs, such as:
    - LLaDA Model for parallel token generation with bidirectional attention.
    - Mercury Model for high-speed inference using iterative refinement.
    - DEEM Model for multi-modal content generation through diffusion processes.
- Commercial LLMs, such as:
  - Major Tech Company LLMs, such as:
    - OpenAI LLMs, such as GPT-4 LLM.
    - Google LLMs, such as:
      - PaLM Models.
      - LaMDA Model for conversational AI.
      - Gemini Model for multimodal processing.
    - Meta LLMs, such as:
      - LLaMA Models.
      - OPT Model for research accessibility.
    - Anthropic LLMs, such as Claude Models.
    - Inception Labs LLMs, such as Mercury Models for diffusion-based processing.
  - Regional Tech Leader LLMs, such as:
    - Asia-Pacific LLMs, such as:
      - Wu Dao 2.0 Model: Made-in-China LLM by Beijing Academy of AI.
      - HyperClova Model: Made-in-South Korea LLM by Naver Corp.
      - Fugaku Model: Made-in-Japan LLM by RIKEN.
      - DeepSeek LLMs: Made-in-China LLMs by DeepSeek.
    - European LLMs, such as:
      - BLOOM Model: Made-in-EU LLM by a large collaboration led by Hugging Face.
      - GPT-J Model: Made-in-UK LLM by EleutherAI.
- Research LLMs, such as:
  - Academic Institution LLMs, such as:
    - University Research LLMs, such as:
      - Stanford LLMs, such as Alpaca Model for instruction following.
      - Berkeley LLMs, such as Vicuna Model for chat applications.
  - Open Source LLMs, such as:
    - HuggingFace LLMs, such as:
      - BLOOM Model for multilingual processing.
      - T5 Model for transfer learning.
      - BART Model for sequence-to-sequence tasks.
    - EleutherAI LLMs, such as:
      - GPT-Neo Model for open research.
      - GPT-J Model for efficient training.
- Specialized LLMs, such as:
  - Domain-Specific LLMs, such as:
    - Code LLMs, such as:
      - Codex Model for software development.
      - StarCoder Model for programming assistance.
      - Mercury Coder Model for high-speed code generation with diffusion architecture.
    - Scientific LLMs, such as:
      - Galactica Model for scientific research.
      - PubMedGPT Model for biomedical analysis.
  - Architecture-Specific LLMs, such as:
    - Encoder-Only LLMs, such as:
      - BERT Model for bidirectional encoding.
      - RoBERTa Model for robustly optimized pretraining.
    - Encoder-Decoder LLMs, such as:
      - T5 Model for text-to-text transfer.
      - BART Model for sequence transformation.
    - Hybrid Architecture LLMs, such as:
      - Block-wise Semi-Autoregressive Model for balanced coherence and speed.
      - Combined Diffusion-Autoregressive System for optimized generation.
- ...
Counter-Example(s):
- a Visual Prediction Model, which processes image data rather than language.
- a Multi-Modal Model, which requires multiple input modality types beyond just text.
- a Shallow Neural Language Model, which has too few layers and parameters to qualify as large.
- a Rule-Based Text Generator, which uses symbolic rules rather than neural architectures.
See: Multi-Lingual Neural Network-based Language Model (NLM), LLM-based Task, Diffusion-based Large Language Model, Autoregressive Language Model, Transformer Architecture.

References

2023

(Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Large_language_model#List Retrieved:2023-3-19.

List

Name	Release date	Developer	Number of parameters	Corpus size	Training cost (petaFLOP-day)	License	Notes
BERT	…	Google	…^[1]	… words^[1]	…^[2]	Apache 2.0^[3]	An early and influential language model,^[4] but encoder-only and thus not built to be prompted or generative^[5]
XLNet	…	Google	…^[6]	… words			An alternative to BERT; designed as encoder-only^[7]^[8]
GPT-2	…	OpenAI	…^[9]	40GB^[10] (~… tokens)^[11]		MIT^[12]	General-purpose model based on transformer architecture
GPT-3	…	OpenAI	…^[13]	… tokens^[11]	3640^[14]	Proprietary	A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.^[15]
GPT-Neo	…	EleutherAI	…^[16]	825 GiB^[17]		MIT^[18]	The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.^[18]
GPT-J	…	EleutherAI	…^[19]	825 GiB^[17]	200^[20]	Apache 2.0	GPT-3-style language model
Megatron-Turing NLG	…^[21]	Microsoft and Nvidia	…^[22]	… tokens^[22]		Restricted web access	Standard architecture but trained on a supercomputing cluster.
Ernie 3.0 Titan	…	Baidu	…^[23]	4 Tb		Proprietary	Chinese-language LLM. Ernie Bot is based on this model.
Claude^[24]	…	Anthropic	…^[25]	… tokens^[25]		…	Fine-tuned for desirable behavior in conversations.^[26]
GLaM (Generalist Language Model)	…	Google	…^[27]	… tokens^[27]	5600^[27]	Proprietary	Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3.
Gopher	…	DeepMind	…^[28]	… tokens^[29]	5833^[30]	Proprietary
LaMDA (Language Models for Dialog Applications)	…	Google	…^[31]	1.56T words,^[31] … tokens^[29]	4110^[32]	Proprietary	Specialized for response generation in conversations.
GPT-NeoX	…	EleutherAI	…^[33]	825 GiB^[17]	740^[20]	Apache 2.0	Based on the Megatron architecture
Chinchilla	…	DeepMind	…^[34]	… tokens^[34]^[29]	6805^[30]	Proprietary	Reduced-parameter model trained on more data. Used in the Sparrow bot.
PaLM (Pathways Language Model)	…	Google	…^[35]	… tokens^[34]	29250^[30]	Proprietary	Aimed to reach the practical limits of model scale
OPT (Open Pretrained Transformer)	…	Meta	…^[36]	… tokens^[37]	310^[20]	…	GPT-3 architecture with some adaptations from Megatron
YaLM 100B	…	Yandex	…^[38]	1.7TB^[38]		Apache 2.0	English-Russian model based on Microsoft's Megatron-LM.
Minerva	…	Google	…^[39]	38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server^[39]		Proprietary	LLM trained for solving "mathematical and scientific questions using step-by-step reasoning".^[40] Minerva is based on the PaLM model, further trained on mathematical and scientific data.
BLOOM	…	Large collaboration led by Hugging Face	…^[41]	… tokens (1.6TB)^[42]		Responsible AI	Essentially GPT-3 but trained on a multi-lingual corpus (30% English excluding programming languages)
Galactica	…	Meta	…	… tokens^[43]	unknown	…	Trained on scientific text and modalities.
AlexaTM (Teacher Models)	…	Amazon	…^[44]	…^[45]		Proprietary^[46]	Bidirectional sequence-to-sequence architecture
LLaMA (Large Language Model Meta AI)	…	Meta	…^[47]	…^[47]	6300^[48]	…	Trained on a large 20-language corpus to aim for better performance with fewer parameters.^[47] Researchers from Stanford University trained a fine-tuned model based on LLaMA weights, called Alpaca.^[49]
GPT-4	…	OpenAI	Exact number unknown	Unknown	Unknown	Proprietary	Available for ChatGPT Plus users and used in several products.
Cerebras-GPT	…	Cerebras	…^[50]		270^[20]	Apache 2.0	Trained with Chinchilla formula.
Falcon	…	Technology Innovation Institute	…^[51]	1 trillion tokens, from RefinedWeb (filtered web text corpus)^[52] plus some "curated corpora".^[53]	2800^[48]	Apache 2.0^[54]	Training cost around 2700 petaFLOP-days, 75% that of GPT-3.
BloombergGPT	…	Bloomberg L.P.	…	363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general purpose datasets^[55]		Proprietary	LLM trained on financial data from proprietary sources, that "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks"
PanGu-Σ	…	Huawei	…	329 billion tokens^[56]		Proprietary
OpenAssistant^[57]	…	LAION	…	1.5 trillion tokens		Apache 2.0	Trained on crowdsourced open data
Jurassic-2^[58]	…	AI21 Labs	Exact size unknown	Unknown		Proprietary	Multilingual^[59]
PaLM 2 (Pathways Language Model 2)	…	Google	…^[60]	… tokens^[60]	85000^[48]	Proprietary	Used in Bard chatbot.^[61]
Llama 2	…	Meta	…^[62]	… tokens^[62]		…	Successor of LLaMA.
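
The "Training cost (petaFLOP-day)" column can be converted to total floating-point operations with simple arithmetic, since one petaFLOP-day is 10^15 operations per second sustained for 86,400 seconds. The short sketch below applies that conversion to the GPT-3 and Falcon figures quoted in the table; the constant and function names are illustrative assumptions, and the input numbers are taken from the table as given rather than independently verified.

```python
SECONDS_PER_DAY = 24 * 60 * 60                     # 86,400 s
FLOP_PER_PETAFLOP_DAY = 10**15 * SECONDS_PER_DAY   # 8.64e19 operations


def petaflop_days_to_flop(petaflop_days):
    """Total floating-point operations for a cost given in petaFLOP-days."""
    return petaflop_days * FLOP_PER_PETAFLOP_DAY


# Figures as they appear in the table above.
GPT3_COST = 3640      # GPT-3 row, training cost column
FALCON_CELL = 2800    # Falcon row, training cost column
FALCON_NOTE = 2700    # Falcon row, value quoted in the Notes column

if __name__ == "__main__":
    print(f"GPT-3: {petaflop_days_to_flop(GPT3_COST):.3e} FLOP")
    print(f"Falcon (note) / GPT-3: {FALCON_NOTE / GPT3_COST:.2f}")
    print(f"Falcon (cell) / GPT-3: {FALCON_CELL / GPT3_COST:.2f}")
```

Under these figures, GPT-3's 3640 petaFLOP-days corresponds to roughly 3.1e23 operations, and Falcon's two quoted costs work out to about 0.74 and 0.77 of GPT-3's, which brackets the "75%" stated in the Falcon note.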

2022

(Zhou et al., 2022) ⇒ Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. (2022). “Large Language Models Are Human-level Prompt Engineers.” In: arXiv preprint arXiv:2211.01910.
- QUOTE: ... By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. ...

2020

(Liu et al., 2020) ⇒ Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, and Jianfeng Gao. (2020). “Adversarial Training for Large Neural Language Models.” In: arXiv preprint arXiv:2004.08994.
- QUOTE: … Pre-training a large neural language model such as BERT has proven effective to improve generalization performance in task-specific fine-tuning (Devlin et al.…

[1] … (ref name: bert-paper)
[2] Prickett, Nicole Hemsoth (2021-08-24). "Cerebras Shifts Architecture To Meet Massive AI/ML Models". https://www.nextplatform.com/2021/08/24/cerebras-shifts-architecture-to-meet-massive-ai-ml-models/. Retrieved 2023-06-20.
[3] "BERT". March 13, 2023. https://github.com/google-research/bert.
[4] … (ref name: Manning-2022)
[5] … (arXiv citation)
[6] "BERT, RoBERTa, DistilBERT, XLNet: Which one to use?". https://www.kdnuggets.com/bert-roberta-distilbert-xlnet-which-one-to-use.html.
[7] Naik, Amit Raja (September 23, 2021). "Google Introduces New Architecture To Reduce Cost Of Transformers". https://analyticsindiamag.com/google-introduces-new-architecture-to-reduce-cost-of-transformers/.
[8] … (arXiv citation)
[9] … (ref name: 15Brelease)
[10] "Better language models and their implications". https://openai.com/research/better-language-models.
[11] "OpenAI's GPT-3 Language Model: A Technical Overview". 3 June 2020. https://lambdalabs.com/blog/demystifying-gpt-3.
[12] "gpt-2". GitHub. https://github.com/openai/gpt-2. Retrieved 13 March 2023.
[13] … (ref name: Wiggers)
[14] Table D.1 in … (arXiv citation)
[15] … (ref name: chatgpt-blog)
[16] "GPT Neo". March 15, 2023. https://github.com/EleutherAI/gpt-neo.
[17] … (arXiv citation)
[18] … (ref name: vb-gpt-neo)
[19] "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model. Retrieved 2023-02-28.
[20] … (arXiv citation)
[21] Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/.
[22] … (ref name: mtnlg-preprint)
[23] … (arXiv citation)
[24] "Product". https://www.anthropic.com/product. Retrieved 14 March 2023.
[25] … (arXiv citation)
[26] … (arXiv citation)
[27] … (ref name: glam-blog)
[28] "Language modelling at scale: Gopher, ethical considerations, and retrieval". https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval. Retrieved 20 March 2023.
[29] … (arXiv citation)
[30] Table 20 of PaLM: Scaling Language Modeling with Pathways
[31] … (ref name: lamda-blog)
[32] … (arXiv citation)
[33] … (conference citation)
[34] … (ref name: chinchilla-blog)
[35] … (ref name: palm-blog)
[36] "Democratizing access to large-scale language models with OPT-175B". https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/.
[37] … (arXiv citation)
[38] … (citation)
[39] … (arXiv citation)
[40] "Minerva: Solving Quantitative Reasoning Problems with Language Models". 30 June 2022. https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html. Retrieved 20 March 2023.
[41] Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature 615 (7951): 202–205. Bibcode 2023Natur.615..202A. doi:10.1038/d41586-023-00641-w. PMID 36890378. https://www.nature.com/articles/d41586-023-00641-w.
[42] "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom.
[43] … (arXiv citation)
[44] "20B-parameter Alexa model sets new marks in few-shot learning". 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning.
[45] … (arXiv citation)
[46] "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/. Retrieved 13 March 2023.
[47] … (ref name: llama-blog)
[48] "The Falcon has landed in the Hugging Face ecosystem". https://huggingface.co/blog/falcon. Retrieved 2023-06-20.
[49] "Stanford CRFM". https://crfm.stanford.edu/2023/03/13/alpaca.html.
[50] Dey, Nolan (March 28, 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/.
[51] "Abu Dhabi-based TII launches its own version of ChatGPT". https://fastcompanyme.com/news/abu-dhabi-based-tii-launches-its-own-version-of-chatgpt/.
[52] … (arXiv citation)
[53] "tiiuae/falcon-40b · Hugging Face". 2023-06-09. https://huggingface.co/tiiuae/falcon-40b. Retrieved 2023-06-20.
[54] UAE’s Falcon 40B, World’s Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free, 31 May 2023
[55] … (arXiv citation)
[56] … (arXiv citation)
[57] … (arXiv citation)
[58] Wrobel, Sharon. "Tel Aviv startup rolls out new advanced AI language model to rival OpenAI". https://www.timesofisrael.com/ai21-labs-rolls-out-new-advanced-ai-language-model-to-rival-openai/. Retrieved 2023-07-24.
[59] Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". https://techcrunch.com/2023/04/13/with-bedrock-amazon-enters-the-generative-ai-race/. Retrieved 2023-07-24.
[60] Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html. Retrieved 18 May 2023.
[61] "Introducing PaLM 2". May 10, 2023. https://blog.google/technology/ai/google-palm-2-ai-large-language-model/.
[62] "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". 2023. https://ai.meta.com/llama/. Retrieved 2023-07-19.

