Large Language Model (LLM) Emergent Property
A Large Language Model (LLM) Emergent Property is an emergent property of LLMs that manifests as a new capability or performance improvement not evident in smaller-scale models, often appearing abruptly as model size increases.
- Context:
- It can (typically) involve LLM Tasks that smaller models struggle with.
- It can (often) be influenced by the choice of LLM Evaluation Metric, with nonlinear or discontinuous metrics potentially exaggerating the appearance of emergent properties.
- It can (often) be observed across different LLM Architectures.
- It can provoke discussions on the optimal ways to measure and understand model performance, emphasizing the need for metrics that accurately reflect incremental improvements.
- ...
- Example(s):
- The transition from GPT-2 to GPT-3 demonstrated emergent abilities in generating coherent long texts, showcasing qualitative differences in language generation capabilities.
- In vision tasks, altering the evaluation metrics can induce perceptions of emergent abilities, showing how metric choice can affect interpretations of model capability.
- ...
- Counter-Example(s):
- Small LLMs that exhibit high performance on specific tasks due to extensive fine-tuning or domain-specific training, challenging the notion that emergent properties are exclusive to large models.
- Emergent Behavior in Particle Swarm Optimization.
- Emergent Behavior in Cellular Automata.
- See: Emergent System, Model Scaling, LLM Evaluation Measure.
References
2023
- (Schaeffer et al., 2023) ⇒ Rylan Schaeffer, Brando Miranda, and Sanmi Koyejo. (2023). “Are Emergent Abilities of Large Language Models a Mirage?.” doi:10.48550/arXiv.2304.15004
- It proposes a new perspective on emergent abilities in large language models (LLMs), arguing that these abilities may not be inherent to the models but rather a consequence of the metrics chosen for their evaluation.
- It offers an alternative explanation for emergent phenomena, suggesting the role of nonlinear or discontinuous metrics in creating the illusion of sudden capabilities as models scale up.
- It utilizes the InstructGPT/GPT-3 models in its empirical analysis, demonstrating that switching from nonlinear or discontinuous metrics to linear or continuous ones reveals a smooth, predictable performance curve, challenging existing beliefs about emergent abilities (a toy version of this metric switch is sketched below).
- It conducts a comprehensive meta-analysis, scrutinizing claims of emergent abilities across a spectrum of tasks and metrics to reveal that these phenomena primarily emerge under specific, often nonlinear or discontinuous, evaluation metrics.
- It demonstrates through vision task experiments how changing metrics can induce the appearance of emergent abilities in various model architectures, emphasizing the profound impact of metric selection on perceived model capabilities.
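The following is a minimal synthetic sketch of this metric-switch argument, not the paper's actual experimental code: per-token accuracy is assumed to improve smoothly with scale, and all numbers are simulated. Under a discontinuous exact-match metric the same underlying improvement appears to jump suddenly; under a continuous token-level metric it looks gradual.

```python
import numpy as np

# Hypothetical model scales and a per-token accuracy that improves
# smoothly and gradually in log-scale (a simulated assumption).
scales = np.logspace(6, 11, 20)  # 1M .. 100B parameters
per_token_acc = 1.0 / (1.0 + np.exp(-(np.log10(scales) - 8.5)))

SEQ_LEN = 10  # the task requires emitting a 10-token answer exactly

# Discontinuous metric: exact match over the whole sequence.
# With independent token errors, P(exact match) = p ** SEQ_LEN, which
# stays near zero and then shoots up -- an apparent "emergent" jump.
exact_match = per_token_acc ** SEQ_LEN

# Continuous metric: expected fraction of correct tokens, which tracks
# the smooth underlying improvement instead of jumping.
token_level = per_token_acc

for s, em, tl in zip(scales, exact_match, token_level):
    print(f"scale={s:>12.0f}  exact_match={em:8.6f}  token_acc={tl:6.4f}")
```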
2023
- https://windowsontheory.org/2023/12/22/emergent-abilities-and-grokking-fundamental-mirage-or-both/
- NOTES
- Emergent abilities in large language models: as models get bigger, they can suddenly gain new capabilities not seen in smaller models. For example, GPT-2 could generate coherent long-form text while the original GPT could not.
- Scale and unpredictable jumps in performance: As models scale up, their performance often jumps sharply at some point in an unpredictable way. For example, models may go from trivial performance to perfect accuracy on a task as compute is increased.
- Are emergent abilities a mirage? Recent work has shown these jumps can disappear if we change the evaluation metric to a softer one. For example, instead of binary accuracy, using edit distance as the metric can show more gradual progress.
- Analogy to the high jump: an athlete's maximum jump height increases smoothly with training, but the probability of clearing a fixed bar jumps sharply at some point (see the first sketch below).
- Sharp transitions remain for complex tasks: even if components improve smoothly, the probability of succeeding at multiple sequential steps can still transition sharply from low to high (see the second sketch below).
- Main conclusion: Emergent abilities are likely not a complete mirage, since many real-world tasks require multiple steps of reasoning where failing at one step derails you. So sharp unpredictable transitions in capabilities are likely to remain.
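A minimal sketch of the high-jump analogy, with illustrative numbers: the athlete's mean jump height improves smoothly, yet the probability of clearing a fixed bar transitions sharply once the mean approaches the bar.

```python
import math

BAR = 2.00    # fixed bar height in meters (illustrative)
SIGMA = 0.05  # attempt-to-attempt variability in meters (illustrative)

def p_clear(mean_jump: float) -> float:
    """P(jump > BAR) when jump height ~ Normal(mean_jump, SIGMA)."""
    z = (BAR - mean_jump) / SIGMA
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # normal survival function

# Mean jump height improves smoothly, 5 cm at a time...
for mean in (m / 100 for m in range(185, 216, 5)):
    # ...but the clearing probability snaps from ~0 to ~1 near the bar.
    print(f"mean={mean:.2f} m  P(clear {BAR:.2f} m)={p_clear(mean):.3f}")
```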
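A second sketch for the multi-step argument: even when per-step reliability improves smoothly, end-to-end success on a task where every step must succeed can still rise sharply. The step count and probabilities are illustrative, and steps are assumed independent.

```python
K = 20  # hypothetical number of reasoning steps that must all succeed

for per_step in (0.80, 0.90, 0.95, 0.98, 0.99, 0.999):
    end_to_end = per_step ** K  # one failed step derails the whole chain
    print(f"per-step={per_step:.3f}  end-to-end over {K} steps={end_to_end:.4f}")
```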
2023
- (Shinn et al., 2023) ⇒ Noah Shinn, Beck Labash, and Ashwin Gopinath. (2023). “Reflexion: An Autonomous Agent with Dynamic Memory and Self-reflection.” doi:10.48550/arXiv.2303.11366
- QUOTE: ... To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion on the emergent property of self-reflection.
2022
- (Wei, Tay et al., 2022) ⇒ Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. (2022). “Emergent Abilities of Large Language Models.” In: Transactions on Machine Learning Research, 08/2022 (TMLR).
- QUOTE: ... Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models. ...
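As a minimal, hypothetical illustration of this definition (not a method from the paper): one way to operationalize "cannot be predicted by extrapolating" is to fit a trend on the smaller models' scores and flag a task when the largest model beats the extrapolation by a wide margin. The helper name, threshold, and data below are all assumptions.

```python
import numpy as np

def looks_emergent(log_scales, scores, margin=0.2):
    """Fit a linear trend on all but the largest model, then check whether
    the largest model exceeds the extrapolated score by more than margin."""
    slope, intercept = np.polyfit(log_scales[:-1], scores[:-1], 1)
    predicted = slope * log_scales[-1] + intercept
    return scores[-1] - predicted > margin

log_scales = np.array([7.0, 8.0, 9.0, 10.0, 11.0])      # log10(parameters)
smooth_task = np.array([0.30, 0.40, 0.50, 0.60, 0.70])  # extrapolates cleanly
jumpy_task = np.array([0.01, 0.01, 0.02, 0.03, 0.75])   # sudden jump at scale

print(looks_emergent(log_scales, smooth_task))  # False
print(looks_emergent(log_scales, jumpy_task))   # True
```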