2024 ManyShotInContextLearning
- (Agarwal, Singh et al., 2024) ⇒ Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Stephanie Chan, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, and Hugo Larochelle. (2024). “Many-Shot In-Context Learning.” https://doi.org/10.48550/arXiv.2404.11018
Subject Headings: Many-Shot In-Context Learning (ICL).
Notes
- Introduction and Advancement of Many-Shot ICL: The study extends in-context learning (ICL) with large language models (LLMs) from the traditional few-shot setting to significantly more examples (many-shot ICL). Many-shot ICL allows for clearer task specification, reduces ambiguity, and yields significant performance gains across a wide variety of generative and discriminative tasks, demonstrating enhanced model versatility and adaptability.
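The few-shot-to-many-shot shift described above is, mechanically, just a matter of how many solved examples are packed into the prompt. A minimal sketch (the `Q:`/`A:` layout and the helper name are illustrative assumptions, not the paper's exact format):

```python
def build_icl_prompt(examples, query, k):
    """Concatenate k in-context examples followed by the query.
    The Q:/A: layout is a hypothetical prompt format for illustration."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples[:k]]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

# Few-shot uses a handful of examples; many-shot scales k into the
# hundreds or thousands, relying on an expanded context window.
examples = [(f"question {i}", f"answer {i}") for i in range(500)]
few_shot_prompt = build_icl_prompt(examples, "new question", k=5)
many_shot_prompt = build_icl_prompt(examples, "new question", k=500)
```

The only change between the two regimes is `k`; the many-shot prompt must still fit within the model's context window.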
- Development of New ICL Frameworks: The paper introduces two innovative ICL settings: Reinforced ICL, where model-generated rationales replace human-generated ones, and Unsupervised ICL, which removes rationales altogether, focusing solely on domain-specific prompts. Both approaches are found to be effective for complex reasoning tasks.
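The two settings above can be sketched in a few lines. In Reinforced ICL, model-sampled rationales are kept only when their final answer matches a reference; in Unsupervised ICL, the prompt carries domain questions with no answers at all. The `generate` and `extract_answer` hooks and the prompt layout below are illustrative assumptions:

```python
def reinforced_icl_examples(problems, generate, extract_answer):
    """Reinforced ICL sketch: keep model-generated rationales whose final
    answer matches the reference. generate() samples a chain-of-thought
    rationale and extract_answer() parses its final answer (assumed hooks)."""
    kept = []
    for question, reference in problems:
        rationale = generate(question)
        if extract_answer(rationale) == reference:
            kept.append((question, rationale))  # rationale replaces a human one
    return kept

def unsupervised_icl_prompt(questions, query):
    """Unsupervised ICL sketch: the prompt contains only domain-specific
    questions (no rationales or answers), followed by the actual query."""
    return "\n\n".join(f"Q: {q}" for q in questions) + f"\n\nQ: {query}\nA:"
```

Both settings reduce the dependence on human-written examples: Reinforced ICL needs only reference answers for filtering, and Unsupervised ICL needs only the questions themselves.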
- Empirical Results and Task Analysis: Extensive testing shows that many-shot ICL significantly surpasses few-shot learning at overriding pre-training biases and can learn high-dimensional functions with numerical inputs, as well as perform complex reasoning tasks. The paper demonstrates the potential of many-shot learning to adapt LLMs to new domains without fine-tuning or task-specific specialization.
- Performance Gains and Limitations: The research highlights significant performance improvements, particularly in complex reasoning tasks. However, it also discusses the limitations related to the dependency on high-quality human-generated outputs and the variability of next-token prediction loss as a performance indicator.
- Insights on Learning Dynamics: The study finds that the order of examples can significantly influence many-shot ICL performance, indicating sensitivity to prompt construction across different contexts and tasks. This highlights the need for careful prompt design in the many-shot setting.
- Analysis of Next-Token Prediction Loss: The paper reveals limitations of using next-token prediction loss as an indicator of downstream task performance in the many-shot setting, emphasizing the need for alternative evaluation metrics.
- Future Research Directions: The paper outlines potential areas for further research, including exploring many-shot ICL across various models, addressing performance degradations when the number of examples is scaled excessively, and explaining the performance trends observed in the many-shot regime.
Cited By
Quotes
Abstract
Large Language Models (LLMs) excel at Few-Shot In-Context Learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, Many-Shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced ICL and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced ICL and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike Few-Shot Learning, Many-Shot Learning is effective at overriding pretraining biases and can learn high-dimensional functions with numerical inputs. Our analysis also reveals the limitations of Next-Token Prediction Loss as an indicator of downstream ICL performance.
References