2024 EfficientExplorationforLLMs
- (Dwaracherla et al., 2024) ⇒ Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, and Benjamin Van Roy. (2024). “Efficient Exploration for LLMs.” doi:10.48550/arXiv.2402.00396
Subject Headings: Epistemic Neural Network, Double Thompson Sampling, RLHF.
Notes
- It demonstrates the substantial benefits of efficient exploration in gathering human feedback to improve large language models.
- It compares passive exploration with several active exploration algorithms, highlighting the effectiveness of double Thompson sampling with an epistemic neural network.
- It utilizes the Anthropic datasets and Gemini language models, alongside a human feedback simulator, for its experimentation pipeline.
- It considers two reward model architectures: a point-estimate model and an epistemic neural network (ENN), which additionally represents uncertainty.
- It shows that active exploration significantly reduces the number of queries required to achieve high levels of performance.
- It extrapolates from its empirical results that efficient exploration could accelerate the attainment of superhuman creativity, potentially by decades.
- It proposes future work on more complex ENN architectures, multi-turn dialog exploration, and tuning more of the LLM torso.
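The double Thompson sampling step mentioned in the notes can be illustrated with a minimal sketch. It assumes the epistemic neural network is approximated by a small ensemble, so each row of `rewards` holds the reward estimates for all candidate responses under one epistemic index. The function name `double_thompson_sample`, the `max_tries` parameter, and the uniform fallback when every sample agrees are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def double_thompson_sample(rewards, rng, max_tries=30):
    """Pick two distinct responses to show a rater (illustrative sketch).

    rewards: array of shape (num_particles, num_responses); row k holds the
        reward estimates under epistemic index k (assumed ensemble stand-in
        for an epistemic neural network).
    rng: a numpy.random.Generator.
    max_tries: hypothetical cap on resampling attempts for the second response.
    """
    rewards = np.asarray(rewards, dtype=float)
    num_particles, num_responses = rewards.shape
    if num_responses < 2:
        raise ValueError("need at least two candidate responses")

    # First response: best response under one sampled epistemic index.
    first = int(np.argmax(rewards[rng.integers(num_particles)]))

    # Second response: resample the index until a different response wins.
    for _ in range(max_tries):
        second = int(np.argmax(rewards[rng.integers(num_particles)]))
        if second != first:
            return first, second

    # Fallback (assumption): choose uniformly among the remaining responses.
    others = [i for i in range(num_responses) if i != first]
    return first, int(rng.choice(others))


# Usage with synthetic reward estimates: 10 ensemble particles, 4 responses.
rng = np.random.default_rng(0)
ensemble_rewards = rng.normal(size=(10, 4))
first, second = double_thompson_sample(ensemble_rewards, rng)
```

When the ensemble members disagree about which response is best, the pair tends to cover the competing candidates; when they all agree, the fallback still returns two distinct responses so that a preference query can be formed.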
Cited By
Quotes
Abstract
We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
References
| | Author | title | doi | year |
|---|---|---|---|---|
| 2024 EfficientExplorationforLLMs | Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy | Efficient Exploration for LLMs | 10.48550/arXiv.2402.00396 | 2024 |