BloombergGPT LLM
A BloombergGPT LLM is a domain-specific LLM for the finance domain.
- See: BLOOM LLM.
References
2023
- "BloombergGPT: How We Built a 50 Billion Parameter Financial Language Model." Toronto Machine Learning Series (TMLS), 2023-06-13
- QUOTE: We will present BloombergGPT, a 50 billion parameter language model, purpose-built for finance and trained on a uniquely balanced mix of standard general-purpose datasets and a diverse array of financial documents from the Bloomberg archives. Building a large language model (LLM) is a costly and time-intensive endeavor. To reduce risk, we adhered closely to model designs and training strategies from recent successful models, such as OPT and BLOOM. Nevertheless, we faced numerous challenges during the training process, including loss spikes, unexpected parameter drifts, and performance plateaus.
In this talk, we will discuss these hurdles and our responses, which included a complete training restart after weeks of effort. Our persistence paid off: BloombergGPT ultimately outperformed existing models on financial tasks by significant margins, while maintaining competitive performance on general LLM benchmarks. We will also provide several examples illustrating how BloombergGPT stands apart from general-purpose models.
Our goal is to provide valuable insights into the specific challenges encountered when building LLMs and to offer guidance for those debating whether to embark on their own LLM journey, as well as for those who are already determined to do so.
- NOTES:
- Building a large language model requires making decisions about model code/architecture, datasets, and compute infrastructure. The Bloomberg team aimed to mitigate risk by largely copying an existing successful model (BigScience's BLOOM) while focusing the additional data on the finance domain.
- They used a mix of public datasets, such as the C4 dataset and a Wikipedia snapshot, together with private Bloomberg financial data collected over 15 years and amounting to over 400 billion tokens. The total dataset was 200x the size of English Wikipedia.
- Training very large models can be unstable. The Bloomberg team encountered flattening loss curves and exploding gradients, which they debugged with measures such as lowering the learning rate. They hypothesized that issues with layer normalization contributed to the instability.
- After adjustments, they trained a 50 billion parameter model for 42 days before instability returned, though the model performed well on evaluations. Takeaways included starting small and ramping models up in size to diagnose issues earlier.
- The model achieved state-of-the-art financial domain performance by training on a mix of general and domain-specific data, suggesting potential for domain-specific large language models.
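The notes above mention lowering the learning rate in response to training instability. A minimal sketch of that general idea follows. It is not the Bloomberg team's procedure: the function name `adjust_learning_rate` and the `window`, `spike_factor`, and `decay` values are hypothetical, chosen only for illustration.

```python
def adjust_learning_rate(losses, lr, window=5, spike_factor=1.5, decay=0.5):
    """Return a reduced learning rate if the latest loss looks like a spike.

    Hypothetical heuristic: compare the most recent loss against the mean
    of the `window` losses that came immediately before it.
    """
    if len(losses) <= window:
        # Not enough history to form a baseline yet.
        return lr
    previous = losses[-window - 1:-1]      # the `window` losses before the latest
    baseline = sum(previous) / window
    if losses[-1] > spike_factor * baseline:
        return lr * decay                  # spike detected: lower the learning rate
    return lr                              # no spike: keep the learning rate


stable_history = [2.0, 1.9, 1.8, 1.7, 1.6, 1.5]
spiky_history = [2.0, 1.9, 1.8, 1.7, 1.6, 4.0]

lr_after_stable = adjust_learning_rate(stable_history, 1e-4)
lr_after_spike = adjust_learning_rate(spiky_history, 1e-4)
print(lr_after_stable, lr_after_spike)
```

With the illustrative histories above, the stable run keeps its learning rate while the run ending in a loss of 4.0 has its learning rate halved.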
2023
- (Wu, Irsoy et al., 2023) ⇒ Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. (2023). “BloombergGPT: A Large Language Model for Finance.” In: arXiv preprint arXiv:2303.17564. doi:10.48550/arXiv.2303.17564
- ABSTRACT: The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. As a next step, we plan to release training logs (Chronicles) detailing our experience in training BloombergGPT.
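The token counts stated in the abstract imply a roughly even split between financial and general-purpose data. The short sketch below works through that arithmetic; the helper name `mix_shares` is an illustrative assumption, and only the two token counts come from the abstract.

```python
# Token counts as stated in the abstract, in billions of tokens.
FINANCIAL_TOKENS_B = 363
GENERAL_TOKENS_B = 345


def mix_shares(financial_b, general_b):
    """Return (total, financial_share, general_share) for a two-part corpus."""
    total = financial_b + general_b
    return total, financial_b / total, general_b / total


total_b, fin_share, gen_share = mix_shares(FINANCIAL_TOKENS_B, GENERAL_TOKENS_B)
print(f"total: {total_b}B tokens")                              # total: 708B tokens
print(f"financial: {fin_share:.1%}, general: {gen_share:.1%}")  # financial: 51.3%, general: 48.7%
```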