Chinchilla LLM
A Chinchilla LLM is a Large Language Model from DeepMind's Chinchilla family that is trained in a compute-optimal regime, scaling the number of training tokens in proportion to model size (Hoffmann et al., 2022).
References
2023
- (Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Chinchilla_AI Retrieved:2023-7-14.
- Chinchilla is a family of large language models developed by the research team at DeepMind, presented in March of 2022. It is named "chinchilla" because it is a further development over a previous model family named "Gopher". Both model families were trained in order to investigate the scaling laws of large language models.
It is claimed to outperform GPT-3.
It considerably simplifies downstream use because it requires much less computing power for inference and fine-tuning. Based on analysis of previously trained language models, DeepMind determined that doubling the model size requires doubling the number of training tokens, and this hypothesis was used to train Chinchilla. For a training cost similar to Gopher's, Chinchilla has 70B parameters and four times as much training data.
Chinchilla has an average accuracy of 67.5% on the MMLU benchmark (Measuring Massive Multitask Language Understanding), which is 7% higher than Gopher's performance. As of January 12, 2023, Chinchilla was still in the testing phase. Chinchilla contributes to an effective training paradigm for large auto-regressive language models under limited compute budgets: the Chinchilla team recommends doubling the number of training tokens for every doubling of model size, meaning that larger, higher-quality training datasets can lead to better results on downstream tasks.
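The token-scaling recommendation above can be illustrated with a short, hypothetical calculation. The sketch below assumes the roughly 20-tokens-per-parameter ratio implied by Chinchilla's reported configuration (70B parameters, about 1.4T tokens) and the common C ≈ 6·N·D approximation for dense-transformer training FLOPs; the constants and function names are illustrative, not taken from the paper.

```python
# Minimal sketch of the Chinchilla "tokens scale with parameters" rule of thumb.
# Assumptions (not from the source text): ~20 tokens per parameter, as implied by
# the 70B-parameter / ~1.4T-token Chinchilla configuration, and the standard
# C ~= 6 * N * D estimate of training FLOPs for dense transformers.

TOKENS_PER_PARAM = 20       # ~1.4e12 tokens / 7e10 params for Chinchilla
FLOPS_PER_PARAM_TOKEN = 6   # forward + backward pass FLOPs approximation

def compute_optimal_tokens(n_params: float) -> float:
    """Training-token budget suggested by the Chinchilla rule of thumb."""
    return TOKENS_PER_PARAM * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, C ~= 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens

if __name__ == "__main__":
    for n_params in (1e9, 70e9, 140e9):  # note the doubling from 70B to 140B
        n_tokens = compute_optimal_tokens(n_params)
        flops = training_flops(n_params, n_tokens)
        print(f"{n_params:.0e} params -> {n_tokens:.1e} tokens, ~{flops:.1e} FLOPs")
    # Doubling the parameter count doubles the token budget,
    # so total training compute grows roughly fourfold.
```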
2022
- (Hoffmann et al., 2022) ⇒ Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. (2022). “An Empirical Analysis of Compute-optimal Large Language Model Training.” In: Advances in Neural Information Processing Systems, 35. doi:10.48550/arXiv.2203.15556