Gopher Large Language Model (LLM)
A Gopher Large Language Model (LLM) is a neural language model developed by DeepMind.
- See: GPT-3, Chinchilla LLM.
References
2021
- https://fortune.com/2021/12/08/deepmind-gopher-nlp-ultra-large-language-model-beats-gpt-3/
- QUOTE: ... DeepMind’s language model, which it calls Gopher, was significantly more accurate than these existing ultra-large language models on many tasks, particularly answering questions about specialized subjects like science and the humanities, and equal or nearly equal to them in others, such as logical reasoning and mathematics, according to the data DeepMind published.
This was the case despite the fact that Gopher is smaller than some ultra-large language software. Gopher has some 280 billion different parameters, or variables that it can tune. That makes it larger than OpenAI’s GPT-3, which has 175 billion. But it is smaller than a system that Microsoft and Nivida collaborated on earlier this year, called Megatron, that has 535 billion, as well as ones constructed by Google, with 1.6 trillion parameters, and Alibaba, with 10 trillion. ...
- QUOTE: ... DeepMind’s language model, which it calls Gopher, was significantly more accurate than these existing ultra-large language models on many tasks, particularly answering questions about specialized subjects like science and the humanities, and equal or nearly equal to them in others, such as logical reasoning and mathematics, according to the data DeepMind published.
2021
- https://deepmind.com/blog/article/language-modelling-at-scale
- QUOTE: ... Today we are releasing three papers on language models that reflect this interdisciplinary approach. They include a detailed study of a 280 billion parameter transformer language model called Gopher, a study of ethical and social risks associated with large language models, and a paper investigating a new architecture with better training efficiency. ...
... Our research investigated the strengths and weaknesses of those different-sized models, highlighting areas where increasing the scale of a model continues to boost performance – for example, in areas like reading comprehension, fact-checking, and the identification of toxic language. We also surface results where model scale does not significantly improve results — for instance, in logical reasoning and common-sense tasks. ...
- QUOTE: ... Today we are releasing three papers on language models that reflect this interdisciplinary approach. They include a detailed study of a 280 billion parameter transformer language model called Gopher, a study of ethical and social risks associated with large language models, and a paper investigating a new architecture with better training efficiency. ...
2021
- (Borgeaud et al., 2021) ⇒ Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, and Laurent Sifre. (2021). “Improving Language Models by Retrieving from Trillions of Tokens.”