Gemma LLM Model
A Gemma LLM Model is a large language model (LLM) developed by Google DeepMind.
- Context:
- It can (often) be built upon the research used to create the Gemini Models.
- ...
- It can range from a 2B-parameter model to a 27B-parameter model, with each size targeting different hardware and inference budgets.
- ...
- It can support Text Generation Tasks using a decoder-only transformer architecture, derived from the Transformer introduced in "Attention Is All You Need".
- It can support Code Generation Tasks through models like CodeGemma, which specializes in coding tasks such as completion and generation.
- ...
- Example(s):
- a CodeGemma model used for auto-completion and code suggestion, integrated into developer IDEs to assist in writing Python or JavaScript code.
- a PaliGemma model used in image captioning, where the model generates descriptive captions for images in applications like media analysis.
- ...
- Counter-Example(s):
- See: Large Language Models, Deep Learning, Transformer Models, Text Generation, Natural Language Processing, Gemini Models, CodeGemma, PaliGemma, RecurrentGemma
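A decoder-only transformer generates text left to right because each position's self-attention is restricted by a causal mask. The pure-Python sketch below shows single-head scaled dot-product attention with that mask. It is an illustrative simplification (no learned projections, no multi-head split, no positional encoding), and the function names are hypothetical rather than taken from any Gemma release.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def causal_self_attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with a causal mask.

    Position t attends only to positions 0..t, so no token can look ahead.
    (Hypothetical helper for illustration; not Gemma's implementation.)
    """
    d_k = len(q[0])
    d_v = len(v[0])
    outputs = []
    for t in range(len(q)):
        # Score only the visible (past and current) positions; future ones are masked out.
        scores = [dot(q[t], k[s]) / math.sqrt(d_k) for s in range(t + 1)]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        outputs.append([sum(weights[s] * v[s][j] for s in range(t + 1)) for j in range(d_v)])
    return outputs


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for t, row in enumerate(causal_self_attention(q, k, v)):
        print(t, [round(x, 3) for x in row])
```

Because the scores for position t are computed only over positions 0..t, changing a later value vector never changes earlier outputs, which is the property autoregressive generation relies on.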
References
2024
- https://developers.googleblog.com/en/gemma-explained-overview-gemma-model-family-architectures/
- NOTES:
- Gemma is based on Gemini technology: Gemma models are derived from the same research and technological foundation used to create the Gemini models, providing state-of-the-art capabilities in various language tasks.
- Open-weight LLM: Gemma LLMs are open models available in pre-trained (raw) and instruction-tuned variants, allowing users to explore and adapt the models for different tasks and modalities.
- Single-modality and multi-modality support: Gemma models can operate in single-modality (text input, text output) or multi-modality (text and image input, text output) configurations, enabling a range of use cases.
- Model variants with different parameter sizes: The Gemma family includes models of various sizes, such as 2B, 7B, 9B, and 27B parameters, designed to suit different hardware and computational needs.
- CodeGemma specialization: CodeGemma is a specialized variant of Gemma, optimized for coding tasks like code completion and generation, using a training dataset of over 500 billion tokens of code.
- RecurrentGemma for fast inference: RecurrentGemma models, built on the novel Griffin architecture, use a mixture of local attention and linear recurrences, making them efficient for generating long sequences.
- PaliGemma for vision-language tasks: PaliGemma is designed to handle vision-language tasks, such as image captioning, by taking in both text and image inputs and providing text-based outputs.
- Transformer decoder-only architecture: Unlike encoder-decoder transformers, Gemma uses a decoder-only transformer architecture, a design well suited to autoregressive text generation.
- GeGLU activation function: Gemma replaces the standard ReLU activation in its feed-forward layers with GeGLU (a Gated Linear Unit variant gated by GELU), improving its performance on complex language tasks.
- Fine-tunability: Gemma models, including CodeGemma, are highly adaptable and can be fine-tuned for specific tasks, making them versatile across various domains, from language generation to code assistance.
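The GeGLU note can be made concrete with a small pure-Python sketch of a gated feed-forward block, GeGLU(x) = GELU(x W_gate) ⊙ (x W_up), followed by a down-projection. The weight names `w_gate`, `w_up`, `w_down` and all helper functions are hypothetical illustrations, not Gemma's actual code. The sketch uses the exact erf-based GELU; production implementations may instead use a tanh approximation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute x @ w for x of length d_in and w of shape (d_in, d_out)."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]


def geglu(x: Vector, w_gate: Matrix, w_up: Matrix) -> Vector:
    """GeGLU(x) = GELU(x @ w_gate) * (x @ w_up), elementwise (hypothetical weight names)."""
    gate = [gelu(z) for z in matvec(x, w_gate)]
    up = matvec(x, w_up)
    return [g * u for g, u in zip(gate, up)]


def feed_forward(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """Gated feed-forward block: GeGLU hidden layer, then a linear down-projection."""
    return matvec(geglu(x, w_gate, w_up), w_down)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    x = [1.0, -2.0]
    print(geglu(x, identity, identity))
    print(feed_forward(x, identity, identity, [[1.0], [1.0]]))
```

Compared with a plain ReLU layer, the extra `w_up` projection lets one branch scale the other, which is the "gating" that gives GLU variants their name.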
2024
- https://arena.lmsys.org/ 2024-08-13
- NOTE: This table presents an overview of AI language models, detailing their release dates, performance (as measured by Arena Score), and corresponding organizations. The full leaderboard spans models released from February 2023 to August 2024; the rows below are the Gemma models. The Arena Score is an Elo-style rating estimated from crowdsourced pairwise human-preference votes, so a higher score means users preferred that model more often. Release dates provide context for technological progression and model improvements.
Rank* (UB) | Model Name | Arena Score | 95% CI | Votes | Organization | License | Knowledge Cutoff | Verified Release Date | References |
---|---|---|---|---|---|---|---|---|---|
19 | Gemma-2-27b-it | 1217 | +3/-3 | 28365 | Google | Gemma license | 2024/6 | 2024-06-27 | [Google AI Blog](https://developers.googleblog.com), [Maginative](https://www.maginative.com) |
31 | Gemma-2-9b-it | 1187 | +4/-4 | 25489 | Google | Gemma license | 2024/6 | 2024-06-27 | [Google AI Blog](https://developers.googleblog.com), [Maginative](https://www.maginative.com) |
75 | Gemma-1.1-7b-it | 1084 | +4/-4 | 25091 | Google | Gemma license | 2024/2 | 2024-02 | [Google AI Blog](https://developers.googleblog.com) |
114 | Gemma-2b-it | 990 | +9/-9 | 4921 | Google | Gemma license | 2024/2 | 2024-02 | [Google AI Blog](https://developers.googleblog.com) |
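To make the Arena Score gaps in the table concrete, the sketch below converts a score difference into an expected head-to-head preference rate. It assumes the Arena Score follows the conventional Elo logistic scale (base 10, 400-point spread); treat that as an assumption rather than the leaderboard's exact method. The function name `expected_win_prob` is hypothetical, and the sketch ignores the 95% CI column.

```python
from typing import Dict


def expected_win_prob(score_a: float, score_b: float,
                      scale: float = 400.0, base: float = 10.0) -> float:
    """Probability that model A is preferred over model B under an
    Elo-style logistic model (assumed base-10, 400-point scale)."""
    return 1.0 / (1.0 + base ** ((score_b - score_a) / scale))


# Arena Scores copied from the 2024-08-13 leaderboard rows above.
ARENA_SCORES: Dict[str, float] = {
    "Gemma-2-27b-it": 1217,
    "Gemma-2-9b-it": 1187,
    "Gemma-1.1-7b-it": 1084,
    "Gemma-2b-it": 990,
}

if __name__ == "__main__":
    a, b = "Gemma-2-27b-it", "Gemma-2-9b-it"
    p = expected_win_prob(ARENA_SCORES[a], ARENA_SCORES[b])
    print(f"P({a} preferred over {b}) ~= {p:.3f}")
```

Under this assumption, the 30-point gap between the two Gemma 2 models corresponds to the larger model being preferred only about 54% of the time, while the 227-point gap to Gemma-2b-it corresponds to roughly 79%.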