Meta Llama 3.1 LLM
A Meta Llama 3.1 LLM is a Llama 3 LLM that extends the series with a longer 128K-token context window and a 405B-parameter flagship model.
- Context:
- ...
- Example(s):
- Llama 3.1 405B: The flagship model with 405 billion parameters, designed for high-end applications.
- Llama 3.1 70B Instruct: A 70 billion parameter model tuned for following instructions precisely.
- Llama 3.1 8B: A smaller, more resource-efficient model with 8 billion parameters, suitable for less intensive tasks.
- Llama 3.1 8B Instruct: An instruction-tuned version of the 8B model, optimized for tasks requiring clear and concise responses.
- ...
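The instruction-tuned variants above expect prompts in the Llama 3 chat format. A minimal sketch of assembling a single-turn prompt by hand is shown below; the special-token names follow Meta's published Llama 3 template, and in practice the model tokenizer's chat template (e.g. `apply_chat_template` in Hugging Face Transformers) builds this string automatically.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3 chat format.

    Special tokens follow Meta's published template; normally the
    tokenizer's chat template constructs this string for you.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a helpful assistant.",
    "Summarize Llama 3.1 in one sentence.",
)
print(prompt)
```

The trailing assistant header leaves the prompt open for the model to generate its reply, which it terminates with its own `<|eot_id|>` token.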
- Counter-Example(s):
- OpenAI LLM Models like GPT-4o (proprietary LLM).
- Google LLM Models such as Gemini LLM (proprietary LLM).
- See: InstructGPT, BERT, Transformer Neural Networks.
References
2024
- (AI@Meta Llama Team, 2024) ⇒ AI@Meta Llama Team. (2024). “The Llama 3 Herd of Models.” In: Meta AI Research.
- NOTE: The paper introduces Llama 3, a set of foundation models supporting multilinguality, coding, reasoning, and tool usage. The largest model has 405B parameters and performs competitively with GPT-4. Multimodal extensions (image, video, and speech) are described but remain under development and are not yet broadly released.
- NOTE: Llama 3.1 models support long context windows up to 128K tokens, enhancing their ability to handle extensive input sequences effectively.
- NOTE: The Llama 3 series includes models with varying parameters, such as 8B, 70B, and 405B, catering to different levels of computational needs and applications.
- NOTE: The Llama 3 models are pre-trained on a large-scale, high-quality dataset of 15T multilingual tokens, significantly improving over previous versions.
- NOTE: Llama 3 models utilize a dense Transformer architecture with enhancements like Grouped Query Attention (GQA) for improved inference speed and reduced memory usage.
- NOTE: The development of Llama 3 involved extensive empirical evaluations, demonstrating competitive performance with state-of-the-art models on tasks like coding, reasoning, and multilingual processing.
- NOTE: The Llama 3 models are part of Meta's initiative to release open-access, high-performance language models, aiming to foster innovation and responsible AI development in the research community.
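The Grouped Query Attention mentioned in the notes above lets several query heads share a single key/value head, shrinking the KV cache relative to standard multi-head attention. A toy NumPy sketch of the idea is shown below (the 8-query-head / 2-KV-head sizes are illustrative, not the actual Llama 3 configuration, which uses e.g. 32 query heads and 8 KV heads in the 8B model):

```python
import numpy as np

def grouped_query_attention(q, k, v, num_kv_heads):
    """Toy grouped-query attention.

    q: (num_q_heads, seq, d); k, v: (num_kv_heads, seq, d).
    Each consecutive group of num_q_heads // num_kv_heads query
    heads attends with one shared key/value head, so the KV cache
    holds num_kv_heads heads instead of num_q_heads.
    """
    num_q_heads, seq, d = q.shape
    group = num_q_heads // num_kv_heads
    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group                        # index of the shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)   # (seq, seq) attention logits
        scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = scores / scores.sum(axis=-1, keepdims=True)  # softmax rows
        out[h] = weights @ v[kv]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))   # 8 query heads
k = rng.normal(size=(2, 4, 16))   # only 2 key/value heads
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v, num_kv_heads=2)
print(out.shape)
```

With 8 query heads sharing 2 KV heads, the KV cache is a quarter of the multi-head-attention size, which is where the inference-speed and memory benefits come from.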
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.