Meta Llama 3.1 LLM

A Meta Llama 3.1 LLM is a Llama 3 LLM that supports a context window of up to 128K tokens and is released in 8B, 70B, and 405B parameter sizes.

Context:
- ...
Example(s):
- Llama 3.1 405B: The flagship model with 405 billion parameters, designed for high-end applications.
- Llama 3.1 70B Instruct: A 70 billion parameter model tuned for following instructions precisely.
- Llama 3.1 8B: A smaller, more resource-efficient model with 8 billion parameters, suitable for less intensive tasks.
- Llama 3.1 8B Instruct: An instruction-tuned version of the 8B model, optimized for tasks requiring clear and concise responses.
- ...
Counter-Example(s):
- OpenAI LLM Models like GPT-4o (proprietary LLM).
- Google LLM Models such as Gemini LLM (proprietary LLM).
See: InstructGPT, BERT, Transformer Neural Networks.

References

2024

https://ai.meta.com/research/publications/the-llama-3-herd-of-models/

2024

(AI@Meta Llama Team, 2024) ⇒ AI@Meta Llama Team. (2024). “The Llama 3 Herd of Models.” In: Meta AI Research.
- NOTE: The paper introduces Llama 3, a set of foundation models that natively support multilinguality, coding, reasoning, and tool usage. The largest model has 405B parameters and delivers quality comparable to GPT-4 on many tasks. The paper also reports experiments that add image, video, and speech capabilities through a compositional approach; the resulting multimodal models are still under development and have not been broadly released.
- NOTE: Llama 3.1 models support context windows of up to 128K tokens, which lets them process long input sequences.
- NOTE: The Llama 3 series includes models with varying parameters, such as 8B, 70B, and 405B, catering to different levels of computational needs and applications.
- NOTE: The Llama 3 models are pre-trained on a corpus of about 15T multilingual tokens, compared with 1.8T tokens for Llama 2.
- NOTE: Llama 3 models utilize a dense Transformer architecture with enhancements like Grouped Query Attention (GQA) for improved inference speed and reduced memory usage.
- NOTE: The development of Llama 3 involved extensive empirical evaluations, demonstrating competitive performance with state-of-the-art models on tasks like coding, reasoning, and multilingual processing.
- NOTE: The Llama 3 models are part of Meta's initiative to release open-access, high-performance language models, aiming to foster innovation and responsible AI development in the research community.
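
The memory effect of Grouped Query Attention mentioned in the notes above can be illustrated with a rough calculation of the key/value cache size at the full context length. This is a minimal sketch, not code from the paper: the class `AttentionConfig`, the function `kv_cache_bytes`, the 8B layer and head counts, the 16-bit cache precision, and the reading of 128K as 131,072 tokens are all assumptions made for illustration and should be checked against the paper's architecture table.

```python
from dataclasses import dataclass

BYTES_PER_VALUE = 2  # assumption: 16-bit (bf16/fp16) cache entries


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical container for the attention shape of one model."""
    n_layers: int
    n_heads: int      # number of query heads
    n_kv_heads: int   # number of key/value heads (equals n_heads without GQA)
    head_dim: int     # size of each attention head


def kv_cache_bytes(cfg: AttentionConfig, seq_len: int, use_gqa: bool = True) -> int:
    """Bytes needed to cache keys and values for one sequence of seq_len tokens."""
    kv_heads = cfg.n_kv_heads if use_gqa else cfg.n_heads
    # Factor 2: one cached tensor for keys and one for values in every layer.
    return 2 * cfg.n_layers * kv_heads * cfg.head_dim * seq_len * BYTES_PER_VALUE


# Assumed shape for the 8B model: 32 layers, 32 query heads, 8 KV heads, head size 128.
LLAMA_8B = AttentionConfig(n_layers=32, n_heads=32, n_kv_heads=8, head_dim=128)
CONTEXT_128K = 128 * 1024  # assumption: 128K means 131,072 tokens

gqa = kv_cache_bytes(LLAMA_8B, CONTEXT_128K, use_gqa=True)
mha = kv_cache_bytes(LLAMA_8B, CONTEXT_128K, use_gqa=False)
print(f"GQA: {gqa / 2**30:.0f} GiB, MHA: {mha / 2**30:.0f} GiB, ratio: {mha // gqa}x")
# prints "GQA: 16 GiB, MHA: 64 GiB, ratio: 4x"
```

Under these assumptions, sharing 8 key/value heads among 32 query heads shrinks the per-sequence cache by a factor of four, which is the kind of saving that makes a 128K-token context more practical to serve.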

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
