Llama 3.1 from Meta, its most capable models to date

July 25, 2024

On July 23, 2024 Meta launched Llama 3.1, a collection of open-source foundation models that support multilinguality, coding, reasoning, and tool usage. Their largest model, Llama 3.1 405B, offers great flexibility and state-of-the-art capabilities, competing the best closed source models like GPT-4, GPT-4o, NVIDIA’s Nemotron 4 340B, and Claude 3.5 Sonnet.

Following the launch of Llama 3 on April 18, the Llama 3.1 series of models is now available for both research and commercial applications.

You can download and run it for free on your own computer, but many cloud platforms charge for access. Read more in the How to use Llama 3.1 section.

What’s new with Llama 3.1?

Llama 3.1 introduces several new features and improvements over its predecessors, including:

Expanded model sizes: It includes upgraded versions of the 8B and 70B models and the new 405B, which enables advanced applications, such as synthetic data generation and model distillation.
Multilingual and multimodal capabilities: The new models natively support 8 languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The multimodal capabilities (image recognition, video recognition, and speech understanding) are still under development.
Longer context length: With a context window of up to 128K tokens, Llama 3.1 surpasses Llama 2 (up to 4K tokens), GPT-3.5 (up to 8K tokens), and GPT-4 (up to 32K tokens).
Enhanced tool usage: The Llama 3.1 family can integrate with a variety of tools, including search engines, calculators, code execution environments.
Robust safety measures: Meta has incorporated new security and safety tools, including Llama Guard 3 and Prompt Guard to filter out malicious prompts and reduce the risk of harmful outputs.

The Llama 3 models

The Llama 3 “herd” of models is a comprehensive suite of AI tools that cater to a wide range of applications:

Base models: These are pre-trained models that serve as the foundation for various AI tasks.
Instruction-tuned models: Fine-tuned for specific tasks, these models offer enhanced performance and accuracy.
Llama Guard 3: A safeguard model that classifies inputs and outputs to detect unsafe content, ensuring the responsible use of AI.

Key capabilities

Question answering: Get answers to your questions in multiple languages.
Code generation: Write code, debug, and explain code snippets.
Complex reasoning: Solve complex problems and puzzles.
Tool use: Able to use external tools like calculators or search engines.
Multilingual abilities: Translate text, summarize, and generate text in multiple languages.

Model architecture

Llama 3.1 maintains a similar model architecture to Llama and Llama 2. Its notable improvements come mainly from the use of an enhanced data quality and diversity, along with a larger training scale.

Its training dataset was made of 15T tokens from public sources, with over 5% non-English data covering more than 30 languages. Additionally, the training scale was significantly expanded, using advanced filtering pipelines and data mix optimization to ensure robust performance across diverse use cases.

More details regarding the model’s design and training are shown in the figure below.

LLama 3 architecture and training (source: Meta AI Blog)

The model’s development followed 2 main stages:

Pre-training. Train the model on a massive dataset of text from various languages. This text is broken down into smaller units (tokens) and the model learns to predict the next token in a sequence, gradually understanding language structure and acquiring a vast amount of knowledge. The 405B model was trained on over 15T multilingual tokens (compared to 1.8T tokens for Llama 2) using more than 16 thousand H100 GPUs, allowing it to process text sequences of 128K tokens.
Post-training. Perform rejection sampling (generating multiple outputs and select the best output using a reward model), supervised finetuning (trained on a curated dataset with human-provided labels), and direct preference optimization (human raters to provide feedback on the model’s outputs). To increase the efficiency of rejection sampling, they adopted PagedAttention. Additionally, the model acquires new abilities, such as using tools, and demonstrates enhanced performance in areas like coding and logical reasoning. Safety features are also incorporated during this phase.

Model evaluations

The Llama 3.1 models were evaluated on a large number (over 150) of benchmark datasets covering various languages. This ensured a comprehensive assessment of their performance across different linguistic contexts. The results shows that smaller Llama 3.1 models are competitive with other models of similar size, both closed-source and open-source. The largest Llama 3.1 model performs on par with leading foundation models like GPT-4, GPT-4o, and Claude 3.5 Sonnet (see the picture below).

Performance of finetuned Llama 3 models on key benchmark evaluations.(source: The Llama 3.1 paper )

To evaluate Llama 3.1’s capabilities in real-world scenarios, human evaluations were conducted comparing it to other language models. The results of these evaluations for the 405B Llama 3.1 model are shown in the picture below.

Llama 3.1 405B human evaluation (source: Meta AI Blog)

How to use Llama 3.1

Direct download. If you need to fine-tune the model and create specialized applications or integrate it with your existing systems, follow the next steps:

Download the model weights and tokenizer from Meta’s official website (you have to request access to Llama models) or repositories, such as GitHub.
Set up a suitable environment (ensure you have a GPU for optimal performance and install necessary dependencies like PyTorch or TensorFlow).
Use Ollama to load the model into your preferred framework (e.g., PyTorch, Hugging Face Transformers).
Fine-tune the model on your own dataset and use it for your specific tasks.

Cloud-based platforms or APIs. You can use Llama 3.1 on different platforms offered by over 25 companies, including Amazon Web Services (AWS), NVIDIA, Databricks, Groq, Dell, Azure, Google Cloud, and Snowflake. You have many options:

Direct access: Meta AI offers direct access to the Llama 3.1 405B model for users in the US through their website or WhatsApp. For users outside the US, platforms like Hugging Chat provide access.
Cloud-based solutions: Companies like AWS, Azure, and Google Cloud offer cloud services to use Llama 3.1 for your projects.
Specialized platforms: Platforms like Groq and Databricks are optimized for running LLMs like Llama 3.1 models, including the 405B version.

Conclusion

The open-source release of Llama 3.1 marks a significant step in democratizing advanced AI technology, making powerful models more accessible to a wider audience.

However, training large-scale models like Llama 3.1 requires substantial financial resources. The costs associated with acquiring high-quality data, computational power, and specialized hardware can be significant.