Meta has revealed Llama 3, the latest advancement in its series of open-source large language models. The release includes the first two Llama 3 models, available in pre-trained and instruction-fine-tuned versions with 8B and 70B parameters, respectively.
Building on its predecessor, Llama 3 has new capabilities such as improved reasoning and achieves state-of-the-art performance on industry benchmarks.
Meta is releasing these early Llama 3 models so that the community can contribute to their development. The goal is to build open models that rival the quality of today's leading proprietary models. The code repository and the official announcement are linked at the end of this post.
The text-based models currently released represent the initial offerings within the broader Llama 3 suite. Future releases aim to add multilingual and multimodal capabilities and longer context windows.
State-of-the-art performance
Llama 3 showcases improved performance over its predecessor, Llama 2, with advancements in scalability and the ability to handle complex, multi-step tasks.
The models have undergone post-training refinement, resulting in a significant reduction in false refusal rates, better response alignment, and increased diversity in model answers. They demonstrate stronger capabilities in handling language nuances, contextual understanding, and tasks such as translation and dialogue generation.
The chart below compares Llama 3 Instruct's performance with that of other models, such as Claude Sonnet, Mistral Medium, and GPT-3.5, across a range of benchmark categories.

The next table shows the pre-trained model’s performance.

During the development of Llama 3, the focus was on assessing the model’s performance against standard benchmarks and optimizing it for real-world applications. A new, high-quality human evaluation set was created, featuring 1,800 prompts spanning 12 essential use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization.
The following chart depicts the results of human evaluations across various categories and prompts. The evaluation compared the performance of Meta Llama 3 to Claude Sonnet, Mistral Medium, and GPT-3.5.

Model architecture
Llama 3 is built on Llama 2’s architecture, with the following improvements:
- A tokenizer with a 128K token vocabulary for more efficient language encoding, resulting in better model performance.
- Grouped query attention (GQA) to boost inference efficiency in both the 8B and 70B model sizes.
- Training on sequences of 8,192 tokens, with a masking technique to prevent self-attention from spanning across document boundaries.
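To make the GQA idea above concrete, here is a minimal NumPy sketch in which 8 query heads share 2 key/value heads. The shapes and head counts are illustrative, not Llama 3's actual configuration.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d) with n_kv_heads < n_q_heads."""
    n_q_heads, seq, d = q.shape
    group = n_q_heads // k.shape[0]
    # Each group of query heads shares one key/value head, shrinking the
    # KV cache and boosting inference efficiency.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))  # 8 query heads
k = rng.normal(size=(2, 4, 16))  # only 2 KV heads need to be cached
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 4, 16)
```

The output has one attention result per query head, but the memory cost of the cached keys and values scales with the smaller number of KV heads.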
Training data
Llama 3’s training dataset is 7 times larger than those used for Llama 2 and includes 4 times more code. The new model was pre-trained on 15T tokens from public sources, including over 5% non-English data covering 30+ languages. Quality is ensured through advanced filtering pipelines and data mix optimization, promising robust performance across diverse use cases.
Scaling up pretraining
During pre-training, Meta developed detailed scaling laws to guide the optimal mix of training data and compute, and to predict the performance of the largest models on key tasks, such as code generation, before those models were actually trained. To further scale up the training process, Meta combined three types of parallelization: data, model, and pipeline parallelism.
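The prediction step works roughly as follows: fit a power law to the losses of small pilot runs, then extrapolate to a much larger compute budget before committing to it. All numbers below are made up for illustration; they are not Meta's fitted constants.

```python
import numpy as np

# Hypothetical losses observed from small pilot training runs.
compute = np.array([1e19, 1e20, 1e21])  # training compute in FLOPs
loss = np.array([2.90, 2.55, 2.25])     # final validation loss (illustrative)

# Fit a power law: log(loss) ~ intercept + slope * log(compute).
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)

def predicted_loss(flops):
    return np.exp(intercept + slope * np.log(flops))

# Extrapolate to a flagship-scale budget before training at that scale.
big_run = predicted_loss(1e24)
print(big_run < loss.min())  # True: more compute, lower predicted loss
```

Real scaling-law work fits richer functional forms over many runs, but the principle is the same: small, cheap experiments inform how the large model is expected to behave.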
Training was conducted on two custom-built 24K GPU clusters, with improvements in hardware reliability and new scalable storage systems, resulting in an effective training time of more than 95%. These advancements made the training of Llama 3 approximately three times more efficient compared to Llama 2.
Instruction fine-tuning
To enhance the chat capabilities of Llama 3, a new instruction fine-tuning approach combines four techniques:
- supervised fine-tuning on curated examples;
- rejection sampling, which keeps the model's good responses and discards the bad ones;
- proximal policy optimization (PPO), which rewards the model for generating good chat responses and slightly penalizes bad ones;
- direct preference optimization (DPO), which teaches the model directly which of two candidate responses humans prefer.
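As a toy illustration of the DPO objective mentioned above, the sketch below assumes scalar sequence log-probabilities and an illustrative `beta`. The loss shrinks when the policy prefers the human-chosen response more strongly than a frozen reference model does.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Toy DPO loss for one preference pair; inputs are sequence log-probs."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Policy prefers the chosen response more than the reference does: low loss.
loss_better = dpo_loss(-1.0, -5.0, -2.0, -2.0)
# Policy prefers the rejected response: higher loss.
loss_worse = dpo_loss(-5.0, -1.0, -2.0, -2.0)
print(loss_better < loss_worse)  # True
```

In practice the log-probabilities come from summing token logits over each response, and the loss is averaged over a dataset of human preference pairs; this scalar version only shows the shape of the objective.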
Deploying Llama 3 at scale
Llama 3 is set to become widely accessible across various platforms, including cloud services and API providers. It is also more efficient: its new tokenizer produces up to 15% fewer tokens than Llama 2's for the same text, helping the model maintain comparable inference speed despite its larger size.
Meta AI is accessible across various platforms, including Facebook, Instagram, WhatsApp, Messenger, and the meta.ai website. It helps you get things done, learn new things, create content, and engage with what matters to you. Further details can be found on the meta.ai site.
To access the models, go to the Llama 3 website and consult the Getting Started Guide for an updated list of supported platforms.
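When using the instruction-tuned model through Hugging Face, `tokenizer.apply_chat_template` formats conversations for you. The sketch below hand-builds the same Llama 3 prompt layout from the special tokens published in Meta's model card, purely to illustrate the structure; in real code, prefer the tokenizer's template.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Manually assemble a Llama 3 chat prompt (illustrative only)."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        # The prompt ends with an open assistant header so the model
        # generates the assistant's reply next.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a helpful assistant.",
    "What is grouped query attention?",
)
print(prompt.startswith("<|begin_of_text|>"))  # True
```

Each turn is delimited by `<|start_header_id|>role<|end_header_id|>` and terminated with `<|eot_id|>`, which is also a stop token when generating.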
In the near future, users will have the opportunity to experience a new dimension of Meta AI through its multimodal integration with Ray-Ban Meta smart glasses.
Conclusion
Meta Llama 3 is a major advancement in the field of LLMs. By offering pre-trained and instruction-fine-tuned models at multiple parameter sizes and open-sourcing the technology, Meta has made significant strides in democratizing access to powerful language models.
Learn more:
- Official website
- Code repository (GitHub)
- Models on Hugging Face
- Meta’s press release: “Introducing Meta Llama 3: The most capable openly available LLM to date”