Introducing TinyLlama, an open-source compact language model

TinyLlama is a small but powerful language model with only 1.1B parameters that was trained on around 1 trillion tokens for about 3 epochs. It is built on the architecture and tokenizer of Llama 2 and can be easily integrated and used in many existing projects that are compatible with Llama 2.
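As a rough check on the 1.1B figure, the parameter count of a Llama-2-style model can be worked out from its layer sizes. The sketch below is only an illustration: the hyperparameters (vocabulary size, hidden size, feed-forward width, layer and head counts, grouped-query attention, untied output head) are assumptions taken from the publicly released TinyLlama-1.1B configuration and are not stated in this post.

```python
# Parameter count for a Llama-2-style decoder.
# ASSUMPTION: these hyperparameters come from the published TinyLlama-1.1B
# configuration, not from this post.
VOCAB_SIZE = 32_000
HIDDEN = 2_048
INTERMEDIATE = 5_632   # width of the SwiGLU feed-forward layer
LAYERS = 22
HEADS = 32
KV_HEADS = 4           # grouped-query attention: fewer key/value heads


def count_parameters() -> int:
    head_dim = HIDDEN // HEADS
    kv_dim = KV_HEADS * head_dim
    # Query and output projections are HIDDEN x HIDDEN;
    # key and value projections are HIDDEN x kv_dim.
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim
    # Gate, up, and down projections of the feed-forward block.
    feed_forward = 3 * HIDDEN * INTERMEDIATE
    # Two RMSNorm weight vectors per layer.
    norms = 2 * HIDDEN
    per_layer = attention + feed_forward + norms
    # Input embedding plus an untied output head (assumed).
    embeddings = 2 * VOCAB_SIZE * HIDDEN
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + final_norm


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
print(f"~{total * 2 / 1e9:.1f} GB of weights at 2 bytes per parameter")
```

Under these assumed values the count comes to 1,100,048,384, which matches the 1.1B figure, and the weights alone take roughly 2.2 GB in 16-bit precision.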

Despite its small size, TinyLlama outperforms open-source models of similar size on a range of tasks. It is also highly efficient, requiring far fewer computational resources than larger models such as Llama 2 (7B to 70B parameters) or GPT-3 (175B parameters).

TinyLlama is open source and released under the Apache License 2.0. The source code and the model weights are publicly available, and the chat functionality can be tested in an online demo.

The model

TinyLlama is based on the architecture and tokenizer of Llama 2, with several optimizations that improve the computational efficiency of training:

  • The authors used the Fully Sharded Data Parallel (FSDP) technique, which splits the model parameters, gradients, and optimizer states across devices. This makes it easier to scale training across multiple nodes, leading to a faster and more efficient training process.
  • TinyLlama uses FlashAttention, a method designed to reduce the memory and computation cost associated with the attention mechanism in transformers. This implementation allows TinyLlama to process longer sequences and use larger batch sizes.
  • The authors replaced the fused SwiGLU module with the original SwiGLU module to decrease the memory footprint, i.e., the amount of memory the model requires during training.
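To make the last item more concrete, here is a minimal sketch of the computation a SwiGLU feed-forward block performs, written in plain Python with nested lists. It is not TinyLlama's implementation: the function names and the tiny weight matrices are made up for illustration, and a fused variant would compute the same function while combining steps into fewer GPU operations.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def silu(z: float) -> float:
    """SiLU / Swish activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def swiglu_ffn(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """Unfused SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]   # gating branch
    up = matvec(w_up, x)                          # linear branch
    hidden = [g * u for g, u in zip(gate, up)]    # elementwise product
    return matvec(w_down, hidden)                 # project back to model width


# Toy example: model width 2, feed-forward width 3 (illustrative values only).
x = [1.0, -2.0]
w_gate = [[0.5, 0.1], [-0.3, 0.8], [0.2, 0.2]]
w_up = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_down = [[1.0, -1.0, 0.5], [0.0, 2.0, 1.0]]
print(swiglu_ffn(x, w_gate, w_up, w_down))
```

Each intermediate list here (`gate`, `up`, `hidden`) corresponds to an activation tensor that a real training run keeps in memory, which is why the choice of implementation affects the memory footprint.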

The model was pretrained on a mixture of natural language data and code data at a ratio of about 7:3.
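A mixture like this is typically realized by sampling each training example from one of the sources with fixed weights. The sketch below illustrates the idea with Python's standard library; the source names are hypothetical, and sampling per document is a simplification, since the ratio in the text may well refer to tokens rather than documents.

```python
import random
from typing import List

# ASSUMPTION: hypothetical source names; the post only gives the ~7:3 ratio.
MIXTURE = {"natural_language": 0.7, "code": 0.3}


def sample_sources(num_documents: int, seed: int = 0) -> List[str]:
    """Pick the source of each training document according to MIXTURE."""
    rng = random.Random(seed)  # local generator so results are reproducible
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return rng.choices(names, weights=weights, k=num_documents)


batch = sample_sources(10_000)
code_share = batch.count("code") / len(batch)
print(f"code share in sample: {code_share:.3f}")
```

With enough samples, the observed share of code documents settles close to 0.3, mirroring the intended proportion.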


TinyLlama was evaluated on tasks that require commonsense reasoning and problem solving, and benchmarked against OPT-1.3B, Pythia-1.0B, and Pythia-1.4B. The results reported in the paper's tables show that TinyLlama outperforms these baselines on many of the tasks, which cover commonsense reasoning, problem solving across various domains, math reasoning, and programming.

Table: Zero-shot performance on commonsense reasoning tasks (paper)

Table: Performance on problem-solving tasks from the InstructEval benchmark (paper). Columns: MMLU 5-shot, BBH 3-shot, HumanEval 0-shot, DROP 3-shot, Avg.

Potential use cases

Tiny but powerful language models are valuable tools for a wide range of applications. Here are some examples:

  • Assisting speculative decoding of larger models. A tiny model can cheaply draft several tokens ahead, which the larger model then verifies and corrects, making the overall generation process more efficient.
  • Deployment on devices with limited memory and computation resources. Tiny models are lightweight and can be run on resource-constrained devices, such as smartphones and tablets. This makes them ideal for applications that require real-time translation or dialogue generation.
  • Enabling real-time dialogue generation in video games.
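The first use case, speculative decoding, can be illustrated with a toy sketch. Everything here is hypothetical: `target_next` and `draft_next` are simple deterministic functions standing in for a large and a small model, and only the greedy variant is shown. In a real system the target model would score all drafted positions in a single batched forward pass; the loop below checks them one by one purely for readability.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_next(context: List[int]) -> int:
    """Toy stand-in for the large model (hypothetical deterministic rule)."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[int]) -> int:
    """Toy stand-in for the small model: agrees with the target except
    when the context length is a multiple of 4."""
    token = target_next(context)
    return (token + 1) % 50 if len(context) % 4 == 0 else token


def greedy_decode(model: NextToken, prompt: List[int], num_new: int) -> List[int]:
    """Plain autoregressive decoding: one model call per new token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(prompt: List[int], num_new: int,
                       draft_len: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns the tokens and the number of
    verification rounds (each round would be one target forward pass)."""
    tokens = list(prompt)
    final_len = len(prompt) + num_new
    rounds = 0
    while len(tokens) < final_len:
        rounds += 1
        # 1. The draft model proposes draft_len tokens autoregressively.
        proposal: List[int] = []
        for _ in range(draft_len):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target model checks the proposal position by position.
        for drafted in proposal:
            expected = target_next(tokens)
            tokens.append(expected)  # always keep the target's own choice
            if drafted != expected or len(tokens) == final_len:
                break  # first mismatch: discard the rest of the draft
    return tokens, rounds


prompt = [1, 2, 3]
reference = greedy_decode(target_next, prompt, 12)
speculated, rounds = speculative_decode(prompt, 12)
print(speculated == reference, rounds)
```

Because every accepted token is the one the target model would have chosen itself, the output is identical to decoding with the target alone. The saving comes from needing fewer target rounds than generated tokens whenever the draft model guesses correctly.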


TinyLlama shows that small language models, pretrained on a large and diverse corpus, can achieve results competitive with those of larger models.
