Gemma 3, Google’s latest AI model, offers multi-modal capabilities and achieves roughly 98% of DeepSeek-R1’s Chatbot Arena score while running efficiently on a single GPU or TPU.
Key points
- Optimized to run efficiently on a single GPU or TPU, with its largest model (27B parameters) outperforming Llama3-405B, DeepSeek-V3, and o3-mini and achieving 98% of DeepSeek-R1’s score in early human preference tests on LMArena’s leaderboard.
- Available in four sizes – 1B, 4B, 12B, and 27B. The 1B model is text-only, while the three larger models are multi-modal, capable of processing text, images, and short videos.
- A 128K-token context window for processing large inputs – 32K tokens for the 1B model – and support for over 140 languages, making the models suitable for global, high-capacity applications that can respond in each customer’s preferred language.
The key developments in Gemma 3 are focused on image understanding, long context, improved multilinguality, and enhanced STEM capabilities.
The next chart compares the Chatbot Arena Elo scores and GPU requirements of different models. While other high-ranking AI models require up to 32 NVIDIA H100 GPUs, Gemma 3 27B achieves high user preference scores using only one GPU. For instance, Gemma 3 27B’s Elo score of 1338 is approximately 98% of DeepSeek-R1’s score of 1363.

Gemma 3, like its predecessor Gemma 2, is device-friendly, enabling developers to integrate its capabilities directly into smartphones, laptops, and workstations. Google describes the Gemma 3 collection as its most advanced, portable, and responsibly developed open models to date, designed to complement its Gemini series.
The image below illustrates Gemma 3’s performance improvements over its predecessor, Gemma 2, as measured by the MMLU-Pro benchmark, which evaluates the knowledge and problem-solving skills that LLMs acquire during pretraining. For more information about Gemma 3’s advancements over Gemma 2, visit the official Google AI for Developers blog.

In conjunction with Gemma 3, Google introduced ShieldGemma 2, an AI-based safety classifier designed to filter explicit, dangerous, or violent content in images.
To foster innovation and research, Google has launched the Gemma 3 Academic Program, offering academic researchers $10,000 in Google Cloud credits. Researchers can apply through the program’s application form.
How to use Gemma 3
- Instant access: You can try Gemma 3 in your browser through Google AI Studio.
- Customize and build: Download it from Hugging Face, Ollama, or Kaggle.
- Deploy and scale: Gemma 3 supports various deployment options and integrates with numerous tools, including Hugging Face Transformers, Ollama, JAX, Keras, PyTorch, Google AI Edge, Unsloth, vLLM, and gemma.cpp (see the quickstart sketch after this list).
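As a concrete starting point, here is a minimal quickstart sketch using Hugging Face Transformers. The model id `google/gemma-3-1b-it` follows Hugging Face naming, the chat-style pipeline call is standard Transformers usage, and the prompt itself is illustrative.

```python
# Minimal sketch: running Gemma 3 locally with Hugging Face Transformers.
# Assumes `transformers` and `accelerate` are installed and the Gemma
# license has been accepted on Hugging Face.
from transformers import pipeline

# The 1B instruction-tuned variant is text-only and fits modest hardware;
# swap in the 4B/12B/27B ids for the multi-modal sizes.
pipe = pipeline("text-generation", model="google/gemma-3-1b-it", device_map="auto")

messages = [{"role": "user", "content": "Summarize Gemma 3 in one sentence."}]
result = pipe(messages, max_new_tokens=128)

# Chat-style pipelines return the whole conversation; the last message
# holds the model's reply.
print(result[0]["generated_text"][-1]["content"])
```

If you prefer a local runtime instead, Ollama hosts the models under the `gemma3` tag, so `ollama run gemma3` pulls and serves a default variant.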
You can create AI-driven workflows through function calling, which allows interaction with predefined functions or APIs. Based on user input, the model calls these functions to perform specific tasks – such as querying databases or sending emails – and then generates a structured response to complete the task. This integration makes the model a powerful tool for automating processes and building interactive systems.
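The sketch below illustrates this pattern under a prompt-based JSON convention. Gemma 3 exposes no dedicated tool-calling endpoint, so the JSON format, the `get_order_status` tool, and the `generate` helper (any callable that sends a prompt to the model and returns its text reply) are illustrative assumptions, not an official API.

```python
import json

# Illustrative tool registry; real tools would query a database,
# send an email, etc.
def get_order_status(order_id: str) -> str:
    return f"Order {order_id} has shipped."

TOOLS = {"get_order_status": get_order_status}

SYSTEM_PROMPT = (
    "You may call a function by replying with JSON only, e.g.\n"
    '{"function": "get_order_status", "args": {"order_id": "<id>"}}\n'
    "Otherwise, answer the user directly."
)

def handle(user_input: str, generate) -> str:
    """One function-calling round trip; `generate` wraps any Gemma 3 backend."""
    reply = generate(f"{SYSTEM_PROMPT}\n\nUser: {user_input}")
    try:
        call = json.loads(reply)                      # model chose a function
        result = TOOLS[call["function"]](**call["args"])
        # Feed the result back so the model can phrase a structured answer.
        return generate(f"Function result: {result}\nNow answer the user.")
    except (json.JSONDecodeError, KeyError, TypeError):
        return reply                                  # plain-text answer
```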
With Gemma 3, you can also develop agent-like systems capable of interpreting user instructions, executing tasks, and providing meaningful responses. For example, you could automate customer service responses, manage schedules, or even initiate specific operations based on user interactions.
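Building on the sketch above (and reusing its `TOOLS`, `SYSTEM_PROMPT`, and `generate` assumptions), an agent-style loop simply repeats the call-and-observe cycle until the model answers directly; the step limit is a safety convention of this sketch, not a Gemma feature.

```python
def run_agent(task: str, generate, max_steps: int = 5) -> str:
    """Let the model call tools repeatedly until it produces a final answer."""
    transcript = f"{SYSTEM_PROMPT}\n\nTask: {task}"
    for _ in range(max_steps):
        reply = generate(transcript)
        try:
            call = json.loads(reply)
            result = TOOLS[call["function"]](**call["args"])
            transcript += f"\nFunction result: {result}"  # observe, then loop
        except (json.JSONDecodeError, KeyError, TypeError):
            return reply  # no function call: treat as the final response
    return "Stopped after reaching the step limit."
```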
For detailed information on using Gemma 3, please visit the official Gemma 3 Developer Guide.
Conclusion
Gemma 3, Google’s most advanced family of open models, combines high performance, portability, and responsible AI development. Optimized for speed, the models run efficiently on a single GPU or TPU and offer multi-modal capabilities, including text, image, and short-video processing.
With support for over 140 languages and a 128K-token context window (32K tokens for the smallest model), Gemma 3 models are well-suited for a wide range of applications.