The Chinese AI startup DeepSeek has made a breakthrough with the launch of its first reasoning model, DeepSeek-R1 (repo, paper). It rivals OpenAI-o1-1217 in performance while being far cheaper and more accessible: it can be run locally with Ollama, is available on mobile, and is priced well below comparable proprietary models.
DeepSeek-R1’s impressive performance in math and coding comes from using advanced techniques like Chain-of-Thought reasoning (CoT), reinforcement learning (RL), and a Mixture of Experts (MoE) architecture. The CoT method enables the model to explain its reasoning step-by-step, instead of just providing answers.
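For illustration, the DeepSeek-R1 paper trains the model with a template that wraps the reasoning trace and the final answer in `<think>` and `<answer>` tags. The minimal sketch below is invented content showing the shape of such a response, not actual model output.

```python
# Invented example: the shape of a chain-of-thought response under the
# <think>/<answer> template described in the DeepSeek-R1 paper.
prompt = "A train travels 120 km in 1.5 hours. What is its average speed?"
response = (
    "<think>Average speed = distance / time = 120 km / 1.5 h = 80 km/h.</think>\n"
    "<answer>80 km/h</answer>"
)
print(response)
```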
Additionally, its open-source nature and cost-efficiency make the model an appealing choice for many users. According to the company, developing DeepSeek-R1 required an investment of approximately $5.58 million, whereas OpenAI’s models are estimated to have cost around $6 billion. DeepSeek also charges far less for its API: roughly $0.55 per million input tokens and $2.19 per million output tokens, compared with OpenAI’s $15 per million input tokens and $60 per million output tokens.
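As a quick back-of-the-envelope comparison using the per-million-token prices quoted above (the workload size below is arbitrary and only illustrative):

```python
# Rough API cost comparison; prices are USD per 1M tokens as quoted above,
# and the token counts are an invented illustrative workload.
prices = {
    "DeepSeek-R1": {"input": 0.55, "output": 2.19},
    "OpenAI o1": {"input": 15.00, "output": 60.00},
}
input_tokens, output_tokens = 2_000_000, 500_000

for name, p in prices.items():
    cost = input_tokens / 1e6 * p["input"] + output_tokens / 1e6 * p["output"]
    print(f"{name}: ${cost:,.2f}")  # roughly $2.20 for R1 vs. $60.00 for o1
```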

DeepSeek-R1 and its variants are open-source under the MIT license. The release includes DeepSeek-R1-Zero, DeepSeek-R1, and six distilled dense models (1.5B, 7B, 8B, 14B, 32B, and 70B) derived from DeepSeek-R1.
All these models are available on GitHub, where you can find comprehensive documentation on installation, usage, and system integration.
- For the web version, simply visit the DeepSeek-R1 website and sign in.
- DeepSeek-R1 is also available on mobile devices and is currently a top download in global app stores. You can install the app from the Apple App Store or Google Play Store, then log in and run the model.
- For enhanced privacy, you can run DeepSeek-R1 locally using Ollama.
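For example, a minimal local chat using the Ollama Python client is sketched below. It assumes the Ollama server is running and that a distilled R1 tag (here `deepseek-r1:7b`) has already been pulled with `ollama pull`.

```python
# Minimal local inference sketch with the Ollama Python client.
# Assumes: the Ollama server is running and `ollama pull deepseek-r1:7b` has been done.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Why is the square root of 2 irrational?"}],
)
# The reply contains the model's step-by-step reasoning followed by the answer.
print(response["message"]["content"])
```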
DeepSeek-R1 can be further fine-tuned to improve its reasoning abilities in specialized areas, like answering questions about a particular topic or solving a certain type of problem. This capability lets you experiment with the model and adapt it to your specific tasks.
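One common way to do this is parameter-efficient fine-tuning with LoRA adapters via Hugging Face `transformers` and `peft`. The sketch below is a minimal illustration under that assumption; the dataset file, its `text` field, and the hyperparameters are placeholders, not DeepSeek's own recipe.

```python
# Minimal LoRA fine-tuning sketch for a distilled R1 checkpoint.
# "my_domain_data.jsonl" and its "text" field are hypothetical placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # smallest distilled variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach low-rank adapters so only a small fraction of the weights is trained.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

dataset = load_dataset("json", data_files="my_domain_data.jsonl", split="train")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="r1-finetuned", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```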
Model downloads
The DeepSeek-R1 GitHub repository provides detailed instructions for downloading the models.
You can download the DeepSeek-R1 models from the links below.
| Model | #Total Params | #Activated Params | Context Length | Download |
|---|---|---|---|---|
| DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace |
| DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
Distilled, compact versions of DeepSeek-R1 are available at the links below.
| Model | Base Model | Download |
|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
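As an illustration, any of the repositories above can be pulled and queried with the Hugging Face `transformers` library. The snippet below is a minimal sketch using the 7B distilled model and an arbitrary prompt; it requires `transformers`, `accelerate`, and enough memory for the checkpoint.

```python
# Minimal sketch: download a distilled checkpoint from the Hugging Face Hub
# and ask it a question. The prompt is arbitrary.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # any repo from the tables above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24? Show your reasoning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```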
What is DeepSeek-R1-Zero?
Created before DeepSeek-R1, DeepSeek-R1-Zero is a reasoning model built on DeepSeek-V3-Base, which was launched in December 2024. These models have 671B total parameters and use a Mixture of Experts (MoE) architecture, which activates only a subset of the parameters (37B) for each input, improving efficiency and scalability.
DeepSeek-R1-Zero was trained exclusively through large-scale reinforcement learning (RL) without any supervised fine-tuning (SFT) as an initial step. The model develops its reasoning capabilities over time by learning through trial and error.
A key innovation in the model’s training is the application of Group Relative Policy Optimization (GRPO). This approach involves generating multiple outputs for each prompt and comparing them to determine their relative quality, thereby eliminating the need for a separate critic model.
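A toy numerical sketch of the group-relative advantage at the heart of GRPO is shown below: each sampled completion’s reward is normalized against the mean and standard deviation of its own group, so no learned value function is needed. The reward values are invented.

```python
# Toy GRPO-style advantage computation: normalize each completion's reward within
# the group sampled for the same prompt. Rewards here are invented
# (e.g. 1.0 = correct final answer, 0.0 = incorrect).
import statistics

group_rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]

mean_r = statistics.mean(group_rewards)
std_r = statistics.pstdev(group_rewards) or 1.0  # guard against a zero-variance group

advantages = [(r - mean_r) / std_r for r in group_rewards]
print(advantages)  # completions above the group average get positive advantages
```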
The table below compares DeepSeek-R1-Zero with OpenAI-o1-mini and OpenAI-o1-0912 models across various reasoning benchmarks.
| Model | AIME 2024 (pass@1) | AIME 2024 (cons@64) | MATH-500 (pass@1) | GPQA Diamond (pass@1) | LiveCodeBench (pass@1) | CodeForces (rating) |
|---|---|---|---|---|---|---|
| OpenAI-o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
| OpenAI-o1-0912 | 74.4 | 83.3 | 94.8 | 77.3 | 63.4 | 1843 |
| DeepSeek-R1-Zero | 71.0 | 86.7 | 95.9 | 73.3 | 50.0 | 1444 |

Comparison of DeepSeek-R1-Zero and OpenAI o1 models on reasoning benchmarks (source: paper)

Despite its strong performance in various tasks, DeepSeek-R1-Zero often produces output that is hard to read and understand (poor readability). Additionally, it sometimes mixes different languages within its responses (language mixing).
What is DeepSeek-R1?
To address these issues and further enhance performance, DeepSeek introduced DeepSeek-R1, which incorporates a cold-start phase with SFT before applying RL. DeepSeek-R1 has an innovative training pipeline, comprising 4 distinct stages:
- Cold start: DeepSeek-V3-Base is fine-tuned on a small amount of carefully curated training data for efficient and rapid learning. This stage primarily focuses on reasoning.
- Reasoning-oriented RL: The model follows the same extensive RL process previously used for DeepSeek-R1-Zero.
- Supervised fine-tuning: The resulting model checkpoint is used to generate data for a subsequent round of SFT. This data includes examples from various domains (writing, role-playing, etc.) in addition to reasoning, broadening the model’s capabilities beyond reasoning tasks.
- A second RL stage for all scenarios: This stage enhances helpfulness, harmlessness, and reasoning abilities to better align the model with human preferences.
DeepSeek-R1 achieves performance comparable to OpenAI’s o1-1217 model across various tasks, including:
- Reasoning: Excels at math problem-solving (AIME 2024, MATH-500), achieving near parity with OpenAI-o1-1217 and surpassing other models. It also shows expert-level coding skill (Codeforces Elo rating of 2,029) and better performance on engineering-related coding tasks than DeepSeek-V3.
- Knowledge: Significantly outperforms DeepSeek-V3 on knowledge benchmarks (MMLU, MMLU-Pro, GPQA Diamond), though it trails OpenAI-o1-1217 slightly. It also surpasses DeepSeek-V3 on factual queries (SimpleQA).
- Other abilities: Creative writing, general question answering, editing, summarization, and long-context understanding are all significantly better than with DeepSeek-V3.

What are the DeepSeek-R1-Distill models?
Distilled models are smaller versions of larger, pre-trained AI models that are created through distillation. In this process, knowledge from the larger “teacher” model is transferred to a smaller “student” model. The primary goal is to maintain similar performance to the larger model while being more efficient and easier to deploy.
Researchers explored distilling the reasoning capabilities of DeepSeek-R1 into six smaller, more efficient models. Among them, DeepSeek-R1-Distill-Qwen-1.5B surpassed GPT-4o and Claude-3.5-Sonnet on math benchmarks, achieving 28.9% accuracy on AIME and 83.9% on MATH. DeepSeek-R1-Distill-Qwen-7B outperforms much larger open models on AIME 2024. The 32B version significantly surpasses other open-source models and rivals OpenAI’s o1-mini, achieving impressive scores on AIME 2024, MATH-500, and LiveCodeBench.
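For intuition, the sketch below illustrates the response-based flavor of distillation used here: the teacher generates reasoning traces that become supervised fine-tuning data for the student. `teacher_generate`, the prompt list, and the file name are hypothetical placeholders, not DeepSeek’s actual pipeline code.

```python
# Hypothetical sketch of response-based distillation: collect reasoning traces from
# a large teacher model and use them as SFT data for a small student model.
import json

prompts = [
    "Prove that the sum of two even integers is even.",
    "Solve for x: 3x + 7 = 22.",
]  # placeholder prompts; the paper reports roughly 800k curated samples were used

def teacher_generate(prompt: str) -> str:
    """Hypothetical helper: query the teacher (e.g. DeepSeek-R1 via its API) and
    return its full reasoning trace plus final answer."""
    raise NotImplementedError

with open("distill_sft_data.jsonl", "w") as f:
    for p in prompts:
        f.write(json.dumps({"prompt": p, "completion": teacher_generate(p)}) + "\n")
# A student base model (e.g. Qwen2.5-7B) is then supervised fine-tuned on this file.
```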
Conclusion
DeepSeek-R1 represents a significant advancement in LLMs, offering enhanced reasoning capabilities through the Chain-of-Thought reasoning, reinforcement learning, and a Mixture of Experts architecture.
It demonstrates that state-of-the-art AI performance can be achieved without the need for massive hardware investments, delivering comparable results with fewer resources.
Furthermore, its open-source nature under the MIT license breaks the barriers of proprietary AI models, making advanced AI technology more accessible to everyone.
Further remarks
DeepSeek’s performance has caught the attention of industry leaders. OpenAI CEO Sam Altman called the R1 model “impressive,” recognizing the startup’s innovative approach. However, he reiterated that OpenAI’s strategy remains centered on using greater computing power to achieve continuous and predictable gains in AI performance.
The release of DeepSeek-R1 has had an immediate and significant impact on Apple’s App Store, where it became the most downloaded app, for a while surpassing ChatGPT and other platforms. Additionally, the arrival of this much cheaper model triggered a sharp drop in the stock prices of AI-related companies such as NVIDIA.
Its development is particularly remarkable given the constraints imposed by international sanctions: despite restricted access to advanced semiconductors, DeepSeek successfully built its model using older NVIDIA chips.
Read more
- Paper on arXiv: “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”
- GitHub repository
- DeepSeek homepage