Reinforcement Learning

DeepSeekMath-V2, the AI that can check its own math reasoning

DeepSeekMath-V2 is a self-verifiable mathematical reasoning model developed by DeepSeek-AI. It…

December 8, 2025

Absolute Zero – AI training without any human data

Imagine an AI that improves without human-labeled data and curated datasets. Just pure…

May 17, 2025

SWE-RL enhances LLMs' coding capabilities

Meta introduces SWE-RL, marking the first time reinforcement learning has been used to…

March 18, 2025

Google DeepMind’s SIMA, a generalist AI gaming partner

Google DeepMind’s new Scalable Instructable Multiworld Agent (SIMA) is a cutting-edge AI that…

March 18, 2024

TaskWeaver is a smart planning agent for data analytics

Microsoft’s TaskWeaver is a code-first framework for planning and executing complex data analytics…

December 20, 2023

Google launches Gemini, its most advanced AI model

On December 6, 2023, Google launched Gemini, a cutting-edge multimodal AI model that…

December 7, 2023

AlphaDev: how deep reinforcement learning can find better sorting algorithms than humans

Google DeepMind introduces AlphaDev, a new deep reinforcement learning (DRL) agent that can…

June 12, 2023

Stability AI launched StableVicuna, the first open-source chatbot based on human feedback

Stability AI has introduced StableVicuna, the first large-scale open-source chatbot that has been…

May 3, 2023

Metacognitive reinforcement learning: do humans learn the way that machines do?

The researchers at the Max Planck Institute for Intelligent Systems, Stuttgart, Germany have…

April 11, 2023

DeepMind’s latest AI research reduces energy usage for cooling buildings

A research team from DeepMind and Google proposes a new approach involving ML…

January 9, 2023
