Google DeepMind introduces AlphaDev, a deep reinforcement learning (DRL) agent that can discover and optimize fundamental algorithms such as sorting and hashing. AlphaDev produces algorithms that are both accurate and efficient by working at the CPU instruction level and minimizing latency. The AI-driven agent was able to create small sorting algorithms from scratch that outperform the best human-designed ones.

The algorithms discovered by the AlphaDev agent have been integrated into the standard sort function of LLVM's libc++ standard library. The model can also be applied to other domains, such as matrix multiplication and hashing, and find efficient routines for them.

The research proves that AI can surpass the current human benchmark and is able to discover new and better algorithms for basic problems.

The picture below shows the AlphaDev agent architecture.

Many sorting algorithms have reached a stage where human experts cannot optimize them further, leading to a computational bottleneck.

To address this challenge, the team introduces a game called AssemblyGame with a single player, AlphaDev. The player is a DRL agent trained to find a better sorting routine using a series of low-level CPU instructions (assembly instructions).

To effectively play the game, the learning algorithm can incorporate both DRL and stochastic search optimization algorithms. The player must navigate through the assembly instructions to find a sorting algorithm that is both provably correct and fast.
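As a hedged sketch of this setup (a toy environment, not DeepMind's implementation): the state is the program built so far, each action appends one pseudo-assembly instruction, and an episode ends when the program sorts every test input or a step budget runs out. The `AssemblyGame` class name, the mov/min/max instruction set, and the reward weighting here are all illustrative assumptions.

```python
import itertools

def run(program, values, n_regs=4):
    """Execute a toy pseudo-assembly program on a small register file."""
    regs = list(values) + [0] * (n_regs - len(values))
    for op, a, b in program:
        if op == "mov":        # regs[b] = regs[a]
            regs[b] = regs[a]
        elif op == "min":      # regs[b] = min(regs[a], regs[b])
            regs[b] = min(regs[a], regs[b])
        elif op == "max":      # regs[b] = max(regs[a], regs[b])
            regs[b] = max(regs[a], regs[b])
    return regs

class AssemblyGame:
    """Single-player game: each step appends an instruction and yields a
    reward combining correctness with a length penalty (a stand-in for
    measured latency)."""
    def __init__(self, n=2, max_len=8):
        self.n, self.max_len, self.program = n, max_len, []
        # Correctness is checked on all permutations of n distinct keys.
        self.tests = [list(p) for p in itertools.permutations(range(n))]

    def step(self, instruction):
        self.program.append(instruction)
        correct = all(run(self.program, t)[:self.n] == sorted(t)
                      for t in self.tests)
        done = correct or len(self.program) >= self.max_len
        reward = (1.0 if correct else 0.0) - 0.01 * len(self.program)
        return done, reward
```

In this toy instruction set, one correct sort-2 program is `mov r0→r2`, `min r1→r0`, `max r2→r1`; the agent's task is to discover such sequences, and shorter ones, by search.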

This game is very difficult for two main reasons:

- The large search space of assembly instructions, comparable to highly complex games like chess and Go.
- The nature of the reward function: a single incorrect assembly instruction can invalidate the entire algorithm.

The goal is to train the learning agent (AlphaDev) to generate efficient and correct assembly algorithms by iteratively exploring different actions and receiving rewards based on their performance. The sorting algorithm should be correct and have low latency (length and latency are usually highly correlated). Upon taking an action, the player receives a reward.

The reward consists of two components: algorithm correctness and latency. Algorithm correctness is evaluated by comparing the generated outputs with the expected outputs.
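A minimal sketch of such a two-part reward (the function name and the weighting are hypothetical choices, not the paper's): correctness is the fraction of test cases whose generated output matches the expected output, and a latency term penalizes slower programs.

```python
def reward(outputs, expected, latency_ns, latency_weight=1e-3):
    """Combine a correctness score with a latency penalty.

    outputs/expected: lists of program outputs, one per test input.
    latency_ns: measured (or proxied) latency of the candidate program.
    """
    correctness = sum(o == e for o, e in zip(outputs, expected)) / len(expected)
    return correctness - latency_weight * latency_ns
```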

The game is played for a limited number of steps, after which it is terminated.

**Winning the game** corresponds to generating a correct and low-latency algorithm using assembly instructions. **Losing the game** occurs when the generated algorithm is incorrect, or correct but inefficient.

## Main Components

AlphaDev consists of two core components: a learning algorithm and a representation function.

*AlphaDev’s learning algorithm* is an extension of *DeepMind’s AlphaZero*, a well-known DRL algorithm combining a neural network with Monte Carlo tree search, used to master the games of chess, shogi, and Go.

AlphaZero uses a neural network to guide a search process in order to solve complex games. In the case of AlphaDev, the learning algorithm based on AlphaZero is adapted to address the challenges of AssemblyGame and find optimal assembly programs for sorting.

*The representation function* in AlphaDev helps the agent to interpret and understand the assembly instructions. The primary representation function utilized in AlphaDev is based on Transformers.

## AlphaDev vs AlphaZero

- AlphaDev *uses a single-player game formulation* instead of a two-player one.
- AlphaDev *uses a different reward function* that takes into account the accuracy and latency of the algorithm at the CPU instruction level.
- AlphaDev *uses a different action space*: it can copy an element from one position to another, or swap multiple elements at once (copy and swap moves). This helps it find novel shortcuts and create better sorting algorithms.

## The approach

In their approach the authors focused on two kinds of small sorting algorithms: **(1) the fixed sort** and **(2) the variable sort**. **Fixed sort algorithms** can only sort sequences of a fixed length (for example, sort 4 can only sort sequences of length 4), while **variable sort algorithms** can sort sequences of varying size.
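For intuition, a fixed sort such as sort 3 can be written as a branch-free sorting network: a fixed sequence of compare-exchange steps of the kind the assembly routines implement with min/max and conditional moves. This Python sketch is illustrative, not the discovered code:

```python
def sort3(a, b, c):
    """Sort exactly three values with a fixed compare-exchange sequence."""
    a, b = min(a, b), max(a, b)   # compare-exchange (a, b)
    b, c = min(b, c), max(b, c)   # compare-exchange (b, c)
    a, b = min(a, b), max(a, b)   # compare-exchange (a, b) again
    return a, b, c
```

Because the sequence of comparisons never depends on the data, the routine maps directly onto straight-line assembly with no branches, which is what makes instruction-level optimization possible.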

AlphaDev successfully discovered new fixed and variable sort algorithms that outperform the state-of-the-art human benchmarks. These algorithms were *created from scratch* by the agent, *without any prior human-designed algorithms*.

The picture below shows two sorting algorithms for sequences of length 2, 3, or 4: a human benchmark algorithm (a) and AlphaDev’s sorting algorithm (b).

AlphaDev’s sorting algorithm (b) is more efficient than the human benchmark algorithm for sequences of length 4, because it uses *a simplified sorting routine* that avoids unnecessary comparisons. It first calls sort 3, then applies a simplified sort 4 routine to place the fourth number.
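A hedged Python sketch of that structure (illustrative only; the real routine does this branchlessly in assembly): sort the first three values, then sift the fourth into the sorted prefix with three more compare-exchanges.

```python
def sort3(a, b, c):
    """Fixed compare-exchange sequence for three values."""
    a, b = min(a, b), max(a, b)
    b, c = min(b, c), max(b, c)
    a, b = min(a, b), max(a, b)
    return a, b, c

def sort4(a, b, c, d):
    """Sort 3, then insert the fourth value into the sorted prefix."""
    a, b, c = sort3(a, b, c)
    c, d = min(c, d), max(c, d)   # d moves past c if needed
    b, c = min(b, c), max(b, c)   # then past b
    a, b = min(a, b), max(a, b)   # then past a
    return a, b, c, d
```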

## Results

The new sorting methods made sorting faster by **up to 70% for sequences with five elements** and **by around 1.7% for sequences with over 250,000 elements**.

The fixed sort solutions found by AlphaDev for sort 3, sort 4, and sort 5 have been integrated into the standard sort function in LLVM's libc++ standard library. This library is widely used by millions of users, including universities and various international companies.

There are two algorithmic discoveries made by AlphaDev that contribute to more efficient sorting algorithms by reducing the number of assembly instructions: the **“AlphaDev swap move”** and the **“AlphaDev copy move”**. For the copy move, AlphaDev identified cases where certain inputs can be copied directly to the corresponding outputs without any additional operations, reducing the instruction count.
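As a simplified, hypothetical illustration of the idea (pseudo-assembly, not the exact sequence from the paper): when the search discovers that an earlier comparison already guarantees which value is smaller, a conditional-select sequence collapses to a plain copy.

```
; before: select min(P, S) into the output slot
mov   S, tmp
cmp   P, S
cmovl P, tmp      ; tmp = min(P, S)

; after a copy move: P <= S is already guaranteed at this point,
; so the input is copied directly to the output
mov   P, tmp
```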

The team compared the performance of AlphaDev to stochastic search optimization approaches, highlighting the advantages of the AlphaDev approach.

Furthermore, they apply AlphaDev *to other domains beyond sorting*, demonstrating the versatility and generality of the approach in discovering efficient algorithms in various problem domains.

## Conclusion

The use of AI, particularly DRL, has allowed for the discovery of new, high-performing algorithms that improve the fundamental sorting routines.

The AlphaDev agent, trained through DRL, demonstrates the capability to discover sorting algorithms that outperform existing human benchmarks in terms of both correctness and latency. This AI agent can be used for further advancements in other domains as well.

## Learn more:

Research paper: “Faster sorting algorithms discovered using deep reinforcement learning” (published in Nature)