Metacognitive reinforcement learning: do humans learn the way that machines do?

Researchers at the Max Planck Institute for Intelligent Systems in Stuttgart, Germany, have developed a framework called metacognitive reinforcement learning (MRL), inspired by machine learning algorithms, to study the mechanisms of human decision-making.

The approach combines the principles of reinforcement learning (learning through trial and error based on rewards and punishments) with metacognition (thinking about one’s own thinking).

The research team investigated the role of MRL in human decision-making, using a series of experiments in which participants were asked to perform a task that required them to make decisions based on incomplete information.

The findings indicated that the MRL mechanism largely corresponded with people's behavior, including their far-sighted planning strategies and their capacity to regulate their planning effort.

Machine learning (ML) models have made considerable progress in recent years and can now accomplish tasks that were once believed to be achievable only by human intelligence. They can rapidly recognize patterns and generate predictions from large datasets that would be unreasonably time-consuming for humans to analyze.

However, machines still lack some of the abilities that humans possess, such as creativity, empathy, and the ability to understand complex social dynamics. The human brain is highly efficient at decision-making and complex reasoning, despite its limited cognitive resources and relatively low energy consumption.

Researchers have found that the brain's remarkable performance could be explained by meta-reasoning and MRL mechanisms.

Metacognitive learning

The model

To explore how individuals learn and adapt their cognitive strategies through trial and error, the research team generated a total of 86 MRL models, grouped into four families:

  1. Non-learning models assume that people do not learn from their past experience.
  2. Mental habit models assume that people become more inclined to execute a particular planning operation the more often they have performed it in the past.
  3. Learned Value of Computation (LVOC) models assume that people choose planning operations based on their past experience. For example, if a particular planning operation has led to successful outcomes in the past, people are more likely to assign a high value to it and use it again in similar situations. Conversely, if a planning operation requires a lot of computational resources but has not yielded successful outcomes, people are less likely to use it in the future.
  4. REINFORCE models describe metacognitive learning as continually adjusting planning strategies based on the feedback from executing those strategies.
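To make the REINFORCE idea more concrete, here is a minimal Python sketch of a policy-gradient update over a small set of planning operations. It is an illustration only, not the researchers' models: the two-operation toy environment, the payoff values, the learning rate, and the names `softmax`, `reinforce_update`, and `simulate` are all assumptions made for this example.

```python
import math
import random


def softmax(prefs):
    """Turn preference scores into selection probabilities."""
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(prefs, chosen, ret, lr=0.1):
    """One REINFORCE step for a softmax policy over planning operations.

    The gradient of log pi(chosen) with respect to preference i is
    (1 if i == chosen else 0) - pi(i). Each preference moves along that
    gradient, scaled by the return that followed the choice.
    """
    probs = softmax(prefs)
    return [
        p + lr * ret * ((1.0 if i == chosen else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]


def simulate(n_trials=2000, seed=0):
    """Toy environment (hypothetical): operation 1 pays more on average."""
    rng = random.Random(seed)
    mean_return = [0.0, 1.0]  # assumed average payoff, net of planning cost
    prefs = [0.0, 0.0]
    for _ in range(n_trials):
        probs = softmax(prefs)
        chosen = rng.choices(range(len(prefs)), weights=probs)[0]
        ret = mean_return[chosen] + rng.gauss(0.0, 0.2)
        prefs = reinforce_update(prefs, chosen, ret)
    return softmax(prefs)


if __name__ == "__main__":
    print(simulate())
```

In this sketch the learner never evaluates the operations directly. It only nudges its preferences after each outcome, which is the sense in which feedback from executing a strategy gradually reshapes the strategy itself.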

Then they studied diverse aspects of metacognitive learning, such as the way people adapt to varying environmental structures and the extent to which they modify their planning levels, based on different costs and rewards.


According to the results, the REINFORCE learning mechanism showed a strong correlation with human behavior, including people's ability to plan ahead and regulate their planning effort.

Participants who were better at metacognitive monitoring could better adjust their decision-making strategies and achieve better outcomes. This suggests that MRL plays an important role in human decision-making, allowing us to continually improve our decision-making processes over time.

Furthermore, the researchers found that the brain regions involved in MRL are also involved in general reinforcement learning, suggesting that these two processes may be closely related.

This has implications for the development of AI systems that can learn and adapt in a more human-like way, by incorporating MRL into their decision-making processes.


The study provides further evidence for the importance of MRL in human decision-making, and suggests that this process may be an important source of inspiration for the development of more advanced AI systems.

Beyond MRL, social and emotional factors play a critical role in human learning and are not easily captured by current ML approaches. Curiosity, creativity, and intuition are also essential components of human learning, and they often drive individuals to explore and learn new things.

Therefore, further research in cognitive science and AI is necessary to better replicate the complex cognitive processes involved in human reasoning.
