Google DeepMind’s SIMA, a generalist AI gaming partner

Google DeepMind’s new Scalable Instructable Multiworld Agent (SIMA) is an AI agent that learns to play video games much as a human does. It is designed to follow your instructions, carrying out diverse tasks and adapting to a variety of game scenarios.

Currently, SIMA is a research project, and its training focuses not on winning games but on functioning as a personal assistant.

SIMA tackles a key challenge in AI: connecting the language abilities of large language models (LLMs) to actions in an environment. It is designed to interpret natural-language instructions and carry them out in simulated 3D environments.

Traditionally, game-playing AI agents could only play the game they were trained on and could not transfer their skills to unfamiliar environments. SIMA, by contrast, is built to be adaptable: it learns general game concepts and follows your instructions rather than mastering one specific game.

Overview of SIMA (source: technical report)

The SIMA system can visually interpret its surroundings, understand instructions given in human language, and convert them into corresponding actions, such as movement (“jump”), navigation (“go to the HUB terminal”), resource gathering (“get raspberries”), or object management (“cut the potato”).

Model setup and architecture

SIMA uses only two data streams: natural-language instructions provided by the user and visual observations continuously gathered from the environment. It then acts through the keyboard and mouse, the same interface a human player uses, turning your instructions into in-game actions (see the picture below).
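To make this input/output contract concrete, here is a minimal Python sketch of an agent that receives the two streams described above (an instruction and a screen frame) and returns a keyboard-and-mouse action. The class names, fields, and the keyword-matching `toy_policy` are hypothetical stand-ins for illustration only; SIMA's real policy is a learned neural network, not a set of rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """One step of input: the user's instruction plus the current screen."""
    instruction: str
    frame: List[List[int]]  # toy grayscale pixels; a real agent sees full video frames


@dataclass
class Action:
    """One step of output, expressed in the same terms a human player uses."""
    keys: List[str] = field(default_factory=list)  # keys pressed this step
    mouse_dx: int = 0  # horizontal mouse movement (negative = left)
    mouse_dy: int = 0  # vertical mouse movement
    click: bool = False  # left mouse button


def toy_policy(obs: Observation) -> Action:
    """Hypothetical rule-based stand-in for a learned policy.

    It ignores the frame and maps a few instruction keywords to
    keyboard/mouse actions, only to show the shape of the interface.
    """
    text = obs.instruction.lower()
    if "jump" in text:
        return Action(keys=["space"])
    if "turn left" in text:
        return Action(mouse_dx=-50)
    if "open the map" in text:
        return Action(keys=["m"])
    if text.startswith("go to"):
        return Action(keys=["w"])  # walk forward
    return Action()  # no-op when the instruction is not recognised


frame = [[0] * 4 for _ in range(4)]  # blank 4x4 screen
print(toy_policy(Observation("jump", frame)))
# Action(keys=['space'], mouse_dx=0, mouse_dy=0, click=False)
```

Because the action space is expressed as keys and mouse movement rather than as calls into one particular game's code, the same interface can in principle be reused across many games.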

SIMA agent architecture (source: technical report)

Dataset: SIMA employs a large dataset of gameplay from both specialized research environments (designed to isolate specific skills) and popular video games. To achieve a high level of adaptability, Google DeepMind collaborated with eight game studios to train SIMA on nine different open-world games. These games, which range from “Goat Simulator 3” to “Wobbly Life,” were chosen for their open-ended interactions and diverse environments.

SIMA environments (source: technical report)

Training SIMA for diverse tasks

SIMA underwent a comprehensive training program. Below is an outline of the training methodology:

  1. Learning by doing (reinforcement learning): SIMA explored various game environments and learned through trial and error. By maximizing rewards within each game, it continuously improved its performance.
  2. Learning from the masters (imitation learning): SIMA didn’t just play; it observed human players’ strategies and actions across different games.
  3. Building a foundation (self-supervised pretraining): By analyzing a vast amount of video data, it learned to recognize common elements in games, like objects and interactions. This foundation allowed it to adapt quickly to new games.
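Of the three ingredients, imitation learning is the simplest to illustrate. The sketch below is a toy form of behavioral cloning: given human demonstrations as (instruction, action) pairs, the maximum-likelihood policy for a table-based model with one categorical distribution per instruction is just the empirical frequency of each action. The demonstration data and action names are invented for illustration; SIMA itself learns from pixels and language with a neural network, which this sketch does not attempt to reproduce.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical human demonstrations: (instruction, action the human took).
demos: List[Tuple[str, str]] = [
    ("jump", "press_space"),
    ("jump", "press_space"),
    ("jump", "press_w"),  # a noisy demonstration
    ("open the map", "press_m"),
    ("turn left", "mouse_left"),
    ("turn left", "mouse_left"),
]


def behavioral_cloning(data: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Fit a tabular policy pi(action | instruction) by maximum likelihood.

    For a categorical policy with one distribution per instruction, the
    maximum-likelihood estimate is the empirical frequency of each action.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for instruction, action in data:
        counts[instruction][action] += 1
    policy = {}
    for instruction, action_counts in counts.items():
        total = sum(action_counts.values())
        policy[instruction] = {a: n / total for a, n in action_counts.items()}
    return policy


def act(policy: Dict[str, Dict[str, float]], instruction: str) -> str:
    """Greedy action: the most probable action under the cloned policy."""
    dist = policy[instruction]
    return max(dist, key=dist.get)


policy = behavioral_cloning(demos)
print({a: round(p, 2) for a, p in policy["jump"].items()})  # {'press_space': 0.67, 'press_w': 0.33}
print(act(policy, "jump"))  # press_space
```

The noisy demonstration lowers the probability of the preferred action without changing the greedy choice, which is why imitation learning benefits from large amounts of human gameplay.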

This combined approach enabled SIMA to understand and navigate diverse game worlds. It can even apply what it has learned to games it was never trained on, showing that its skills generalize across environments.

The researchers also created a dedicated training environment in the Unity game engine, and the data collection proceeded in several steps:

  • Firstly, AI agents practiced tasks like building sculptures to test object manipulation.
  • Next, they recorded real people playing games in pairs to understand the link between language and actions. One player controlled the game, while the other provided instructions.
  • Finally, independent play by individual players provided insights into decision-making.

The resulting dataset, combining instructions, paired gameplay, and individual playthroughs, was used to train the SIMA agents.
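One way to picture how these different sources end up in a single training set is a shared record format: every episode, whether it came from a pair of players or from a solo playthrough, is reduced to an instruction plus the keyboard/mouse actions that followed. The schema, field names, and helper functions below are hypothetical; for solo play, the sketch assumes the instruction is written afterwards by an annotator describing a segment of what the player did.

```python
from dataclasses import dataclass
from typing import List, Literal

Source = Literal["paired", "solo"]


@dataclass
class Step:
    frame_id: int  # index of the recorded screen frame
    action: str    # keyboard/mouse action taken at that frame


@dataclass
class Episode:
    instruction: str
    steps: List[Step]
    source: Source  # "paired": instruction given live; "solo": annotated afterwards


def from_paired_session(instruction: str, actions: List[str]) -> Episode:
    """One player gave the instruction, the other carried it out."""
    steps = [Step(i, a) for i, a in enumerate(actions)]
    return Episode(instruction, steps, "paired")


def from_solo_playthrough(actions: List[str], start: int, end: int,
                          annotation: str) -> Episode:
    """Cut a segment [start, end) from free play and attach a description."""
    steps = [Step(i, actions[i]) for i in range(start, end)]
    return Episode(annotation, steps, "solo")


solo_log = ["w", "w", "space", "mouse_left", "click"]
dataset = [
    from_paired_session("open the map", ["m"]),
    from_solo_playthrough(solo_log, 2, 5, "jump, turn left and click"),
]
for ep in dataset:
    print(ep.source, ep.instruction, [s.action for s in ep.steps])
```

Once both sources share one format, they can be mixed freely in the same training batches.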

Instructions across SIMA data (source: technical report)

SIMA is trained to be a personal assistant

SIMA isn’t trained to win; it’s trained to assist. Tim Harley, a research engineer at DeepMind, emphasized their focus on creating an AI that can act as a “believable partner” rather than a “superhuman opponent”. This shift from competition to cooperation could lead to richer gaming experiences in which players command and collaborate with their AI companions in real time.

Evaluation

The team tested SIMA on around 600 basic skills across the games, covering navigation, object interaction, and menu use, such as “turn left,” “climb the ladder,” and “open the map.” Each is a simple task that can be completed in about 10 seconds. SIMA was able to carry out many of these tasks, although its overall success rate remained below that of human players.
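As a rough illustration of how such short-horizon evaluations can be scored, the sketch below counts an episode as a success only if the task is completed within a time budget (here set to the roughly 10 seconds mentioned above) and reports success rates per skill category. The episode results are made up, and the scoring rule is an assumption for illustration, not the technical report's exact protocol.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

TIME_LIMIT_S = 10.0  # assumed per-task budget, matching "about 10 seconds"


@dataclass
class EpisodeResult:
    category: str                  # e.g. "navigation", "object interaction", "menu use"
    instruction: str
    completed_at: Optional[float]  # seconds until success, or None if never completed


def success_rates(results: List[EpisodeResult]) -> Dict[str, float]:
    """Fraction of episodes per category that succeeded within the time limit."""
    totals: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.category] += 1
        if r.completed_at is not None and r.completed_at <= TIME_LIMIT_S:
            wins[r.category] += 1
    return {c: wins[c] / totals[c] for c in totals}


results = [
    EpisodeResult("navigation", "turn left", 1.2),
    EpisodeResult("navigation", "climb the ladder", 12.5),  # too slow: failure
    EpisodeResult("menu use", "open the map", 0.8),
    EpisodeResult("menu use", "open the inventory", None),  # never completed
]
print(success_rates(results))  # {'navigation': 0.5, 'menu use': 0.5}
```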

The evaluation also examined SIMA’s zero-shot capabilities, that is, whether it could transfer knowledge learned in some games to perform similar tasks in a new game without being trained on it. This highlights its potential as a generalist AI agent. Performance metrics assessed accuracy, efficiency, adaptability to new scenarios, and comprehension of complex instructions.
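Zero-shot evaluation hinges on how the data is split. A common way to set it up, sketched below under that assumption, is a leave-one-out split: for each game, train on every other game and test only on the held-out one, so the agent never sees that game during training. The environment names and the `make_zero_shot_splits` helper are hypothetical and not part of any released SIMA code.

```python
from typing import Dict, List, Tuple


def make_zero_shot_splits(games: List[str]) -> Dict[str, Tuple[List[str], str]]:
    """For each game, train on all the others and hold that one game out.

    Returns {held_out_game: (training_games, held_out_game)}.
    """
    splits = {}
    for held_out in games:
        train = [g for g in games if g != held_out]
        splits[held_out] = (train, held_out)
    return splits


games = ["game_a", "game_b", "game_c"]  # hypothetical environment names
for held_out, (train, test) in make_zero_shot_splits(games).items():
    # The test game must never appear in the training list.
    assert test not in train
    print(f"train on {train}, evaluate zero-shot on {test}")
```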

SIMA evaluation results (source: technical report)

Additionally, researchers evaluated the quality of interaction within the virtual worlds. To illustrate the agent’s broad range of skills, the next figure shows a variety of tasks the agent can perform in different video game settings.

Agent Trajectories. The SIMA agent is capable of performing a range of language-instructed tasks across diverse 3D virtual environments (source: technical report)

Despite the visual differences between the game environments, SIMA can handle various tasks, including basic navigation (moving around) and using tools within the game. It can also locate specific objects, such as spaceships, cars, or hubs, even when they are not immediately visible.

The following image illustrates that SIMA can perform well even in environments it hasn’t been specifically trained for (zero-shot learning).

SIMA per-environment relative performance (source: technical report)
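A per-environment relative score divides one agent's success rate by that of a reference agent in the same environment, so that games of very different difficulty can be compared on a single scale. The sketch below assumes the reference is an agent specialized to (trained only on) that environment and expresses another agent's success rate as a percentage of it; the choice of baseline and all numbers are placeholders for illustration, not figures from the technical report.

```python
from typing import Dict


def relative_performance(agent: Dict[str, float],
                         reference: Dict[str, float]) -> Dict[str, float]:
    """Agent success rate as a percentage of the reference agent's, per environment.

    100 means "as good as the reference"; environments where the reference
    scored 0 (or that it was not evaluated on) are skipped.
    """
    return {
        env: 100.0 * agent[env] / reference[env]
        for env in agent
        if env in reference and reference[env] > 0
    }


# Hypothetical success rates in [0, 1]; not numbers from the technical report.
specialist = {"game_a": 0.40, "game_b": 0.25}
zero_shot = {"game_a": 0.30, "game_b": 0.20}
print(relative_performance(zero_shot, specialist))
```

Normalizing per environment keeps a hard game with low absolute success rates from being mistaken for a game where the agent generalizes poorly.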

Can we use SIMA?

Currently, SIMA is in the research phase and is not available for public access. Its future applications may include:

  • gaming companion: users can interact with SIMA as a virtual gaming partner that understands and follows natural language instructions
  • creative assistant: it can assist in creative tasks within virtual environments, such as building structures or exploring new worlds based on user commands
  • research and development: it can serve as a platform for studying how AI learns from interaction

Conclusion

Google DeepMind’s SIMA offers a forward-looking vision of AI’s role in gaming and beyond, pointing toward new forms of human-computer interaction. By enabling AI to understand natural language and translate it into actions in 3D environments, SIMA could pave the way for truly interactive AI companions integrated into our daily lives.
