IndoorSim-to-OutdoorReal: Google’s latest model for robot navigation in outdoor environments

IndoorSim-to-OutdoorReal (I2O) is a new method for teaching robots to navigate in outdoor environments without any prior outdoor experience.

The robot's visual navigation system was trained solely in simulated indoor environments and then tested successfully in real-world outdoor environments without any additional training or modification.

Teaching a mobile robot to navigate through complex outdoor environments is a significant challenge as it requires the robot to accurately perceive its surroundings and identify viable paths, while avoiding obstacles and pedestrians.

IndoorSim-to-OutdoorReal: an overview of the model

The I2O method, introduced by Google, uses deep reinforcement learning to learn a visual navigation policy in a simulated indoor environment and then transfers this policy to outdoor environments.

To make the I2O transfer succeed, the team provided the robot with Context Maps: satellite maps or simple human-made sketches that carry additional information about the surroundings (buildings, sloped ground, roads).

The Context Maps serve as guidance for the robot regarding the path to follow and help it navigate over long distances in outdoor environments.

The I2O transfer policy was tested and evaluated using the Spot robot from Boston Dynamics. During the evaluation stage, the robot successfully navigated hundreds of meters in outdoor environments, while also overcoming obstacles that were not previously encountered during its indoor training.


The task used to evaluate the robot during the training was PointGoal Navigation, also known as PointNav.

The robot is placed in a new environment and must navigate to a specific goal location under certain constraints (a maximum number of steps it can take and velocity limits).

The episode is considered successful if the robot reaches within a certain distance of the goal location.
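The PointNav rules above can be sketched in a few lines of Python. This is only an illustration: the success radius, step budget and velocity limits are made-up values rather than the settings used by the authors, and expressing the goal as a distance and bearing relative to the robot is an assumption about how the goal vector is encoded.

```python
import math
from dataclasses import dataclass


@dataclass
class EpisodeConfig:
    # All values are illustrative assumptions, not the paper's settings.
    success_radius_m: float = 0.5   # "within a certain distance" of the goal
    max_steps: int = 500            # step budget for one episode
    max_linear_mps: float = 0.5     # linear velocity limit
    max_angular_rps: float = 0.5    # angular velocity limit


def goal_vector(x, y, heading, goal_x, goal_y):
    """Return (distance, bearing) to the goal in the robot's own frame."""
    dx, dy = goal_x - x, goal_y - y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - heading
    # Wrap the bearing into [-pi, pi).
    bearing = (bearing + math.pi) % (2 * math.pi) - math.pi
    return distance, bearing


def clip_velocity(linear, angular, cfg):
    """Clamp a commanded velocity pair to the episode's velocity limits."""
    linear = max(-cfg.max_linear_mps, min(cfg.max_linear_mps, linear))
    angular = max(-cfg.max_angular_rps, min(cfg.max_angular_rps, angular))
    return linear, angular


def episode_status(distance_to_goal, steps_taken, cfg):
    """Classify the episode as 'success', 'failure' or still 'running'."""
    if distance_to_goal <= cfg.success_radius_m:
        return "success"
    if steps_taken >= cfg.max_steps:
        return "failure"
    return "running"
```

In a simulator loop, one would call `goal_vector` every step, feed it to the policy, clamp the policy's output with `clip_velocity`, and stop as soon as `episode_status` is no longer "running".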


The authors used two 3D datasets, the Habitat-Matterport 3D Dataset (HM3D) and the Gibson Dataset, which consist of over 1000 scans of real-world indoor environments, including homes and offices.

The Context Maps were based on freely accessible information from Google Maps. The maps were converted into top-down occupancy maps (2D grid representations of the environment seen from a bird’s-eye view) through human sketches, but they can also be generated automatically.
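To make the idea of a top-down occupancy map concrete, here is a minimal sketch of how a grayscale sketch could be turned into a 2D grid of free and occupied cells, and how a position in meters could be mapped to a grid cell. The threshold, the "dark means occupied" convention, the grid resolution and both function names are assumptions chosen for illustration, not the authors' pipeline.

```python
def sketch_to_occupancy(gray_rows, threshold=128):
    """Turn a grayscale sketch (rows of 0-255 values) into an occupancy grid.

    Assumed convention: dark pixels are obstacles (1), light pixels are free (0).
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray_rows]


def world_to_cell(x_m, y_m, origin_xy, resolution_m):
    """Map a position in meters to a (row, col) cell of the grid.

    `origin_xy` is the world position of cell (0, 0) and `resolution_m`
    is the side length of one cell; both are hypothetical parameters.
    """
    col = int((x_m - origin_xy[0]) // resolution_m)
    row = int((y_m - origin_xy[1]) // resolution_m)
    return row, col


# A tiny 2x3 "sketch": one dark stroke in each row.
sketch = [
    [255, 0, 255],
    [255, 255, 30],
]
grid = sketch_to_occupancy(sketch)
```

A real map would of course be far larger, but the representation stays the same: one number per cell saying whether the robot should expect an obstacle there.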

The authors trained the navigation policies with a reinforcement learning algorithm called Decentralized Distributed Proximal Policy Optimization (DD-PPO), using two architectures: No-Context PointNav and Context-Guided PointNav.

No-Context PointNav architecture

In the first stage, the team used deep reinforcement learning to train a PointNav navigation policy in a simulated environment with no Context Maps (see picture below).

The No-Context PointNav policy architecture

This No-Context training stage has the following pipeline:

The Multi-Layer Perceptron (MLP) processes the goal vector (1) by applying a series of linear and nonlinear transformations to extract its relevant features.

The Convolutional Neural Network (CNN) processes the depth image from the robot’s camera (the distance to objects) and extracts the features (2) that are relevant to the navigation policy.

The processed goal vector (1) along with the relevant features (2) are fed into the Gated Recurrent Unit (GRU) network. The GRU processes the input sequences and encodes the relevant information.

The GRU’s output is then passed through the second MLP to generate the output of the policy, containing the desired linear and angular velocities for the robot to follow.
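The data flow of this pipeline can be sketched in plain Python. This is a toy stand-in under heavy assumptions, not the authors' network: the weights are random and untrained, the layer sizes are arbitrary, the CNN is replaced by a single dense layer over a flattened depth image, the GRU is a textbook update written by hand, and squashing the output with tanh is an illustrative choice. The class name `TinyPolicy` and all helper names are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init(rows, cols, rng):
    """Small random weights and zero biases for a rows x cols layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
               for _ in range(rows)]
    return weights, [0.0] * rows


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyPolicy:
    """Goal MLP + depth encoder -> GRU -> velocity head (untrained)."""

    def __init__(self, depth_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.goal_mlp = init(4, 2, rng)            # goal vector -> 4 features
        self.depth_enc = init(4, depth_dim, rng)   # stand-in for the CNN
        # Each GRU gate sees the 8 input features plus the hidden state.
        self.update_gate = init(hidden, 8 + hidden, rng)
        self.reset_gate = init(hidden, 8 + hidden, rng)
        self.candidate = init(hidden, 8 + hidden, rng)
        self.head = init(2, hidden, rng)           # second MLP -> 2 outputs

    def step(self, goal, depth, h):
        """One control step: returns ((linear, angular), new hidden state)."""
        goal_feat = [math.tanh(v) for v in linear(goal, *self.goal_mlp)]
        depth_feat = [math.tanh(v) for v in linear(depth, *self.depth_enc)]
        x = goal_feat + depth_feat                 # concatenate (1) and (2)
        z = [sigmoid(v) for v in linear(x + h, *self.update_gate)]
        r = [sigmoid(v) for v in linear(x + h, *self.reset_gate)]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in linear(x + gated_h, *self.candidate)]
        h_new = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        lin_vel, ang_vel = (math.tanh(v) for v in linear(h_new, *self.head))
        return (lin_vel, ang_vel), h_new


policy = TinyPolicy(depth_dim=6)
hidden_state = [0.0] * policy.hidden
velocities, hidden_state = policy.step([5.0, 0.3], [1.0] * 6, hidden_state)
```

The point of the recurrent state is that `hidden_state` is carried from one step to the next, so the policy can remember what it saw earlier in the episode instead of reacting only to the current depth image.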

Context-Guided PointNav architecture

The second training stage (see picture below) is guided by the Context Maps.

It includes a dedicated visual encoder (ResNet18) to handle the additional information from the Context Maps (for example, information about buildings and free space).

The Context-Guided PointNav policy architecture


The study found that the Context Maps are critical for successful navigation in novel environments, even when they are not accurate or complete. They give the robot additional information for finding a path to its goal without colliding with obstacles or requiring human intervention.

If the maps were significantly inaccurate (corrupted with 50% noise or entirely blank), the policy reverted to the behavior of a policy with no context, meaning the robot could no longer use them.
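One way to picture this robustness test is to corrupt an occupancy grid before handing it to the policy. The sketch below is an assumption about what "50% noise" could mean (each cell flipped independently with probability 0.5); the article does not specify the exact noise model, and the function names are hypothetical.

```python
import random


def corrupt_map(grid, noise_fraction, seed=0):
    """Flip each binary cell independently with probability `noise_fraction`.

    This noise model is an illustrative assumption, not the authors' protocol.
    """
    rng = random.Random(seed)
    return [[1 - cell if rng.random() < noise_fraction else cell
             for cell in row]
            for row in grid]


def blank_map(grid):
    """Return a map of the same shape that carries no information."""
    return [[0 for _ in row] for row in grid]


context_map = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
noisy_map = corrupt_map(context_map, noise_fraction=0.5)
empty_map = blank_map(context_map)
```

Running the same policy on `context_map`, `noisy_map` and `empty_map` and comparing success rates would be one simple way to measure how much the policy actually relies on the map.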

The I2O navigation policies were tested in a novel environment containing obstacles such as buildings, cars and bushes that the robot had not seen during its indoor training. The team used the Spot robot, with a Context Map that contained the trajectories.

The Spot robot navigated hundreds of meters in the new environment and successfully reached the goal location (see example below).

The route with the start and the goal location

Robot navigation using the Context Map. Source: Project Page



The IndoorSim-to-OutdoorReal approach enables robots to navigate in outdoor environments with greater accuracy and robustness, using only indoor training data.

Such navigation policies could be used for critical real-world applications such as navigating through forests or rugged terrain to locate missing persons or monitor wildlife habitats.
