DeepMind released LASER: a novel 3D generative model

February 8, 2023

DeepMind introduces Latent Set Representations for NeRF-VAE (LASER-NV), a new generative model for reconstructing 3D scenes from 2D images. LASER-NV addresses some limitations of the previous model, Neural Radiance Field (NeRF).

NeRF has been largely used for 3D scene reconstruction, object recognition, and camera localization. Its main shortcomings are the need to be trained on a large number of views with accurate camera information leading to high computational costs, and limitations in modeling complex scenes.

LASER-NV addressees these challenges, being more practical for real-world applications. It has a high modeling capacity, can produce high-quality views of 3D scenes and is able to generate realistic parts of the scene that are not visible in the input views.

This is achieved by maintaining consistency with the observations, so that the generated completions are coherent with the rest of the scene.

The Model

LASER-NV is based on “a set-valued latent representation modeled by normalizing flows”. This means that each point in the data is associated with a set (or multiple sets) of possible values, instead of a single value.

For higher rendering fidelity LASER-NV further incorporates a geometry-informed attention mechanism, taking into account the geometry of the scene when generating novel views.

For generating more accurate and consistent results, the new model focuses on specific parts of the scene and takes into account the relationships between different objects in the scene.

*Reconstructions of observed views and predictions of partially unobserved novel views*

Conclusions, future research

LASER-NV has been shown to outperform other models in generating high-quality novel views of 3D scenes.

This has been demonstrated through evaluations on two datasets: ShapeNet, a benchmark dataset for 3D shape analysis and synthesis, and a novel simulated City dataset, which has high uncertainty in the parts of the scene that are not visible in the input views.

The authors believe this work is an important step towards learning a generative model of real scenes. LASER-NV still inherits some drawbacks from existing models like “NeRF”, such as computational cost and the need for accurate camera information.

Future research could consider integrating object-centric structure and dynamics for downstream reasoning tasks.