SMERF is an AI tool for real-time rendering of large scenes

December 26, 2023

SMERF is a new method for fast, high-quality rendering of realistic views of large 3D scenes. It supports navigation with six degrees of freedom (6DOF) and renders in real time in a web browser on standard smartphones and laptops.

SMERF (Streamable Memory Efficient Radiance Fields) achieves state-of-the-art image quality among real-time methods, while coming close to the quality of slower, offline models.

SMERF builds on MERF (Memory Efficient Radiance Fields) and introduces two techniques to improve on it. First, it divides the scene into smaller parts (submodels) and only renders the submodel that is near the camera, which reduces the rendering overhead. Second, it learns from a higher-quality but slower model, called the teacher, which lets the smaller model approach the teacher's image quality.

You can try SMERF on the project page.

The model

SMERF's design has three main parts:

1. It divides the scene into smaller, independent submodels, and only renders the submodel that is near the camera. This reduces the rendering overhead (see the picture below).

SMERF divides the scene into smaller pieces called submodels (source: paper)

2. It further divides each submodel into smaller pieces called partitions, and uses a separate set of parameters for each partition. This way, the model can capture more details and variations in the scene.

3. It uses a technique called feature gating to decide which partitions to use and which to ignore. Feature gating switches certain features on or off according to a criterion. In this case, the criterion is the camera origin, which is the position of the camera in the scene.

At any point in time, a single submodel is used to render a target view, thus reducing the compute and memory requirements.
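
Selecting a submodel from the camera origin can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the submodels sit on a regular grid over an axis-aligned bounding box, and the names `select_submodel`, `scene_min`, `scene_max`, and `grid_size` are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def select_submodel(camera_origin: Vec3,
                    scene_min: Vec3,
                    scene_max: Vec3,
                    grid_size: int) -> Tuple[int, ...]:
    """Return the (i, j, k) grid index of the submodel that contains the camera.

    Assumption (not from the article): submodels form a regular
    grid_size x grid_size x grid_size grid over the scene bounding box.
    """
    index = []
    for c, lo, hi in zip(camera_origin, scene_min, scene_max):
        # Map the coordinate to a fractional grid position along this axis.
        t = (c - lo) / (hi - lo)
        cell = math.floor(t * grid_size)
        # Clamp so a camera on or outside the boundary uses the nearest cell.
        cell = max(0, min(grid_size - 1, cell))
        index.append(cell)
    return tuple(index)


# Example: a 3 x 3 x 3 grid over a 3 m cube.
current = select_submodel((2.9, 1.5, 0.1), (0.0, 0.0, 0.0), (3.0, 3.0, 3.0), 3)
```

Because the index depends only on the camera position, a viewer would need to keep just the current submodel in memory and could fetch a neighbouring one when the camera crosses a cell boundary.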

Training

Traditional NeRF-like models undergo extensive training from scratch, starting with random parameters and optimizing them to reconstruct images with minimal errors. However, this process can be computationally expensive and time-consuming.

In contrast, SMERF employs a novel approach called “student/teacher” distillation, where “the student” imitates “the teacher”, which is an already-trained high-quality model (see the picture below). Rather than starting from scratch, SMERF was trained to imitate Zip-NeRF.

This technique offers several advantages:

knowledge transfer: SMERF effectively extracts the knowledge and biases embedded within the “teacher” model, allowing it to rapidly achieve higher accuracy and performance.
reduced manual intervention: by leveraging the teacher’s expertise, SMERF eliminates the need for extensive manual tuning and parameter optimization, saving valuable time and effort.
reduced training time: the knowledge distillation approach significantly reduces the training time, enabling faster development and deployment of SMERF models.

Teacher supervision during training (source: paper)

SMERF achieves quality comparable to its teacher model while being smaller, much faster and more efficient.
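
To make the student/teacher idea concrete, here is a minimal sketch of a distillation objective. It assumes the simplest possible form, a mean squared error between the colors the student renders and the colors the teacher renders for the same rays; the actual SMERF training objective may include further terms. The function name `distillation_loss` and the data layout are hypothetical.

```python
from typing import List, Sequence

Color = Sequence[float]  # (r, g, b), each channel in [0, 1]


def distillation_loss(student_colors: List[Color],
                      teacher_colors: List[Color]) -> float:
    """Mean squared error between student and teacher colors over a batch of rays.

    Assumption (not from the article): the teacher's rendered colors are the
    only supervision signal.
    """
    if len(student_colors) != len(teacher_colors):
        raise ValueError("student and teacher batches must have the same size")
    if not student_colors:
        raise ValueError("batch must contain at least one ray")

    total = 0.0
    count = 0
    for student, teacher in zip(student_colors, teacher_colors):
        for s, t in zip(student, teacher):
            total += (s - t) ** 2
            count += 1
    return total / count


# The teacher's output replaces the ground-truth photo as the training target.
loss = distillation_loss([(0.9, 0.1, 0.0)], [(1.0, 0.0, 0.0)])
```

During training, a loss of this kind would be minimized with respect to the student's parameters while the teacher stays fixed.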

Rendering

The rendering process follows these steps:

split the scene into smaller parts and use a different model for each part
select the most relevant submodel based on the camera origin
cast a ray from the camera through each pixel into the scene, compute the color and opacity at points along each ray, and combine them to create the final picture
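
The last step, combining colors and opacities along a ray, can be illustrated with standard front-to-back alpha compositing as used in volume rendering. This is a generic sketch under that assumption and not SMERF's actual renderer; the name `composite_ray` and the early-exit threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def composite_ray(colors: List[Sequence[float]],
                  alphas: List[float]) -> Tuple[Tuple[float, ...], float]:
    """Blend samples along one ray, ordered from nearest to farthest.

    Returns the pixel color and the accumulated opacity of the ray.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that still reaches the camera
    for color, alpha in zip(colors, alphas):
        # A sample contributes in proportion to its own opacity and to how
        # much light is not yet blocked by the samples in front of it.
        weight = transmittance * alpha
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha
        # Stop early once the ray is effectively opaque (assumed threshold).
        if transmittance < 1e-4:
            break
    return tuple(pixel), 1.0 - transmittance


# A half-transparent red sample in front of a half-transparent blue one.
pixel, opacity = composite_ray([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.5, 0.5])
```

In this example the nearer red sample contributes twice as much as the blue one behind it, because half of the light is already blocked before the ray reaches the second sample.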

Experiments

The authors evaluated their model against two state-of-the-art methods: 3D Gaussian Splatting and Zip-NeRF. SMERF achieves state-of-the-art accuracy among real-time methods on large scenes with footprints up to 300 m² at a volumetric resolution of 3.5 mm³. Its renderings are almost as good as Zip-NeRF’s, but it renders much faster and uses less memory.

The next image compares SMERF and 3D Gaussian Splatting against ground-truth images from the mip-NeRF 360 and Zip-NeRF datasets.

SMERF: qualitative comparison (source: paper)

As we can see, SMERF can effectively handle thin shapes, detailed textures, and intricate lighting effects, while 3D Gaussian Splatting struggles to maintain these visual qualities.

Conclusion

SMERF is a new method for rendering large, multi-room scenes on various devices. It combines two key elements: a partitioning scheme that divides the model into smaller, more manageable submodels, and a training strategy in which the model learns from a more sophisticated, pre-trained model.

The model renders complex scenes in real time, even on resource-constrained devices like smartphones, laptops, and low-power desktops.

It can be used for a wide range of applications that require fast and high-quality rendering of complex scenes, such as virtual reality, gaming, and 3D modeling.