3D Gaussian Splatting vs. Instant-NGP: A Comparative Approach to 3D Scene Representation
Personal Project

  • Juyang Bai

Introduction


The landscape of 3D image reconstruction and rendering has witnessed remarkable progress in recent times. Novel approaches such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting have emerged as game-changers in the field. These innovative techniques have revolutionized our capacity to recreate complex visual environments, particularly those featuring challenging elements like reflective or transparent materials.
NeRF technology marks a significant leap forward in the realm of high-quality 3D scene recreation. This method utilizes advanced neural networks to capture a comprehensive volumetric scene representation. It achieves this by mapping complex 5D coordinates, which encompass both spatial positioning and viewing angles, to corresponding color and density values. This approach enables the creation of remarkably detailed scene renderings. However, the implementation of NeRF models comes with its own set of challenges, primarily related to the substantial computational power and time they require.
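For intuition, the sketch below shows the kind of mapping a NeRF-style model learns: a 3D position and a viewing direction are regressed to an RGB color and a volume density. It is a toy illustration under simplified assumptions, not the architecture from the NeRF paper.

```python
# Toy sketch of the radiance-field mapping: (position, view direction) -> (color, density).
# Illustrative only; the real NeRF uses positional encodings and a deeper network.
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),   # 3D position + 3D unit view direction
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                  # 3 color channels + 1 density
        )

    def forward(self, position, direction):
        raw = self.net(torch.cat([position, direction], dim=-1))
        rgb = torch.sigmoid(raw[..., :3])          # colors constrained to [0, 1]
        sigma = torch.relu(raw[..., 3:])           # non-negative volume density
        return rgb, sigma

# Query the field at one point seen from one direction.
rgb, sigma = TinyRadianceField()(torch.rand(1, 3), torch.rand(1, 3))
```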
In response to these hurdles, researchers have developed more efficient variants. One notable advancement is Instant Neural Graphics Primitives (Instant-NGP), which aims to strike a balance between processing speed and output quality. Instant-NGP achieves its improved performance through a multiresolution hash encoding of the input combined with a compact, optimized neural network. Our previous project, Evaluating Rendering Techniques: A Comparative Study of Instant-NGP, LERF, Nerfacto, and TensorRF, showed that Instant-NGP achieves better rendering results than other NeRF variants.
Concurrently, 3D Gaussian Splatting has gained traction as an alternative method. This technique stands out for its approach of splatting Gaussian kernels from 3D points onto the 2D image plane. Reports suggest that this method can achieve rendering speeds up to 50 times faster than conventional NeRF implementations. Its effectiveness in handling limited input data and reconstructing scenes from sparse information makes it a compelling subject for comparison with accelerated NeRF variants like Instant-NGP.
This study seeks to conduct a thorough comparative analysis between 3D Gaussian Splatting and Instant-NGP, providing valuable information for researchers and practitioners alike.


Methods


Instant Neural Graphics Primitives (Instant NGP)

Instant NGP is a method for improving the approximation quality and training speed of a given fully connected neural network m(y; Φ). The focus of the method is to optimize an encoding of the input to that network, y = enc(x; θ). The model therefore combines trainable weight parameters Φ with trainable encoding parameters θ. The encoding parameters are structured across L levels, with each level containing up to T feature vectors of dimensionality F.

Figure 1: Overview of the Instant NGP method.
The multiresolution hash encoding process, illustrated in Figure 1, operates on independent levels (exemplified by red and blue in the diagram). Each level stores feature vectors at grid vertices, with grid resolutions following a geometric progression from the coarsest resolution N_min to the finest N_max: $$ N_l := \left\lfloor N_{\text{min}} \cdot b^l \right\rfloor, $$ $$ b := \exp\left(\frac{\ln N_{\text{max}} - \ln N_{\text{min}}}{L - 1}\right). $$ N_max is chosen to match the finest detail in the training data. Due to the large number of levels L, the growth factor b is usually small.
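For concreteness, the following sketch computes the per-level resolutions from N_min, N_max, and L; the specific values shown are illustrative defaults rather than the exact configuration used in our experiments.

```python
# Per-level grid resolutions of the multiresolution hash encoding:
# N_l = floor(N_min * b^l), with b derived from N_min, N_max, and L.
import numpy as np

def level_resolutions(n_min=16, n_max=2048, num_levels=16):
    b = np.exp((np.log(n_max) - np.log(n_min)) / (num_levels - 1))
    return [int(np.floor(n_min * b**l)) for l in range(num_levels)]

print(level_resolutions())  # coarsest -> finest, e.g. [16, 22, 30, ..., 2048]
```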
Consider a single level l. The input coordinate x ∈ ℝ^d is scaled by that level's grid resolution before rounding down and up, ⌊x · N_l⌋ and ⌈x · N_l⌉. The voxel spanning these two corners contains 2^d integer vertices in ℤ^d. Each corner corresponds to an entry in the level's feature vector array, which has a maximum size of T. For coarse levels where (N_l + 1)^d ≤ T, this mapping is one-to-one. At finer levels, we employ a hash function h : ℤ^d → ℤ_T to index the array, treating it as a hash table without explicit collision handling. We rely on gradient-based optimization to store appropriate sparse detail, with the subsequent neural network m(y; Φ) resolving collisions. The number of trainable encoding parameters θ is thus O(T) and capped at T · L · F, which with the default L = 16 and F = 2 is consistently T · 16 · 2.

We utilize a spatial hash function defined as: $$ h(x) = \left(\bigoplus_{i=1}^d \pi_i x_i \right) \mod T $$ where ⊕ denotes the bit-wise XOR operation and the π_i are distinct, large prime numbers. This formula XORs the results of a per-dimension linear congruential permutation [Lehmer 1951], decorrelating the dimensions' impact on the hashed value. To achieve (pseudo-)independence, only d - 1 dimensions require permutation. We set π_1 = 1, and for improved cache coherence, π_2 = 2,654,435,761 and π_3 = 805,459,861.

The feature vectors at the 2^d corners undergo d-linear interpolation based on x's relative position within its hypercube, with interpolation weight w_l := x_l - ⌊x_l⌋, where x_l := x · N_l. This process occurs independently for all L levels. The interpolated feature vectors from each level, combined with auxiliary inputs ξ ∈ ℝ^E (e.g., the encoded view direction and textures in neural radiance caching), are concatenated to form y ∈ ℝ^{L·F + E}. This encoded input enc(x; θ) then feeds into the MLP m(y; Φ).
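The hash lookup itself is simple; the sketch below indexes one level's feature table with the spatial hash described above. The table size and feature dimension are illustrative values, and the d-linear interpolation across the 2^d corners is omitted for brevity.

```python
# One level's hashed feature lookup: each integer grid vertex is hashed with
# per-dimension primes, XOR-combined, and reduced modulo the table size T.
import numpy as np

PRIMES = (1, 2_654_435_761, 805_459_861)  # the primes quoted above (d <= 3)

def spatial_hash(vertex, table_size):
    """Map an integer grid vertex to an index in [0, table_size)."""
    h = 0
    for coordinate, prime in zip(vertex, PRIMES):
        h ^= (coordinate * prime) & 0xFFFFFFFF  # emulate unsigned 32-bit wrap-around
    return h % table_size

T = 2**19                      # hash-table entries at this level (illustrative)
table = np.random.randn(T, 2)  # F = 2 trainable features per entry

# Fetch the feature vector stored for grid vertex (431, 97, 12).
feature = table[spatial_hash((431, 97, 12), T)]
```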

3D Gaussian Splatting

The method introduces a novel approach for real-time radiance field rendering, using 3D Gaussians as the core representation. It achieves high-quality results comparable to state-of-the-art methods while offering significantly faster training and real-time rendering capabilities.
Scene Representation: The scene is represented by a set of 3D Gaussians, each defined by a position, covariance matrix, opacity, and spherical harmonic coefficients for color. This representation allows for flexible optimization and efficient rendering.
Optimization Process: Starting from sparse Structure-from-Motion points, the method optimizes the Gaussian parameters through an iterative process. It includes steps for adaptive density control, where Gaussians can be added, removed, or split based on the current reconstruction quality.
Fast Rendering: A key innovation is the tile-based rasterizer that enables fast, differentiable rendering of the 3D Gaussians. It uses GPU-accelerated sorting and a custom blending process to achieve real-time frame rates while maintaining high quality.
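The sketch below illustrates the per-Gaussian parameterization described above, including how the covariance is kept factored into a rotation and per-axis scales so it remains positive semi-definite during optimization. It is a simplified illustration, not the authors' optimized CUDA implementation.

```python
# Illustrative container for one 3D Gaussian; the real implementation stores
# these quantities as large optimizable tensors, one row per Gaussian.
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    position: np.ndarray   # (3,)  center of the Gaussian
    rotation: np.ndarray   # (4,)  unit quaternion
    scale: np.ndarray      # (3,)  per-axis standard deviations
    opacity: float         #       alpha used during blending
    sh_coeffs: np.ndarray  # (K,3) spherical-harmonic color coefficients

    def covariance(self):
        """Sigma = R S S^T R^T, rebuilt from the factored parameters."""
        w, x, y, z = self.rotation / np.linalg.norm(self.rotation)
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S.T @ R.T

g = Gaussian3D(position=np.zeros(3), rotation=np.array([1.0, 0.0, 0.0, 0.0]),
               scale=np.array([0.1, 0.1, 0.1]), opacity=0.8,
               sh_coeffs=np.zeros((16, 3)))
print(g.covariance())  # identity rotation -> diagonal covariance
```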

This method offers several significant advantages over previous approaches:

  • Real-time rendering (> 50 fps) on consumer GPUs
  • High-quality results comparable to or better than neural rendering methods
  • Compact scene representation
  • Ability to edit scenes interactively


Experiment Settings


Data Preparation

To evaluate the performance of Instant-NGP and 3D Gaussian Splatting, we captured a 360° video of a bench outside Malone Hall and split the video into individual frames, as sketched below. To train Instant-NGP and 3D Gaussian Splatting, we then needed to convert this data into the format each method requires.
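The frame extraction step can be done with OpenCV, roughly as follows; the file names and sampling stride shown here are placeholders rather than the exact values we used.

```python
# Split the captured 360-degree video into image frames with OpenCV.
import cv2
import os

def extract_frames(video_path, out_dir, every_nth=5):
    os.makedirs(out_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % every_nth == 0:  # subsample to keep the dataset manageable
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:04d}.jpg"), frame)
            saved += 1
        index += 1
    capture.release()
    return saved

extract_frames("bench_360.mp4", "images/")
```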
For Gaussian Splatting, we had to create calibrated cameras with Structure-from-Motion (SfM) and initialize the set of 3D Gaussians with the sparse point cloud produced by the SfM process. This was carried out with COLMAP and the Gaussian Splatting source code; specifically, we used the repository's convert.py script to adapt our custom dataset to the expected layout. The converted data was then used to train the 3D Gaussian Splatting model.
For Instant NGP, we once again employed COLMAP to extract camera poses and intrinsic parameters from our dataset images. We then used Instant NGP's colmap2nerf.py script to convert this information into the required format. Finally, we used these processed datasets to train the Instant NGP model.
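Sketched below is how the two conversion steps can be invoked; the script names come from the respective repositories, but the flags and paths shown are assumptions and may differ between repository versions, so check each README for the current interface.

```python
# Hypothetical invocation of the two dataset-conversion scripts; paths are placeholders.
import subprocess

# Gaussian Splatting: run COLMAP and write the calibrated cameras plus the
# sparse point cloud in the layout the training code expects.
subprocess.run(["python", "convert.py", "-s", "data/bench"], check=True)

# Instant NGP: convert the COLMAP output into the transforms.json format.
subprocess.run(["python", "scripts/colmap2nerf.py",
                "--images", "data/bench/images",
                "--out", "data/bench/transforms.json"], check=True)
```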

Training Settings

We trained both the 3D Gaussian Splatting and Instant NGP models from scratch on a single NVIDIA A6000 GPU, using the official implementations released by the respective authors on GitHub. For both models we kept the default hyperparameters.
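For reference, training can be launched roughly as follows; the entry points match the public repositories, while the data paths are placeholders and command-line flags may differ between versions.

```python
# Hypothetical training launch commands with default hyperparameters.
import subprocess

# 3D Gaussian Splatting: train.py from the official repository.
subprocess.run(["python", "train.py", "-s", "data/bench"], check=True)

# Instant NGP: scripts/run.py for headless training via the Python bindings.
subprocess.run(["python", "scripts/run.py", "--scene", "data/bench"], check=True)
```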


Results


Figure 2: Results of the rendering experiment. Left: Ground truth images. Center: Instant-NGP. Right: 3D Gaussian Splatting.
Table 1: Quantitative comparison of the rendering experiment.

Through both qualitative and quantitative analysis, we found that 3D Gaussian Splatting outperforms Instant NGP in terms of rendering quality. Additionally, our observations during the training and rendering processes indicate that 3D Gaussian Splatting is significantly faster than Instant NGP.
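If the quantitative comparison in Table 1 is read in terms of image-quality metrics such as PSNR (the standard choice for this kind of evaluation), each rendered view can be scored against its ground-truth image as in the sketch below. This is a generic metric illustration, not the exact evaluation script we used.

```python
# Minimal PSNR between a rendered image and its ground-truth view,
# both given as float arrays with values in [0, 1].
import numpy as np

def psnr(rendered, ground_truth, max_val=1.0):
    mse = np.mean((rendered.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val**2 / mse)

# Example with random stand-in images of shape (H, W, 3).
print(psnr(np.random.rand(480, 640, 3), np.random.rand(480, 640, 3)))
```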

References


[1] Müller, Thomas, et al. "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding." ACM Transactions on Graphics (TOG) 41.4 (2022): 1-15.
[2] Kerbl, Bernhard, et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering." ACM Transactions on Graphics (TOG) 42.4 (2023): 139:1-139:14.