DE-NeRF: DEcoupled Neural Radiance Fields for View-Consistent Appearance Editing and High-Frequency Environmental Relighting

Accepted by Proc. of SIGGRAPH 2023

Figure: Given a set of input images, we train a neural radiance field that decouples geometry, appearance, and lighting. Our method supports not only geometry manipulation and appearance editing but also rendering of the captured or modified scene under novel lighting conditions.

Neural Radiance Fields (NeRF) have shown promising results in novel view synthesis. While achieving state-of-the-art rendering results, NeRF usually encodes all properties related to the geometry and appearance of the scene together into several MLP (Multi-Layer Perceptron) networks, which hinders downstream manipulation of geometry, appearance, and illumination. Recently, researchers have attempted to edit geometry, appearance, and lighting for NeRF. However, these methods fail to render view-consistent results after the appearance of the input scene is edited. Moreover, high-frequency environmental relighting is beyond their capability because lighting is modeled with Spherical Gaussian (SG) and Spherical Harmonic (SH) functions or a low-resolution environment map. To solve these problems, we propose DE-NeRF, which decouples view-independent appearance and view-dependent appearance in the scene with a hybrid lighting representation. Specifically, we first train a signed distance function to reconstruct an explicit mesh for the input scene. Then a decoupled NeRF learns to attach view-independent appearance to the reconstructed mesh by defining learnable disentangled features representing geometry and view-independent appearance on its vertices. For lighting, we approximate it with an explicit learnable environment map and an implicit lighting network to support both low-frequency and high-frequency relighting. When the view-independent appearance is modified, rendered results remain consistent across different viewpoints. Our method also supports high-frequency environmental relighting by replacing

Overview of DE-NeRF

Figure: Given a set of images, we learn a signed distance function to reconstruct the geometry. Then, on the vertices of the reconstructed mesh, we set up learnable geometry features l_g and appearance features l_a, l_r, l_p (corresponding to the diffuse, roughness, and specular components) to decompose geometry, appearance, and lighting in the scene. A sample point’s geometry feature l^w_g and appearance features l^w_a, l^w_r, l^w_p are obtained by KNN (K-nearest neighbor) interpolation. The geometry feature l^w_g and the distance to the mesh h are fed into an SDF decoder to predict the sample point’s signed distance value s. Similarly, the appearance features l^w_a, l^w_r, l^w_p and the distance h go through several appearance decoders to predict the diffuse albedo a, roughness value r, and specular tint p. A learnable environment map E_d is integrated with the diffuse albedo to get the diffuse color c_d. We also train a specular lighting decoder F_s to predict the specular lighting c_l, which is multiplied by the specular tint p to produce the specular color c_s. Combining c_d and c_s, we get the color c for this point.
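
The KNN interpolation step in the caption above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `knn_interpolate`, the choice of `k`, and the inverse-distance weighting are assumptions made for illustration, since the caption does not specify how the neighbors are weighted.

```python
import numpy as np

def knn_interpolate(query, vertices, features, k=4, eps=1e-8):
    """Interpolate per-vertex features at query points from the k nearest vertices.

    query:    (Q, 3) sample point positions
    vertices: (V, 3) mesh vertex positions
    features: (V, C) per-vertex features (for example l_g or l_a)
    Returns a (Q, C) array of interpolated features.
    """
    # Pairwise Euclidean distances between query points and vertices: (Q, V).
    dists = np.linalg.norm(query[:, None, :] - vertices[None, :, :], axis=-1)
    # Indices of the k nearest vertices for each query point: (Q, k).
    idx = np.argsort(dists, axis=1)[:, :k]
    nearest = np.take_along_axis(dists, idx, axis=1)
    # Inverse-distance weights normalized to sum to one (an assumed scheme).
    weights = 1.0 / (nearest + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    # Weighted sum over neighbors: (Q, k, C) * (Q, k, 1) -> (Q, C).
    return (features[idx] * weights[..., None]).sum(axis=1)
```

With this weighting, a sample point that coincides with a vertex receives essentially that vertex's feature, and features vary smoothly as the point moves away from it. The brute-force distance matrix is only practical for small meshes; a spatial index would be needed at scale.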

Figure: Given a sample point in the scene (the red point), we sample multiple directions w_o from the sample point to points (black points on the blue frame) on the sky sphere. We treat these directions as view directions and feed them along with the roughness value of the sample point into the specular lighting decoder to get the specular lighting colors from different view directions. These predicted specular lighting colors are unwrapped to the 2D image space as an environment map.
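
The unwrapping described in the caption above can be sketched as follows, assuming an equirectangular (latitude-longitude) layout with the y axis pointing up. The function names, the map resolution, and `toy_decoder` are hypothetical; `toy_decoder` is only a stand-in for the trained specular lighting decoder F_s, whose actual inputs and architecture are not given here.

```python
import numpy as np

def equirect_directions(height, width):
    """Unit directions through the pixel centers of an equirectangular map."""
    # Polar angle theta in (0, pi) and azimuth phi in (-pi, pi) at pixel centers.
    theta = (np.arange(height) + 0.5) / height * np.pi
    phi = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    # Spherical to Cartesian with y up (the axis convention is an assumption).
    return np.stack([np.sin(theta) * np.sin(phi),
                     np.cos(theta),
                     np.sin(theta) * np.cos(phi)], axis=-1)  # (height, width, 3)

def unwrap_specular_lighting(decoder, roughness, height=64, width=128):
    """Query the decoder along sky-sphere directions and lay the colors out as an image."""
    dirs = equirect_directions(height, width).reshape(-1, 3)
    # Every direction is paired with the sample point's roughness value.
    rough = np.full((dirs.shape[0], 1), roughness)
    colors = decoder(dirs, rough)  # expected shape: (height * width, 3)
    return colors.reshape(height, width, 3)

def toy_decoder(dirs, rough):
    """Hypothetical stand-in for F_s: a vertical gradient dimmed by roughness."""
    up = 0.5 * (dirs[:, 1:2] + 1.0)
    return np.repeat(up * (1.0 - 0.5 * rough), 3, axis=1)

env_map = unwrap_specular_lighting(toy_decoder, roughness=0.2, height=8, width=16)
```

Each row of `env_map` corresponds to one polar angle and each column to one azimuth, so the result can be viewed or saved like any latitude-longitude environment map.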

Figure: Qualitative comparison of geometry reconstruction. Our method recovers better surface details than NeuS [Wang et al. 2021], PhySG [Zhang et al. 2021a], and NvDiffRec [Munkberg et al. 2022].

Figure: Novel view synthesis comparisons with PhySG [Zhang et al. 2021a], NeRFactor [Zhang et al. 2021b], NvDiffRec [Munkberg et al. 2022], and NeuMesh [Bao et al. 2022].

Figure: Scene appearance editing comparison with NeuMesh [Bao et al. 2022]. NeuMesh can generate plausible rendering results from the editing viewpoint, but its results rendered from another viewpoint may be inconsistent with the input edit. Our method produces more faithful editing results from both the editing viewpoint and novel viewpoints.

Figure: Scene relighting comparisons with PhySG [Zhang et al. 2021a], InvRender [Zhang et al. 2022b], NeRFactor [Zhang et al. 2021b], NvDiffRec [Munkberg et al. 2022], and NvDiffRecMC [Hasselgren et al. 2022]. In each row, the input scene and target environment map are shown in the first column. In other columns, we show relighting results by different methods and the ground truth relighting result. With the help of our reconstructed geometry and hybrid lighting representation, our method can produce more faithful relighting results with high-frequency details.