Figure: OctField utilizes an octree structure to achieve a hierarchical implicit representation, where the part geometry enclosed by an octant is represented by a local implicit function. OctField allocates modeling capacity adaptively according to the richness of surface geometry. In particular, intricate parts such as jet engines, tail-planes and the undercarriage are automatically subdivided to engage more implicit kernels for higher modeling accuracy, while regularly shaped parts of the fuselage are encoded using a coarser-level representation that suffices.
Recent advances in localized implicit functions have enabled neural implicit representations to scale to large scenes. However, the regular subdivision of 3D space employed by these approaches fails to take into account the sparsity of the surface occupancy and the varying granularities of geometric details. As a result, their memory footprint grows cubically with the input volume, leading to a prohibitive computational cost even at a moderately dense decomposition. In this work, we present a learnable hierarchical implicit representation for 3D surfaces, named OctField, that allows high-precision encoding of intricate surfaces with a low memory and computational budget. The key to our approach is an adaptive decomposition of 3D scenes that only distributes local implicit functions around the surface of interest. We achieve this goal by introducing a hierarchical octree structure to adaptively subdivide the 3D space according to the surface occupancy and the richness of part geometry. Because an octree is discrete and non-differentiable, we further propose a novel hierarchical network that models the subdivision of octree cells as a probabilistic process and recursively encodes and decodes both octree structure and surface geometry in a differentiable manner. We demonstrate the value of OctField for a range of shape modeling and reconstruction tasks, showing superiority over alternative approaches.
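The adaptive decomposition described above can be illustrated with a minimal sketch. It is not the OctField implementation: it assumes a signed distance function is available for the surface (here a hypothetical `sphere_sdf`), and it subdivides on surface occupancy only, omitting the geometric-richness criterion. The names `Octant`, `may_contain_surface`, `subdivide`, and `count_leaves` are illustrative.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Octant:
    center: tuple          # (x, y, z) of the cell center
    half: float            # half of the cell edge length
    depth: int
    children: list = field(default_factory=list)


def sphere_sdf(p, radius=0.5):
    """Signed distance to a sphere; a stand-in for real surface data."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def may_contain_surface(octant, sdf):
    # Conservative test, valid for a true signed distance function: the
    # surface can only cross the cell if the center lies within half a
    # cell diagonal of it.
    return abs(sdf(octant.center)) <= octant.half * math.sqrt(3)


def subdivide(octant, sdf, max_depth):
    """Recursively split only the octants that may contain the surface."""
    if octant.depth >= max_depth or not may_contain_surface(octant, sdf):
        return
    h = octant.half / 2
    cx, cy, cz = octant.center
    for dx in (-h, h):
        for dy in (-h, h):
            for dz in (-h, h):
                child = Octant((cx + dx, cy + dy, cz + dz), h, octant.depth + 1)
                if may_contain_surface(child, sdf):  # empty children are dropped
                    octant.children.append(child)
                    subdivide(child, sdf, max_depth)


def count_leaves(octant):
    if not octant.children:
        return 1
    return sum(count_leaves(c) for c in octant.children)


root = Octant((0.0, 0.0, 0.0), 1.0, 0)
subdivide(root, sphere_sdf, max_depth=4)
print(count_leaves(root), "leaf octants versus", 8 ** 4, "cells in a dense grid")
```

Because cells away from the surface are never created, the number of leaf octants follows the surface area, not the enclosed volume, which is the saving over a regular grid that the abstract refers to.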
OctField: Hierarchical Implicit Functions for 3D Modeling
(Accepted by NeurIPS 2021)
Figure: We propose a novel recursive encoder-decoder structure and train the network in a VAE manner. We use a voxel 3D CNN to encode the octants' geometry, and recursively aggregate the structure and geometry features using a hierarchy of local encoders E_i. The decoding is implemented using a hierarchy of local decoders D_i whose structure mirrors that of the encoder. Both the structure and geometry information are recursively decoded, and the local surfaces are recovered using the implicit octant decoder within each octant.
Figure: The architecture of the hierarchical encoder E_k and decoder D_k. E_k gathers the structure features (α_{c_j}, β_{c_j}) and geometry features g_{c_j} of the child octants into their parent octant k using an MLP, a max-pooling operation, and another MLP, where c_j ∈ C_k. D_k decodes the parent octant feature g_k into the features g_{c_j} and two indicators α_{c_j}, β_{c_j} of its child octants using two MLPs and classifiers. The two indicators infer the probability of surface occupancy and the necessity of further subdivision, respectively.
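The encoder step in this caption (an MLP per child, max-pooling across children, then another MLP) can be sketched in plain Python. This is only an illustration under stated assumptions: each MLP is reduced to a single linear layer, the weights are random, not trained, the feature sizes are arbitrary, and the names `init_layer`, `linear`, `relu`, and `encode_parent` are hypothetical.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one linear layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(x, weights, biases):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def encode_parent(children, layer1, layer2):
    """Aggregate child (alpha, beta, g) triples into one parent feature."""
    # First "MLP": applied independently to each child's concatenated
    # structure indicators and geometry feature.
    hidden = [relu(linear([alpha, beta] + g, *layer1)) for alpha, beta, g in children]
    # Max-pooling across children, one value per hidden channel.
    pooled = [max(channel) for channel in zip(*hidden)]
    # Second "MLP": maps the pooled vector to the parent feature.
    return linear(pooled, *layer2)


rng = random.Random(0)
feat_dim, hidden_dim = 4, 8                      # illustrative sizes
layer1 = init_layer(2 + feat_dim, hidden_dim, rng)
layer2 = init_layer(hidden_dim, feat_dim, rng)

# Eight child octants, each with two indicators and a geometry feature.
children = [
    (float(rng.random() < 0.5), float(rng.random() < 0.5),
     [rng.uniform(-1.0, 1.0) for _ in range(feat_dim)])
    for _ in range(8)
]
parent_feature = encode_parent(children, layer1, layer2)
print(parent_feature)
```

One consequence of pooling with a maximum is that the parent feature does not depend on the order in which the child octants are visited.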
Figure: Shape reconstruction comparison with the baseline methods ((a) Input, (b) AOCNN, (c) LIG, (d) OccNet, (e) ConvONet, (f) IM-Net, and (g) Ours).
Figure: Shapes generated by randomly sampling codes from the latent space.
Figure: Two interpolation results in two categories: table and chair. (a) is the source shape and (f) is the target shape.
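As a rough illustration of how such a sequence could be produced, the sketch below blends two latent codes linearly in six steps, matching panels (a) to (f). Linear blending is an assumption here, since the caption does not state the interpolation scheme, and `interpolate_codes` and the example codes are hypothetical. Each intermediate code would then be passed to the decoder to obtain a shape.

```python
def interpolate_codes(z_source, z_target, steps=6):
    """Return `steps` latent codes blended linearly from source to target."""
    codes = []
    for i in range(steps):
        t = i / (steps - 1)                      # 0.0 at the source, 1.0 at the target
        codes.append([(1.0 - t) * a + t * b for a, b in zip(z_source, z_target)])
    return codes


z_source = [0.2, -1.0, 0.5]                      # made-up latent code of the source shape
z_target = [1.0, 0.4, -0.3]                      # made-up latent code of the target shape
codes = interpolate_codes(z_source, z_target)
for label, code in zip("abcdef", codes):
    print(label, code)
```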
Figure: Scene reconstruction comparison with Local Implicit Grid and Convolutional Occupancy Network. Our method provides more accurate reconstruction of the geometric and structural details of large scenes.
Last updated in October 2021.