We propose a novel method for instance label segmentation of dense 3D voxel grids. We target volumetric scene representations that have been acquired with depth sensors or multi-view stereo methods and processed with semantic 3D reconstruction or scene completion methods. The main task is to learn shape information about individual object instances in order to separate them accurately, including objects that are connected or incompletely scanned. We solve the 3D instance-labeling problem with a multi-task learning strategy. The first goal is to learn an abstract feature embedding that groups voxels with the same instance label close to each other while separating clusters with different instance labels. The second goal is to learn instance information by densely estimating, for each voxel, the direction towards the center of mass of its instance. This is particularly useful for finding instance boundaries in the clustering post-processing step, as well as for scoring the quality of the segmentations produced for the first goal. Both synthetic and real-world experiments demonstrate the viability of our approach. Our method achieves state-of-the-art performance on the ScanNet 3D instance segmentation benchmark [4].
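To make the second goal concrete, the following is a minimal sketch of how per-voxel direction targets towards instance centers of mass could be computed from a ground-truth instance label grid. It is an illustration under stated assumptions, not the authors' implementation: the function name `center_direction_targets`, the use of unit vectors, and the convention that label 0 marks background are all hypothetical choices.

```python
import numpy as np


def center_direction_targets(instance_labels, background=0):
    """Return a (D, H, W, 3) array of unit vectors, one per voxel, pointing
    from the voxel towards the center of mass of its instance.

    Assumptions (not taken from the paper): label `background` marks voxels
    without an instance; those voxels, and any voxel lying exactly on its
    instance center, receive the zero vector.
    """
    labels = np.asarray(instance_labels)
    directions = np.zeros(labels.shape + (3,), dtype=np.float64)

    for inst in np.unique(labels):
        if inst == background:
            continue
        # Integer voxel coordinates of this instance, shape (N, 3).
        coords = np.argwhere(labels == inst)
        # Center of mass in voxel coordinates (uniform mass per voxel).
        center = coords.mean(axis=0)
        offsets = center - coords
        norms = np.linalg.norm(offsets, axis=1, keepdims=True)
        # Normalize, leaving zero-length offsets as zero vectors.
        unit = np.divide(offsets, norms,
                         out=np.zeros_like(offsets, dtype=np.float64),
                         where=norms > 0)
        directions[tuple(coords.T)] = unit

    return directions


# Two touching instances along one axis: the directions flip sign at the
# boundary between them, which is the cue a clustering step could exploit.
labels = np.array([[[1, 1, 2, 2]]])
targets = center_direction_targets(labels)
```

In this toy grid the last-axis components of adjacent voxels on either side of the instance boundary point away from each other, which suggests why such a direction field may help separate connected objects.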
Figure 1: 3D Dense Reconstruction and Rendering from Different SLAM Systems. On the Replica dataset [49], we compare to dense RGB-D SLAM method NICE-SLAM [76], and monocular SLAM approaches COLMAP [46], DROID-SLAM [57], and our proposed NICER-SLAM. Panels: NICE-SLAM (RGB-D input); COLMAP, DROID-SLAM, NICER-SLAM (RGB input); Ground Truth.