Semantic scene completion is the task of jointly estimating 3D geometry and semantics of objects and surfaces within a given extent. This is a particularly challenging task on real-world data that is sparse and occluded. We propose a scene segmentation network based on local Deep Implicit Functions as a novel learning-based method for scene completion. Unlike previous work on scene completion, our method produces a continuous scene representation that is not based on voxelization. We encode raw point clouds into a latent space locally and at multiple spatial resolutions. A global scene completion function is subsequently assembled from the localized function patches. We show that this continuous representation is suitable to encode geometric and semantic properties of extensive outdoor scenes without the need for spatial discretization (thus avoiding the trade-off between level of scene detail and the scene extent that can be covered). We train and evaluate our method on semantically annotated LiDAR scans from the Semantic KITTI dataset. Our experiments verify that our method generates a powerful representation that can be decoded into a dense 3D description of a given scene. The performance of our method surpasses the state of the art on the Semantic KITTI Scene Completion Benchmark in terms of both geometric and semantic completion Intersection-over-Union (IoU).
A considerable amount of annotated training data is necessary to achieve state-of-the-art performance in perception tasks using point clouds. Unlike RGB-images, LiDAR point clouds captured with different sensors or varied mounting positions exhibit a significant shift in their input data distribution. This can impede transfer of trained feature extractors between datasets as it degrades performance vastly.We analyze the transferability of point cloud features between two different LiDAR sensor set-ups (32 and 64 vertical scanning planes with different geometry). We propose a supervised training methodology to learn transferable features in a pre-training step on LiDAR datasets that are heterogeneous in their data and label domains. In extensive experiments on object detection and semantic segmentation in a multi-task setup we analyze the performance of our network architecture under the impact of a change in the input data domain. We show that our pre-training approach effectively increases performance for both target tasks at once without having an actual multi-task dataset available for pre-training.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.