Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation

Zhang, Jiazhao; Zhu, Chenyang; Zheng, Lintao; Xu, Kai

doi:10.1109/cvpr42600.2020.00459

Cited by 68 publications

(36 citation statements)

References 81 publications

“…There are mainly three categories for 3D semantic segmentation methods: projection-based methods, voxel-based methods and point-based methods. Multi-view projection based methods [15,34,8] project the 3D data into 2D from multiple viewpoints, therefore they can easily process the projected data on 2D convolution networks. However, these methods suffer from occlusion, view-point selection, misalignment and other defects that may limit the performance.…”

Section: Semantic Segmentation on 3D Point Clouds (mentioning)

confidence: 99%

Multi-Path Region Mining for Weakly Supervised 3D Semantic Segmentation on Point Clouds

Wei, Lin, Yap, et al. 2020

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Semantic segmentation on 3D point clouds is an important task for 3D scene understanding. While dense labeling on 3D data is expensive and time-consuming, only a few works address weakly supervised semantic point cloud segmentation methods to relieve the labeling cost by learning from simpler and cheaper labels. Meanwhile, there are still huge performance gaps between existing weakly supervised methods and state-of-the-art fully supervised methods. In this paper, we train a semantic point cloud segmentation network with only a small portion of points being labeled. We argue that we can better utilize the limited supervision information as we densely propagate the supervision signal from the labeled points to other points within and across the input samples. Specifically, we propose a cross-sample feature reallocating module to transfer similar features and therefore re-route the gradients across two samples with common classes and an intra-sample feature redistribution module to propagate supervision signals on unlabeled points across and within point cloud samples. We conduct extensive experiments on public datasets S3DIS and ScanNet. Our weakly supervised method with only 10% and 1% of labels can produce compatible results with the fully supervised counterpart.

“…Semantic segmentation results of 20-class objects/scenarios from different approaches are listed in Table II, FPC [42] achieves good predictions in 5 classes, especially in bath, bed and wall instances. However, it does not understand bookshelves existed in scenes.…”

Section: D Semantic Segmentation Results (mentioning)

confidence: 99%

Semantic Dense Reconstruction with Consistent Scene Segments

Wan, Li, You, et al. 2021

Preprint

In this paper, a method for dense semantic 3D scene reconstruction from an RGB-D sequence is proposed to solve high-level scene understanding tasks. First, each RGB-D pair is consistently segmented into 2D semantic maps based on a camera tracking backbone that propagates objects' labels with high probabilities from full scans to corresponding ones of partial views. Then a dense 3D mesh model of an unknown environment is incrementally generated from the input RGB-D sequence. Benefiting from 2D consistent semantic segments and the 3D model, a novel semantic projection block (SP-Block) is proposed to extract deep feature volumes from 2D segments of different views. Moreover, the semantic volumes are fused into deep volumes from a point cloud encoder to make the final semantic segmentation. Extensive experimental evaluations on public datasets show that our system achieves accurate 3D dense reconstruction and state-of-the-art semantic prediction performances simultaneously.

“…PointNet [11] is one of the first works of directly learning the point features based on the raw point clouds through a shared Multi-Layer Perceptron (MLP) and max-pooling. Some subsequent works [12], [13], [14], [15], [16], [17], [18], [19], [20] are often based on the pioneering works (e.g., PointNet, PointNet++) and further promote the effectiveness of sampling, grouping and ordering to improve the performance of semantic segmentation. Other methods [21], [22], [23] extract the hierarchical point features by introducing a graph network.…”

Section: A Deep Learning for 3D Point Clouds (mentioning)

confidence: 99%

LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation

Zhao, Zhou, Zhu, et al. 2021

Preprint

Camera and 3D LiDAR sensors have become indispensable devices in modern autonomous driving vehicles, where the camera provides the fine-grained texture, color information in 2D space and LiDAR captures more precise and farther-away distance measurements of the surrounding environments. The complementary information from these two sensors makes the two-modality fusion be a desired option. However, two major issues of the fusion between camera and LiDAR hinder its performance, i.e., how to effectively fuse these two modalities and how to precisely align them (suffering from the weak spatiotemporal synchronization problem). In this paper, we propose a coarse-to-fine LiDAR and camera fusion-based network (termed as LIF-Seg) for LiDAR segmentation. For the first issue, unlike these previous works fusing the point cloud and image information in a one-to-one manner, the proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy. Second, due to the weak spatiotemporal synchronization problem, an offset rectification approach is designed to align these two-modality features. The cooperation of these two components leads to the success of the effective camera-LiDAR fusion. Experimental results on the nuScenes dataset show the superiority of the proposed LIF-Seg over existing methods with a large margin. Ablation studies and analyses demonstrate that our proposed LIF-Seg can effectively tackle the weak spatiotemporal synchronization problem.
