Efficient Semantic Scene Completion Network with Spatial Group Convolution

Zhang, Jiahui; Zhao, Hao; Yao, Anbang; Chen, Yurong; Zhang, Li; Liao, Hongen

doi:10.1007/978-3-030-01258-8_45

Cited by 102 publications

(81 citation statements)

References 44 publications


“…Spatial Group Convolution. To improve the computing efficiency of the 3D network, EsscNet [42] was introduced; rather than conducting group convolution on the feature channel dimension, it adopts group convolution on the spatial dimension. The drawback of spatial group convolution is that it manually splits the features into separate parts, which causes a drop in performance.…”

Section: Computation-efficient Network (mentioning)

confidence: 99%
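The distinction drawn in the statement above, grouping along the spatial axes rather than the channel axis, can be illustrated with a minimal sketch. The partition rule `(x + y + z) % num_groups` and the helper names `partition_voxels` and `channel_groups` are assumptions made for illustration; they are not taken from the cited paper.

```python
from collections import defaultdict


def partition_voxels(coords, num_groups):
    """Split occupied voxel coordinates into spatial groups.

    Assumption: the group id is (x + y + z) % num_groups, a hypothetical
    rule chosen only to show that the split happens over positions.
    """
    groups = defaultdict(list)
    for x, y, z in coords:
        groups[(x + y + z) % num_groups].append((x, y, z))
    return dict(groups)


def channel_groups(num_channels, num_groups):
    """For contrast: ordinary group convolution splits the channel axis."""
    size = num_channels // num_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(num_groups)]


# Five occupied voxels, split into two spatial groups.
voxels = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 1), (3, 3, 3)]
spatial = partition_voxels(voxels, num_groups=2)

# Eight feature channels, split into two channel groups.
channels = channel_groups(num_channels=8, num_groups=2)
```

In the spatial case each group holds a subset of voxel positions and would be convolved separately, so neighbouring voxels can land in different groups. That is one plausible reading of why the citing authors describe the manual split as costing accuracy.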

“…However, the performance of both scene completion and semantic scene completion is around 6% higher than that of SSCNet. For a fair comparison with EsscNet [42], depth alone is used as the input; our method is computationally cheaper than EsscNet, with a 6% reduction in FLOPs, and performs better. For the SC and SSC tasks, EsscNet reaches accuracies of 56.2% (SC) and 26.7% (SSC), and we achieve 59.0% (SC) and 28.9% (SSC).…”

Section: Quantitative Analysis (mentioning)

confidence: 99%


RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion

Liu, Gong, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)



RGB images differ from depth in that they carry more detail about color and texture, which can serve as a vital complement to depth for boosting the performance of 3D semantic scene completion (SSC). SSC is composed of 3D shape completion (SC) and semantic scene labeling, yet most existing approaches use depth as the sole input, which creates a performance bottleneck. Moreover, the state-of-the-art methods employ 3D CNNs, which have cumbersome networks and a very large number of parameters. We introduce a light-weight Dimensional Decomposition Residual network (DDR) for 3D dense prediction tasks. The novel factorized convolution layer is effective for reducing the network parameters, and the proposed multi-scale fusion mechanism for depth and color images can improve the completion and segmentation accuracy simultaneously. Our method demonstrates excellent performance on two public datasets. Compared with the latest method, SSCNet, we achieve gains of 5.9% in SC-IoU and 5.7% in SSC-IoU, while using only 21% of the network parameters and 16.6% of the FLOPs of SSCNet.
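The parameter saving claimed for a factorized convolution layer can be made concrete with a small count. This is a sketch under stated assumptions: the abstract does not spell out the decomposition, so the code assumes a k×k×k kernel is replaced by three axis-aligned one-dimensional convolutions and ignores bias terms. The function names are hypothetical.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a dense k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def decomposed_params(c_in, c_out, k):
    """Weights in three consecutive 1D convolutions, one per axis.

    Assumption: the first maps c_in -> c_out channels and the other
    two map c_out -> c_out; bias terms are ignored.
    """
    return c_in * c_out * k + 2 * c_out * c_out * k


full = conv3d_params(64, 64, 3)
fact = decomposed_params(64, 64, 3)
ratio = fact / full  # one third for k = 3 with equal channel widths
```

With k = 3 and equal channel widths, the factorized layer needs one third of the weights of the dense layer. This per-layer ratio is not the network-level 21% figure quoted in the abstract, which depends on the full architecture.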


“…Recently several methods have been proposed for SSC using deep learning techniques [3], [12], [13], [8]. Among them, the most representative work is the SSCNet [3] which conducts the semantic labeling and scene completion simultaneously and also proves that these two tasks can benefit from each other.…”

Section: A Semantic Scene Completion (mentioning)

confidence: 99%

“…Although better results have been achieved compared with the previous methods, SSCNet ignores the fine-grained information of depth. Zhang et al. [13] introduce spatial group convolution (SGC) to reduce the computation costs, but with poorer performance than SSCNet [3]. SEGCloud [9] employs fine-grained 3D points as input, but the computing and memory costs are incredibly high.…”

Section: A Semantic Scene Completion (mentioning)

confidence: 99%

Depth Based Semantic Scene Completion With Position Importance Aware Loss

Li, Liu, Yuan, et al. 2020

IEEE Robot. Autom. Lett.


Semantic Scene Completion (SSC) refers to the task of inferring the 3D semantic segmentation of a scene while simultaneously completing the 3D shapes. We propose PALNet, a novel hybrid network for SSC based on a single depth image. PALNet utilizes a two-stream network to extract both 2D and 3D features from multiple stages, using fine-grained depth information to efficiently capture the context as well as the geometric cues of the scene. Current methods for SSC treat all parts of the scene equally, causing unnecessary attention to the interior of objects. To address this problem, we propose Position Aware Loss (PA-Loss), which is position importance aware while training the network. Specifically, PA-Loss considers Local Geometric Anisotropy to determine the importance of different positions within the scene. It is beneficial for recovering key details like the boundaries of objects and the corners of the scene. Comprehensive experiments on two benchmark datasets demonstrate the effectiveness of the proposed method and its superior performance. Code and demo are available at: https://github.com/UniLauX/PALNet.


“…On NYU Kinect, the proposed approaches perform worse than the baseline [5] for semantic scene completion, but better for scene completion. The only approach that fairly outperforms the baseline is [23]. All the other approaches use either an additional modality (RGB images) [8], pretrain on SUNCG [5], or do both [9].…”

Section: Evaluation On Nyu Depth V2 (mentioning)

confidence: 99%

3D Semantic Scene Completion from a Single Depth Image Using Adversarial Training

Chen, Garbade, Gall 2019

2019 IEEE International Conference on Image Processing (ICIP)


We address the task of 3D semantic scene completion, i.e., given a single depth image, we predict the semantic labels and occupancy of voxels in a 3D grid representing the scene. In light of the recently introduced generative adversarial networks (GANs), our goal is to explore the potential of this model and the efficiency of various important design choices. Our results show that using conditional GANs outperforms the vanilla GAN setup. We evaluate these architecture designs on several datasets. Based on our experiments, we demonstrate that GANs are able to outperform a baseline 3D CNN when annotations are clean, but they suffer when annotations are poorly aligned.

