2019 IEEE International Conference on Image Processing (ICIP) 2019
DOI: 10.1109/icip.2019.8803174

3D Semantic Scene Completion from a Single Depth Image Using Adversarial Training

Abstract: We address the task of 3D semantic scene completion, i.e., given a single depth image, we predict the semantic labels and occupancy of voxels in a 3D grid representing the scene. In light of the recently introduced generative adversarial networks (GAN), our goal is to explore the potential of this model and the efficiency of various important design choices. Our results show that using conditional GANs outperforms the vanilla GAN setup. We evaluate these architecture designs on several datasets. Based on our e…

Cited by 18 publications (20 citation statements)
References 18 publications
“…Voxel grid encodes scene geometry as 3D grid, which cells describe semantic occupancy of the space. Opposed to point clouds, grids conveniently define neighborhood with adjacent cells, and thus enable easy application of 3D CNNs, which facilitates to extend deep learning architectures designed for 2D data into 3D [14,17,19,22,24,28,29,39,49,68,108,118,155,158]. However, the representation suffers from constraining limitations and efficiency drawbacks since it represents both occupied and free regions of the scene, leading to high memory and computation needs.…”
Section: Scene Representations (mentioning)
confidence: 99%
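The trade-off described in the statement above can be illustrated with a minimal sketch of a dense semantic voxel grid. This is an illustration only and is not taken from the cited works: the names `FREE`, `make_grid`, `neighbors6`, and `occupancy_stats` are hypothetical, and a nested Python list stands in for the 3D tensor a 3D CNN would actually consume.

```python
FREE = 0  # hypothetical label id for empty space (an assumption, not from the source)


def make_grid(nx, ny, nz):
    """Dense grid: one semantic label per cell, free cells included."""
    return [[[FREE for _ in range(nz)] for _ in range(ny)] for _ in range(nx)]


def neighbors6(shape, x, y, z):
    """Face-adjacent cells of (x, y, z) that lie inside a grid of the given shape."""
    nx, ny, nz = shape
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return [
        (x + dx, y + dy, z + dz)
        for dx, dy, dz in offsets
        if 0 <= x + dx < nx and 0 <= y + dy < ny and 0 <= z + dz < nz
    ]


def occupancy_stats(grid):
    """Return (occupied_cells, total_cells) for a dense grid."""
    total = occupied = 0
    for plane in grid:
        for row in plane:
            for label in row:
                total += 1
                occupied += label != FREE
    return occupied, total


if __name__ == "__main__":
    grid = make_grid(4, 4, 4)
    grid[1][2][3] = 5  # mark one cell with an arbitrary semantic class id
    print(neighbors6((4, 4, 4), 0, 0, 0))
    print(occupancy_stats(grid))
```

Neighborhood lookup reduces to index arithmetic, which is what makes 3D convolutions straightforward on this representation, while the grid stores every cell whether or not it is occupied, which is the memory cost the statement refers to.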
“…For scene completion, the value of the gradient field is estimated at specific locations, typically at the voxel centers, for voxel grids [22,24], or at the point locations for point clouds [105]. Implicit surface may also be used as input [14,17,22,24,28,29,68,118,141,155,158] to reduce the sparsity of the input data, at the expense of greedy computation. For numerical reason, most works encode in fact a flipped version (cf.…”
Section: Scene Representations (mentioning)
confidence: 99%
“…Existing works all use geometrical inputs like depth [12,25,[39][40][41][42]45], occupancy grids [13,25,55,69] or point cloud [53,81]. Truncated Signed Distance Function (TSDF) were also proved informative [6,9,10,12,20,21,41,59,64,77,79]. Among others originalities, some SSC works use adversarial training to guide realism [10,64], exploit multi-task [6,38], or use lightweight networks [40,55].…”
Section: Related Work (mentioning)
confidence: 99%
“…Truncated Signed Distance Function (TSDF) were also proved informative [6,9,10,12,20,21,41,59,64,77,79]. Among others originalities, some SSC works use adversarial training to guide realism [10,64], exploit multi-task [6,38], or use lightweight networks [40,55]. Of interest for us, while others have used RGB as input [6,8,9,14,20,20,25,29,39,40,42,45,81] it is always along other geometrical input (e.g.…”
Section: Related Work (mentioning)
confidence: 99%
“…While this boosts performance, it also increases the network complexity and subsequently the inference time. Generative Adversarial Networks (GANs) have also been proposed to enforce realistic outputs [39,7] but are harder to train. To lower memory consumption with the preferred voxelized representations, Spatial Group Convolutions (SGC) [40] divide input into groups for efficient processing at the cost of small performance drops.…”
Section: Related Work (mentioning)
confidence: 99%