Shallow2Deep: Indoor Scene Modeling by Single Image Understanding

Dense indoor scene modeling from 2D images is bottlenecked by the absence of depth information and by cluttered occlusions. We present an automatic indoor scene modeling approach using deep features from neural networks. Given a single RGB image, our method simultaneously recovers semantic content, 3D geometry, and object relationships by reasoning about the indoor context. Specifically, we design a shallow-to-deep architecture built on convolutional networks for semantic scene understanding and modeling. It uses multi-level convolutional networks to parse indoor semantics and geometry into non-relational and relational knowledge. Non-relational knowledge extracted by shallow networks (e.g. room layout, object geometry) is fed forward into deeper levels to parse relational semantics (e.g. support relationships). A Relation Network is proposed to infer the support relationship between objects. All the structured semantics and geometry above are assembled to guide a global optimization for 3D scene modeling. Qualitative and quantitative analysis demonstrates the feasibility of our method in understanding and modeling semantics-enriched indoor scenes, evaluated in terms of reconstruction accuracy, computational performance, and scene complexity.

Understanding indoor scenes from a single image involves a series of vision tasks [1], most of which are still under active development, e.g. object segmentation [2], layout estimation [3] and geometric reasoning [4]. Although machine intelligence has reached human-level performance in some tasks (e.g. scene recognition [5]), each of these techniques represents only a fragment of the full scene context.

Lacking depth cues, prior studies reconstructed indoor scenes from a single image by exploiting shallow image features (e.g. line segments and HOG descriptors [6,4]) or by introducing depth estimation [7,8] to search for object models. Other works adopt a Render-and-Match strategy to obtain CAD scenes whose renderings resemble the input images [9]. However, the problem remains unresolved when indoor geometry is heavily cluttered and complicated, for three reasons. First, complicated indoor scenes involve heavily occluded objects, which can cause missing content in detection [9]. Second, cluttered environments significantly increase the difficulty of camera and layout estimation, which critically affects reconstruction quality [10]. Third, compared to the large diversity of objects in real scenes, reconstructed virtual environments are still far from satisfactory (missing small objects, incorrect labels). Existing methods have explored various kinds of contextual knowledge, including object support relationships [7,8] and human activity [7], to improve modeling quality. However, their relational (or contextual) features are hand-crafted and fail to cover the wide range of objects found in cluttered scenes.
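To make the shallow-to-deep idea above concrete, the following is a minimal, illustrative PyTorch sketch, not the authors' exact architecture: the class name, layer sizes, and head output dimensions are all assumptions. Shallow heads predict non-relational knowledge (room layout, object geometry), and their outputs are concatenated with image features before entering a deeper relational stage, mirroring the feed-forward of non-relational into relational parsing.

```python
import torch
import torch.nn as nn

class ShallowToDeepPipeline(nn.Module):
    """Illustrative shallow-to-deep cascade (hypothetical dimensions)."""
    def __init__(self, feat_dim=256, layout_dim=8, geom_dim=7):
        super().__init__()
        # Shallow stage: shared convolutional features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Non-relational heads (assumed output sizes).
        self.layout_head = nn.Linear(feat_dim, layout_dim)  # room layout params
        self.geom_head = nn.Linear(feat_dim, geom_dim)      # object 3D box params
        # Deep stage: consumes image features *and* the shallow predictions.
        self.relational = nn.Sequential(
            nn.Linear(feat_dim + layout_dim + geom_dim, 128), nn.ReLU(),
            nn.Linear(128, 64))

    def forward(self, image):
        feat = self.backbone(image)                  # (B, feat_dim)
        layout = self.layout_head(feat)              # non-relational: layout
        geom = self.geom_head(feat)                  # non-relational: geometry
        deep_in = torch.cat([feat, layout, geom], dim=-1)
        return layout, geom, self.relational(deep_in)
```

For example, `layout, geom, rel = ShallowToDeepPipeline()(torch.randn(1, 3, 224, 224))` yields the shallow predictions together with the relational features consumed by the deeper stage.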
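The support-relationship inference can likewise be sketched as a pairwise relation module in the spirit of Relation Networks (Santoro et al., 2017), which the paper adapts to objects in a scene. The feature dimension, hidden width, and the three relation classes used here (supported from below, supported from behind, no support) are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SupportRelationNet(nn.Module):
    """Pairwise relation module scoring support relations between objects."""
    def __init__(self, obj_dim=256, hidden=128, n_relations=3):
        super().__init__()
        self.g = nn.Sequential(                      # per-pair relation features
            nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(                      # relation classifier
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_relations))          # below / behind / none

    def forward(self, obj_feats):
        # obj_feats: (N, obj_dim), one pooled feature row per detected object.
        n = obj_feats.size(0)
        a = obj_feats.unsqueeze(1).expand(n, n, -1)  # object i (supported)
        b = obj_feats.unsqueeze(0).expand(n, n, -1)  # object j (supporter)
        pair = torch.cat([a, b], dim=-1)             # all ordered pairs (i, j)
        return self.f(self.g(pair))                  # (N, N, n_relations) logits
```

Under this reading, `logits[i, j]` scores whether object j supports object i; the inferred relations could then constrain the global optimization, e.g. by snapping supported objects onto their supporters during scene modeling.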