2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020
DOI: 10.1109/cvpr42600.2020.00013

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image

Abstract: Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. Instead of separately resolving scene understanding and object reconstruction, our method builds upon a holistic scene conte…

Citations: cited by 197 publications (228 citation statements)
References: 40 publications (97 reference statements)
“…Rogers and Christensen (2012) and Lin et al (2013) leveraged objects to perform a joint object-and-place classification. Nie et al (2020), Huang et al (2018a), and Zhao and Zhu (2013b) jointly solved the problem of scene understanding and reconstruction. Pangercic et al (2012) reasoned on the objects’ functionality.…”
Section: Related Work (mentioning)
confidence: 99%
“…Discriminative methods can exploit large training datasets to learn to classify scene components from input data such as RGB and RGB-D images [4,18,35,51,56]. By introducing clever Deep Learning architectures applied to point clouds or voxel-based representations, these methods can achieve very good results.…”
Section: Complete Scene Reconstruction (mentioning)
confidence: 99%
“…3D scene understanding is a fundamental problem in Computer Vision [41,53]. In the case of indoor scenes, one usually aims at recognizing the objects and their properties such as their 3D pose and geometry [2,3,15], or the room layouts [57,31,62,59,30,36,50,60,62,54,55], or both [4,18,35,45,51,56]. With the development of deep learning approaches, the field has made a remarkable progress.…”
Section: Introduction (mentioning)
confidence: 99%
“…Since 2015, many deep-learning-based 3D reconstruction methods are presented, among which the point-based technique is simple but efficient in terms of memory requirements [36]. Similar to volumetric [37,38] and surface-based representations [39,40], point-based techniques follow the encoder-decoder model. In general, grid representations use up-convolutional networks to decode the latent variable [41,42].…”
Section: Deep-learning-based Reconstruction (mentioning)
confidence: 99%