2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00618

Human-Centric Indoor Scene Synthesis Using Stochastic Grammar

Abstract: We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain large-scale 2D/3D image data with perfect per-pixel ground truth. An attributed spatial And-Or graph (S-AOG) is proposed to represent indoor scenes. The S-AOG is a probabilistic grammar model, in which the terminal nodes are object entities. Human contexts as contextual relations are encoded by Markov Random Fields (MRF) on the terminal nodes. We learn the distributions from an indoor scene dataset and s…
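
To make the representation described in the abstract concrete, below is a minimal Python sketch (not the authors' implementation) of an attributed And-Or graph whose terminal nodes are object entities, with contextual relations scored by pairwise MRF potentials over the sampled terminals; all class and field names are illustrative assumptions.

# Minimal sketch, not the authors' code: And-nodes decompose a scene,
# Or-nodes choose among alternatives, terminal nodes are object entities
# with attributes, and contextual relations are scored by pairwise MRF
# potentials over the sampled terminals. Names here are assumptions.
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Terminal:                 # an object entity with attributes
    label: str
    attributes: Dict[str, float] = field(default_factory=dict)

@dataclass
class OrNode:                   # pick exactly one child, with branching probabilities
    children: List[object]
    probs: List[float]

@dataclass
class AndNode:                  # expand all children
    children: List[object]

def sample(node) -> List[Terminal]:
    """Top-down sampling of a scene (a set of terminal objects) from the grammar."""
    if isinstance(node, Terminal):
        return [node]
    if isinstance(node, OrNode):
        child = random.choices(node.children, weights=node.probs, k=1)[0]
        return sample(child)
    return [t for c in node.children for t in sample(c)]   # AndNode

def mrf_energy(terminals: List[Terminal],
               pairwise: Callable[[Terminal, Terminal], float]) -> float:
    """Sum of pairwise potentials encoding contextual (e.g. human-object) relations."""
    return sum(pairwise(terminals[i], terminals[j])
               for i in range(len(terminals))
               for j in range(i + 1, len(terminals)))

# Tiny usage: a room is a desk plus either a chair or a stool.
chair, stool, desk = Terminal("chair"), Terminal("stool"), Terminal("desk")
room = AndNode([desk, OrNode([chair, stool], probs=[0.7, 0.3])])
scene = sample(room)
energy = mrf_energy(scene, lambda a, b: 0.0 if a.label != b.label else 1.0)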

Cited by 142 publications (107 citation statements)
References 43 publications

“…It takes on average less than 2 seconds for our model to generate a complete scene on an NVIDIA GeForce GTX 1080Ti GPU, which is two orders of magnitude faster than the previous image-based method (Deep Priors). While slower than end-to-end methods such as [15], our model can also perform …

Table 2. Real vs. synthetic classification accuracy for scenes generated by different methods.
Method              Accuracy
Deep Priors [12]    84.69
Human-Centric [22]  76.18
Ours                58.75
Perturbed (1%)      50.00
Perturbed (5%)      54.69
Perturbed (10%)     64.38
…”
Section: Methods (mentioning)
confidence: 99%
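
As a side note on the evaluation quoted above: the real-vs-synthetic accuracy metric trains a binary classifier to separate real from generated scenes, so accuracy near chance (50%) indicates the generated scenes are hard to distinguish from real ones. Below is a small illustrative sketch of that protocol, with randomly generated placeholder feature vectors standing in for features of rendered scenes.

# Illustrative sketch of the real-vs-synthetic evaluation idea: train a binary
# classifier to separate real scene features from synthesized ones; accuracy
# near 50% means the synthesized scenes are hard to tell apart from real ones.
# The random feature vectors below are placeholders, not real data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(500, 128))    # placeholder "real" features
synth_feats = rng.normal(0.2, 1.0, size=(500, 128))   # placeholder "synthetic" features

X = np.vstack([real_feats, synth_feats])
y = np.concatenate([np.ones(500), np.zeros(500)])      # 1 = real, 0 = synthetic

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("real-vs-synthetic accuracy:", accuracy_score(y_te, clf.predict(X_te)))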
“…Wang et al. [28] use a convolutional network that iteratively generates a 3D room scene by adding one object at a time. Qi et al. [23] propose a spatial And-Or graph to represent indoor scenes, from which new scenes can be sampled. Different from most other works, they use human affordances and activity information with respect to objects in the scene to model probable spatial layouts.…”
Section: Related Work (mentioning)
confidence: 99%
“…Hierarchical representations have been used to learn grammars in natural language and images [50], 2D scenes [46], 3D shapes [26,60], and 3D scenes [32]. A related line of work parses RGB or RGB-D scenes hierarchically using And-Or graphs [20,21,34,40,64] for a variety of tasks. For full 3D scenes, there has been a very limited amount of available training data with ground-truth hierarchy annotations.…”
Section: Related Work (mentioning)
confidence: 99%