2021
DOI: 10.48550/arxiv.2108.08420
Preprint

D3D-HOI: Dynamic 3D Human-Object Interactions from Videos

Xiang Xu,
Hanbyul Joo,
Greg Mori
et al.

Abstract: We address the problem of recovering dynamic 3D human-object interactions given a video input. Our focus is on reconstructing the articulating 3D object during manipulation. To enable research in this direction, we collect a dataset of interaction videos with common objects (e.g., laptop, fridge, dishwasher). We annotate the 3D object pose, shape, articulation state, and estimate the 3D mesh of the manipulator using the approach from Joo et al. [10]. Above are rendered frames from the ground truth annotations.

Cited by 5 publications (10 citation statements)
References 26 publications
“…Consequently, available datasets are also diverse and specialized (more details in Section 3.3.2). Only recently has object orientation been introduced into HOI Detection [e.g., D3D-HOI (Xu et al, 2021) or BEHAVE (Bhatnagar et al, 2022)]. So far, the focus has been mainly on human pose (e.g., Yao and Fei-Fei, 2010) or object size and positioning (e.g., Li et al, 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent approaches begin to tackle modeling and synthesizing human interactions within 3D scenes, or with objects. Most research focuses on statically posing humans within the given 3D environment [16,24,69,71], by generating human-scene interaction poses from various types of input including object semantics [17], images [21,23,64,65,68], and text descriptions [49,72].…”
Section: Related Work (mentioning)
confidence: 99%
“…Some works [62,68] have studied human reconstruction from single images, while also recovering aspects of the environment. PHOSA [68] recovers humans interacting with objects from in-the-wild images, and is followed by [62,64] in other settings. While they focus on visible human-object interactions, we consider cases where the scene might not be fully visible.…”
Section: Humans in 3D Scenes (mentioning)
confidence: 99%