2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00329
Human Hands as Probes for Interactive Object Understanding

Cited by 21 publications (6 citation statements)
References 48 publications
“…Some other approaches ground these action labels to images by predicting heatmaps that indicate interaction possibilities [14,35,48,57,60]. While heatmaps only specify where to interact without telling what to do, recent approaches predict richer properties such as contact distance [39], action trajectory [48,55], grasping categories [23,50], etc. Instead of predicting more sophisticated interaction states, we explore directly synthesizing HOI images for possible interactions because images demonstrate both where and how to interact comprehensively and in a straightforward manner.…”
Section: Related Work
confidence: 99%
“…Recently, the computer vision community has become increasingly interested in understanding the 3D dynamics of objects. Researchers try to understand the 3D shapes, axes, movable parts, and affordances on synthetic data [42,64,43,25,62,32,60], videos [47,21,20,44], or point clouds [26]. Our work is most closely related to [47,21,20] since they work on real images, but differs from them in two aspects: First, they need video or multi-view inputs, while our input is only a single image. Second, their approaches recover objects that are being interacted with, while our approach understands potential interactions before any interaction happens.…”
Section: Related Work
confidence: 99%
“…Articulated object pose estimation is a crucial and fundamental computer vision problem with a wide range of applications in robotics, human-object interaction, and augmented reality Katz & Brock (2008); Mu et al (2021); Labbé et al (2021); Jiang et al (2022); Goyal et al (2022); Li et al (2020b). Different from 6D pose estimation for rigid objects Tremblay et al (2018); Xiang et al (2017); Sundermeyer et al (2018); Wang et al (2019a), articulated object pose estimation requires a hierarchical pose understanding on both the object-level and part-level Li et al (2020a).…”
Section: Introduction
confidence: 99%