2021
DOI: 10.1109/access.2021.3090471

Joint Object Affordance Reasoning and Segmentation in RGB-D Videos

Abstract: Understanding human-object interaction is a fundamental challenge in computer vision and robotics. Crucial to it is the ability to infer "object affordances" from visual data, namely the types of interaction supported by an object of interest and the object parts involved. Such inference can be approached as an "affordan…

Acknowledgment: The authors would like to thank the European Commission for financial support through H2020 project HR-Recycler under contract 820742 and Nvidia Corporation for a Titan X GPU donation.

Cited by 5 publications (1 citation statement). References: 74 publications.
“…Researchers have used other forms of supervision (strong supervision, weak supervision, imitation learning, reinforcement learning, inverse reinforcement learning) to build interactive understanding of objects. This can be in the form of learning a) where and how to grasp [9,21,26,27,30,35,36,39,43,51], b) state classifiers [25], c) interaction hotspots [15,42,44,61], d) spatial priors for action sites [46], e) object articulation modes [12,38], f) reward functions [29,31,50,52], g) functional correspondences [34]. While our work pursues similar goals, we differ in our supervision source (observation of human hands interacting with objects in egocentric videos).…”
Section: Related Work
Confidence: 99%