2021
DOI: 10.48550/arxiv.2101.02692
Preprint

Where2Act: From Pixels to Actions for Articulated 3D Objects

Abstract: One of the fundamental goals of visual perception is to allow agents to meaningfully interact with their environment. In this paper, we take a step towards that long-term goal: we extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts. For example, given a drawer, our network predicts that applying a pulling force on the handle opens the drawer. We propose, discuss, and evaluate novel network architectures that given ima…
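
As a rough illustration of the idea in the abstract, the sketch below scores every point of an object's point cloud for a given action primitive (e.g., push or pull). It is not the authors' implementation: Where2Act uses a PointNet++-style backbone on partial observations and also predicts gripper orientations and success likelihoods; the toy MLP encoder, module names, and primitive indices here are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of per-point "actionability" scoring:
# given a point cloud of an articulated object and an action primitive
# (e.g., push or pull), predict a score for every point indicating how likely
# the primitive is to succeed if applied there. All names are hypothetical.
import torch
import torch.nn as nn


class PerPointActionability(nn.Module):
    def __init__(self, num_primitives: int = 6, feat_dim: int = 128):
        super().__init__()
        # Shared per-point encoder: 3D coordinates -> feature vector
        # (a stand-in for the paper's PointNet++-style segmentation backbone).
        self.point_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # One learned embedding per action primitive (push, pull, ...).
        self.primitive_embed = nn.Embedding(num_primitives, feat_dim)
        # Scoring head: point feature + primitive embedding -> actionability logit.
        self.score_head = nn.Sequential(
            nn.Linear(feat_dim * 2, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, points: torch.Tensor, primitive_id: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) point cloud; primitive_id: (B,) integer primitive index.
        feats = self.point_encoder(points)                       # (B, N, F)
        prim = self.primitive_embed(primitive_id)                # (B, F)
        prim = prim.unsqueeze(1).expand(-1, feats.shape[1], -1)  # (B, N, F)
        logits = self.score_head(torch.cat([feats, prim], dim=-1)).squeeze(-1)
        return torch.sigmoid(logits)                             # (B, N) per-point scores


if __name__ == "__main__":
    model = PerPointActionability()
    cloud = torch.rand(2, 2048, 3)   # two objects, 2048 points each
    pull = torch.tensor([1, 1])      # hypothetical index for the "pull" primitive
    scores = model(cloud, pull)
    print(scores.shape)              # torch.Size([2, 2048])
```

The per-point scores form the dense affordance map referenced by the citation statements below: higher-scoring points are proposed as contact locations for the given primitive.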

Cited by 10 publications (20 citation statements)
References 39 publications
“…Learning Actionable Visual Representations aims for learning visual representations that are strongly aware of downstream robotic manipulation tasks and directly indicative of action probabilities for robotic executions, in contrast to predicting standardized visual semantics, such as category labels [48,49], segmentation masks [50,22], and object poses [51,52], which are usually defined independently from any specific robotic manipulation task. Grasping [53,54,55,56,57,58,59,60,61] or manipulation affordance [62,13,63,64,14,65,66,67,15] is one major kind of actionable visual representations, while many other types have been also explored recently (e.g., spatial maps [68,69], keypoints [70,71], contact points [72], etc). Following the recent work Where2Act [15], we employ dense affordance maps as the actionable visual representations to suggest action possibility at every point on 3D articulated objects.…”
Section: Related Work (mentioning)
confidence: 99%
“…Grasping [53,54,55,56,57,58,59,60,61] or manipulation affordance [62,13,63,64,14,65,66,67,15] is one major kind of actionable visual representations, while many other types have been also explored recently (e.g., spatial maps [68,69], keypoints [70,71], contact points [72], etc). Following the recent work Where2Act [15], we employ dense affordance maps as the actionable visual representations to suggest action possibility at every point on 3D articulated objects. Extending beyond Where2Act which considers task-less shortterm manipulation, we further augment the per-point action predictions with task-aware distributions of trajectory proposals, providing more actionable information for downstream executions.…”
Section: Related Work (mentioning)
confidence: 99%
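
The extension described in the statement above (augmenting per-point affordance with task-aware trajectory proposals) could be sketched roughly as follows. This is not the citing paper's method; the conditional Gaussian waypoint decoder, the scalar task parameter, and all names are hypothetical stand-ins meant only to show how a dense affordance map might be queried for a contact point and paired with a distribution over short interaction trajectories.

```python
# Hedged sketch: pick a contact point from a dense affordance map, then sample
# short waypoint trajectories from a conditional decoder given the contact-point
# feature and a task parameter (e.g., how far to open a drawer). Illustrative only.
import torch
import torch.nn as nn


class TrajectoryProposer(nn.Module):
    def __init__(self, feat_dim: int = 128, horizon: int = 5):
        super().__init__()
        self.horizon = horizon
        # Maps (contact-point feature, scalar task parameter) to the mean and
        # log-variance of a Gaussian over a flattened waypoint sequence.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, horizon * 3 * 2),
        )

    def forward(self, contact_feat: torch.Tensor, task: torch.Tensor,
                num_samples: int = 10) -> torch.Tensor:
        # contact_feat: (B, F) feature at the chosen contact point; task: (B, 1).
        params = self.decoder(torch.cat([contact_feat, task], dim=-1))
        mean, log_var = params.chunk(2, dim=-1)               # each (B, H*3)
        std = (0.5 * log_var).exp()
        eps = torch.randn(num_samples, *mean.shape)           # (S, B, H*3)
        samples = mean + eps * std                            # reparameterized draws
        return samples.view(num_samples, -1, self.horizon, 3)  # (S, B, H, 3) waypoints


if __name__ == "__main__":
    affordance = torch.rand(1, 2048)             # per-point scores from an affordance net
    feats = torch.rand(1, 2048, 128)             # matching per-point features
    best = affordance.argmax(dim=1)              # highest-scoring contact point
    contact_feat = feats[torch.arange(1), best]  # (1, 128)
    proposer = TrajectoryProposer()
    trajs = proposer(contact_feat, task=torch.tensor([[0.3]]))
    print(trajs.shape)                           # torch.Size([10, 1, 5, 3])
```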