2022
DOI: 10.48550/arxiv.2202.00732
Preprint

IFOR: Iterative Flow Minimization for Robotic Object Rearrangement

Abstract: Figure 1. An example of IFOR being applied to real data. The initial and goal scenes are shown on the left. Our approach allows the robot to repeatedly identify transformations that will minimize the flow for various objects between the current and goal scenes. It can then repeatedly grasp, move, and place objects, rotating as necessary, in order to achieve the configuration in the goal scene. The system is trained completely on synthetic data and transfers to the real world in a zero-shot manner.
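As a rough illustration of the loop this caption describes, the NumPy sketch below assumes the predicted flow yields per-object point correspondences; a Kabsch least-squares fit (standing in for IFOR's learned flow and segmentation networks, which are not reproduced here) recovers the rigid transform that minimizes each object's residual flow, and the object with the largest residual is moved first. The object names and toy scene are purely illustrative.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst
    (Kabsch); this is the transform that minimizes the residual flow."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - sc).T @ (dst - dc))
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ D @ U.T
    return R, dc - R @ sc

rng = np.random.default_rng(0)

def random_pose():
    a = rng.uniform(0, 2 * np.pi)                 # rotation about z
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
    return R, rng.uniform(-0.2, 0.2, size=3)

# Two "objects" as point sets; the goal scene poses them differently.
current = {n: rng.normal(size=(40, 3)) for n in ("mug", "box")}
goal = {n: current[n] @ R.T + t
        for n, (R, t) in ((n, random_pose()) for n in current)}

for step in range(10):
    # Residual flow per object: mean displacement still needed.
    residual = {n: np.linalg.norm(goal[n] - current[n], axis=1).mean()
                for n in current}
    name = max(residual, key=residual.get)
    if residual[name] < 1e-9:
        break                                     # goal configuration reached
    R, t = fit_rigid_transform(current[name], goal[name])
    current[name] = current[name] @ R.T + t       # "grasp, rotate, place"
    print(f"step {step}: moved {name} (residual {residual[name]:.3f})")
```

In this toy setting one fitted transform places each object exactly at its goal pose; with real, noisy flow the loop would keep re-estimating and re-moving objects until every residual falls below a threshold, which is the iterative behavior the caption describes.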

Cited by 2 publications (6 citation statements). References 50 publications.
“…The use of depth input has also been extensively studied. Methods like CLIPort [3] and IFOR [1] directly process the RGB-D images for object manipulation, and hence are limited to simple pick-and-place tasks in 2D top-down settings. To overcome this issue, explicit 3D representations such as point clouds have been utilized.…”
Section: Related Work
confidence: 99%
“…Learning a single model for many different tasks has been of particular interest to the robotics community recently. A large volume of work achieves the multi-task generalization by using a generalizable task or action representation such as object point cloud [18,19], semantic segmentation and optical flow [1], and object-centric representation [29,30]. However, the limited expressiveness of such representations constrains them to only generalize within a task category.…”
Section: Related Work
confidence: 99%
“…In our approach, we use unknown object instance segmentation to break our scene up into objects, as per prior work [6], [7], [8], [9]. Then, we use a multi-modal transformer to combine both word tokens and object encodings from Point Cloud Transformer [10] in order to make 6-DoF goal pose predictions.…”
Section: Introduction
confidence: 99%
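The pipeline in this last statement (instance segmentation, then per-object point-cloud encodings fused with word tokens, then 6-DoF goal poses) can be sketched as follows. This is an illustrative PyTorch outline, not the cited paper's implementation: the tiny PointNet-style encoder merely stands in for Point Cloud Transformer, and the vocabulary size, hidden dimensions, and the 9-number pose head (3-D translation plus a 6-D rotation representation) are all assumptions.

```python
import torch
import torch.nn as nn

class ObjectEncoder(nn.Module):
    """Per-point MLP + max-pool -> one feature token per segmented object.
    A minimal stand-in for Point Cloud Transformer, for illustration only."""
    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, dim))

    def forward(self, pts):                        # pts: (objects, points, 3)
        return self.mlp(pts).max(dim=1).values     # (objects, dim)

class GoalPoseTransformer(nn.Module):
    """Fuses word tokens and object encodings in one transformer, then
    regresses a pose per object (assumed sizes throughout)."""
    def __init__(self, vocab=1000, dim=128, heads=4, layers=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, dim)
        self.obj_enc = ObjectEncoder(dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=layers)
        # 3 translation values + a 6-D rotation representation = 9 numbers
        self.pose_head = nn.Linear(dim, 9)

    def forward(self, word_ids, object_points):
        words = self.word_emb(word_ids)            # (1, n_words, dim)
        objs = self.obj_enc(object_points)[None]   # (1, n_objects, dim)
        tokens = torch.cat([words, objs], dim=1)   # one multi-modal sequence
        fused = self.fusion(tokens)
        obj_out = fused[:, words.shape[1]:]        # read back object slots
        return self.pose_head(obj_out)             # (1, n_objects, 9)

model = GoalPoseTransformer()
word_ids = torch.tensor([[5, 42, 7]])              # tokenized instruction
object_points = torch.randn(2, 256, 3)             # 2 segmented objects
poses = model(word_ids, object_points)
print(poses.shape)                                 # torch.Size([1, 2, 9])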