2023
DOI: 10.48550/arxiv.2302.02408
Preprint

Multi-View Masked World Models for Visual Robotic Manipulation

Abstract: Visual robotic manipulation research and applications often use multiple cameras, or views, to better perceive the world. How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation. Specifically, we train a multi-view masked autoencoder which reconstructs pixels of randomly masked viewpoints and then learn a world model operating on the representations from the autoencoder. We dem…

Cited by 1 publication (1 citation statement)
References 31 publications
“…Furthermore, Akinola et al [23] presented several multi-view approaches to robot learning of precise tasks with reinforcement learning. In the work of Younggyo Seo et al [25], a reinforcement learning framework for learning multi-view representations was proposed, which utilizes multi-view masked autoencoders for a variety of visual robotic manipulation scenarios. Although the aforementioned works have utilized multiple cameras in robotic manipulation systems, to the best of the authors' knowledge, this paper is the first research to tackle GCRL with the utilization of multiple camera views in robot manipulation tasks.…”
Section: Related Work (mentioning)
confidence: 99%