Reinforcement and Imitation Learning for Diverse Visuomotor Skills

Zhu, Yuke; Wang, Ziyu; Merel, Josh; Rusu, Andrei; Erez, Tom; Cabi, Serkan; Tunyasuvunakool, Saran; Kramár, János; Hadsell, Raia; Freitas, Nando de; Heess, Nicolas

doi:10.15607/rss.2018.xiv.009

Cited by 201 publications

(120 citation statements)

References 49 publications

Supporting

Mentioning

118

Contrasting

Unclassified

Order By: Relevance

“…In these systems, an artificial agent must learn to produce action outcomes in response to information from the environment, including rewards. Several such artificial systems have used convolutional architectures on the front end in order to process visual information about the world (Figure 3) [72,73,74]. It would be interesting to compare the representations learned in the context of these models to those trained by other mechanisms, as well as to data.…”

Section: Alternative Training Proceduresmentioning

confidence: 99%

Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future

Lindsay

2021

Journal of Cognitive Neuroscience

407

241

View full text Add to dashboard Cite

Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs and discusses emerging opportunities for the use of CNNS in vision research beyond basic object recognition. This is the author's final version. This article has been accepted for publication in the Journal ofCognitive Neuroscience.

show abstract

Section: Alternative Training Proceduresmentioning

confidence: 99%

Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future

Lindsay

2021

Journal of Cognitive Neuroscience

407

241

View full text Add to dashboard Cite

show abstract

“…Reinforcement learning approaches have recently been proposed to address variations in geometry and configuration for manipulation. [34,58] trained neural network policies using RGB images and proprioceptive feedback. Their approach works well in a wide range of tasks, but the large object clearances compared to automation tasks may explain the sufficiency of RGB data.…”

Section: A Contact-rich Manipulationmentioning

confidence: 99%

“…While the utility of multimodal data has frequently been shown in robotics [7,45,48,54], the proposed manipulation strategies are often task-specific. While learning-based methods do not require manual task specification, the majority of learned manipulation policies close the control loop around a single modality, often vision [15,21,34,58].…”

Section: Introductionmentioning

confidence: 99%

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

Lee

Zhu

Srinivasan

et al. 2019

2019 International Conference on Robotics and Automation (ICRA)

Self Cite

265

122

View full text Add to dashboard Cite

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is non-trivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots due to sample complexity. We use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. We evaluate our method on a peg insertion task, generalizing over different geometry, configurations, and clearances, while being robust to external perturbations. We present results in simulation and on a real robot. * Authors have contributed equally and names are in alphabetical order.Authors are with the Department of Computer Science, Stanford University. [mishlee,yukez,krshna,pshah9,ssilvio,feifeili, animeshg,bohg]@stanford.edu. A. Garg is also at Nvidia, USA.

show abstract

“…Reinforcement learning approaches have recently been proposed to address variations in geometry and configuration for manipulation. [41,72] train neural network policies using RGB images and proprioceptive feedback. Their approach works well in a wide range of tasks, but the large object clearances compared to manufacturing tasks may explain the sufficiency of RGB data.…”

Section: A Contact-rich Manipulationmentioning

confidence: 99%

“…This makes many of these methods task-specific. On the other hand, most learning-based methods do not require manual task specification, yet the majority of learned manipulation policies close the control loop around a single modality, often RGB images [15,22,41,72].…”

Section: Introductionmentioning

confidence: 99%

Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

et al. 2020

Self Cite

View full text Add to dashboard Cite

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is non-trivial to manually design a robot controller that combines these modalities which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots due to sample complexity. In this work, we use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. Evaluating our method on a peg insertion task, we show that it generalizes over varying geometries, configurations, and clearances, while being robust to external perturbations. We also systematically study different self-supervised learning objectives and representation learning architectures. Results are presented in simulation and on a physical robot.

show abstract

Reinforcement and Imitation Learning for Diverse Visuomotor Skills

Cited by 201 publications

References 49 publications

Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future

Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

Contact Info

Product

Resources

About