Deep predictive policy training using reinforcement learning

Ghadirzadeh, Ali; Maki, Atsuto; Kragić, Danica; Björkman, Mårten

doi:10.1109/iros.2017.8206046

Cited by 109 publications

(105 citation statements)

References 31 publications

Supporting

Mentioning

105

Contrasting

Order By: Relevance

“…Another approach is domain adaptation from the simulator [8,12,18,39,7,52], since it may be easier to fine tune from a simulator policy than training in the real world. However if the simulator differs from the real world, the policy trained in simulation can perform very poorly in the real world and fine tuning may not be any easier than training from scratch.…”

Section: Transfer From Simulation To the Real Worldmentioning

confidence: 99%

See 1 more Smart Citation

Asymmetric Actor Critic for Image-Based Robot Learning

Pinto¹,

Andrychowicz²,

Welinder³

et al. 2018

Robotics: Science and Systems XIV

184

160

View full text Add to dashboard Cite

Abstract-Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision making domains. However, robotics poses many challenges for RL, most notably training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully utilize the advantage of working with a simulator. In this work, we propose the Asymmetric Actor Critic, which learns a vision-based control policy while taking advantage of access to the underlying state to significantly speed up training. Concretely, our algorithm employs an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) is trained on images. We show that using these asymmetric inputs improves performance on a range of simulated tasks. Finally, we combine this method with domain randomization and show real robot experiments for several tasks like picking, pushing, and moving a block. We achieve this simulation to real-world transfer without training on any real-world data. Videos of these experiments can be found in www.goo.gl/b57WTs.

show abstract

Section: Transfer From Simulation To the Real Worldmentioning

confidence: 99%

“…These physical robots could also damage themselves and their environment while exploring these behaviours. A recent approach to circumvent these challenges is to train on a simulated version of the robot and then transfer the learned policy to the real robot [8,12,18,39,7,52,40,13].…”

Section: Introductionmentioning

confidence: 99%

Asymmetric Actor Critic for Image-Based Robot Learning

Pinto¹,

Andrychowicz²,

Welinder³

et al. 2018

Robotics: Science and Systems XIV

184

160

View full text Add to dashboard Cite

show abstract

“…2) Ground truth 2D image coordinates (GT-2D): z is similar to the previous baseline, but the points are projected into the camera using the ground truth camera parameters. 3) Autoencoder (AE): z is the encoding of a pre-trained autoencoder, similar to the visual training in [2], [11]. 4) End-to-End (E2E): z is the intermediate representation from end-to-end training.…”

Section: A Simulation Experimental Setupmentioning

confidence: 99%

“…Note that the two vision-based baselines AE and E2E share an identical model architecture for producing z, and differ only in the method used to train the parameters. The model is close to [1]- [3] with the key architectural traits of having a few convolutional layers followed by a channel-wise spatial expectation operation, which has been widely used [10], [11], [25], [29], [36]- [38]. Most methods we compare (AE, E2E, DD-2D) use only one RGB camera stream as input to learned policies; DD-3D additionally uses the depth image.…”

Section: A Simulation Experimental Setupmentioning

confidence: 99%

Self-Supervised Correspondence in Visuomotor Policy Learning

Florence

Manuelli

Tedrake

2020

IEEE Robot. Autom. Lett.

View full text Add to dashboard Cite

In this paper we explore using self-supervised correspondence for improving the generalization performance and sample efficiency of visuomotor policy learning. Prior work has primarily used approaches such as autoencoding, pose-based losses, and end-to-end policy optimization in order to train the visual portion of visuomotor policies. We instead propose an approach using self-supervised dense visual correspondence training, and show this enables visuomotor policy learning with surprisingly high generalization performance with modest amounts of data: using imitation learning, we demonstrate extensive hardware validation on challenging manipulation tasks with as few as 50 demonstrations. Our learned policies can generalize across classes of objects, react to deformable object configurations, and manipulate textureless symmetrical objects in a variety of backgrounds, all with closedloop, real-time vision-based policies. Simulated imitation learning experiments suggest that correspondence training offers sample complexity and generalization benefits compared to autoencoding and end-to-end training.

show abstract

“…Each PC/BC-DIM stage functions like a spatial auto-encoder which acted as an encoder for the transformation in one direction whereas the same PC/BC-DIM stage acted as a decoder for the transformation in the opposite direction. The PC/BC-DIM auto-encoder thus has a profound difference compared to all previously reported work for similar eyes-hand coordination tasks where the auto-encoder employed separate encoder and decoder neural circuitry [36][37][38][39].…”

Section: Pc/bc-dim Spatial Auto-encodermentioning

confidence: 99%

A neural model for eye–head–arm coordination

Muhammad

Spratling

2017

Advanced Robotics

View full text Add to dashboard Cite

The coordinated movement of the eyes, the head and the arm is an important ability in both animals and humanoid robots. To achieve this the brain and the robot control system need to be able to perform complex non-linear sensory-motor transformations in the forward and inverse directions between many degrees of freedom. In this article, we apply an omni-directional basis function neural network to this task. The proposed network can perform 3-D coordinated gaze shifts and 3-D arm reaching movements to a visual target. Particularly, it can perform direct sensory-motor transformations to shift gaze and to execute arm reach movements and can also perform inverse sensory-motor transformations in order to shift gaze to view the hand.

show abstract

Deep predictive policy training using reinforcement learning

Cited by 109 publications

References 31 publications

Asymmetric Actor Critic for Image-Based Robot Learning

Asymmetric Actor Critic for Image-Based Robot Learning

Self-Supervised Correspondence in Visuomotor Policy Learning

A neural model for eye–head–arm coordination

Contact Info

Product

Resources

About