2017 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2017.7989384
Path integral guided policy search

Abstract: We present a policy search method for learning complex feedback control policies that map from high-dimensional sensory inputs to motor torques, for manipulation tasks with discontinuous contact dynamics. We build on a prior technique called guided policy search (GPS), which iteratively optimizes a set of local policies for specific instances of a task, and uses these to train a complex, high-dimensional global policy that generalizes across task instances. We extend GPS in the following ways: (1) we p…

Cited by 120 publications (122 citation statements)
References 24 publications
“…Note that the two vision-based baselines AE and E2E share an identical model architecture for producing z, and differ only in the method used to train the parameters. The model is close to [1]–[3], with the key architectural traits of having a few convolutional layers followed by a channel-wise spatial expectation operation, which has been widely used [10], [11], [25], [29], [36]–[38]. Most methods we compare (AE, E2E, DD-2D) use only one RGB camera stream as input to learned policies; DD-3D additionally uses the depth image.…”
Section: A. Simulation Experimental Setup
confidence: 99%
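For context on the architecture this excerpt describes, below is a minimal sketch of a channel-wise spatial expectation ("spatial softmax") layer that reduces convolutional feature maps to 2-D feature points. The shapes, normalization to [-1, 1], and function name are illustrative assumptions, not the cited papers' exact implementations.

```python
import numpy as np

def spatial_expectation(features):
    """Reduce a (C, H, W) feature map to (C, 2) expected pixel coordinates.

    For each channel, a softmax over all H*W locations turns the activations
    into a probability map; the expected (x, y) position under that map
    becomes the channel's 2-D feature point.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    # Per-channel softmax over spatial locations (subtract max for stability).
    probs = np.exp(flat - flat.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Pixel coordinate grids, normalized to [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    exp_x = probs @ xs.ravel()  # (C,) expected x per channel
    exp_y = probs @ ys.ravel()  # (C,) expected y per channel
    return np.stack([exp_x, exp_y], axis=1)  # (C, 2) feature points z

# Example: 32 feature channels over a 109x109 activation map.
z = spatial_expectation(np.random.randn(32, 109, 109))
print(z.shape)  # (32, 2)
```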
“…Figure 1 illustrates our proposed GPS-based framework for sequential multi-task learning. The local policy p_i is generally optimized with iterative linear-quadratic regulators (iLQR) [5] or the path integral (PI²) method [11]. The global policy π_θi usually adopts a deep neural network to represent a broad range of behaviors.…”
Section: A. A GPS-Based Framework for Sequential Multi-Task Learning
confidence: 99%
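A minimal sketch of the PI²-style local-policy update mentioned above, under simplifying assumptions: an open-loop mean action sequence perturbed by Gaussian noise, with rollout costs supplied by a user-provided callback. The function and parameter names are illustrative, not the cited method's exact formulation.

```python
import numpy as np

def pi2_update(mean_actions, noise_std, rollout_cost, n_samples=20, temp=1.0):
    """One PI^2-style improvement step on a (T, dU) mean action sequence."""
    T, dU = mean_actions.shape
    noise = noise_std * np.random.randn(n_samples, T, dU)
    costs = np.array([rollout_cost(mean_actions + eps) for eps in noise])
    # Exponentiated, normalized costs act as soft-max weights over rollouts;
    # lower-cost samples get exponentially more influence.
    shifted = costs - costs.min()
    weights = np.exp(-shifted / temp)
    weights /= weights.sum()
    # The new mean is the probability-weighted average of the perturbations.
    return mean_actions + np.einsum("k,ktu->tu", weights, noise)

# Toy usage: drive a 1-D action sequence toward actions of 1.0.
cost = lambda u: np.sum((u - 1.0) ** 2)
u = np.zeros((10, 1))
for _ in range(50):
    u = pi2_update(u, noise_std=0.3, rollout_cost=cost)
```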
“…It provides a total bound on the global policy cost and an appropriate step size for improving the global policy. Recently, Chebotar et al. [11] extended this to a global policy sampling scheme and introduced a KL-constrained path integral (PI²) approach. This enhanced its generalization capability by increasing the diversity of the training data.…”
Section: Introduction
confidence: 99%
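One way a KL constraint on the PI² weighting can be enforced is by adjusting the softmax temperature. The sketch below does this by bisection against a KL bound measured relative to a uniform base distribution over rollouts; this is a deliberate simplification of the paper's dual formulation, and the bisection bounds are assumptions.

```python
import numpy as np

def kl_weights(costs, epsilon, lo=1e-3, hi=1e3, iters=50):
    """Find a temperature whose softmax weights keep KL(w || uniform) <= eps."""
    costs = costs - costs.min()
    n = len(costs)

    def kl_for(temp):
        w = np.exp(-costs / temp)
        w /= w.sum()
        # KL divergence from the uniform distribution over the n rollouts.
        return np.sum(w * np.log(np.maximum(w * n, 1e-12)))

    # Geometric bisection: KL shrinks monotonically as temperature grows.
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if kl_for(mid) > epsilon:
            lo = mid  # weighting too greedy, raise the temperature
        else:
            hi = mid
    w = np.exp(-costs / hi)
    return w / w.sum()
```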
“…For recording the position of the trash, a marker is attached to the object. The marker is tracked with a Kinect RGB-D camera using the Robot Operating System (ROS) wrapper for Alvar, an open-source augmented reality tag tracking library. Using ROS, a server-client communication interface is built between the CNN and the DMP, as shown in Fig.…”
Section: A. Experimental Setup
confidence: 99%
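For illustration, a minimal ROS service pair in the spirit of the server-client interface described above. It uses the standard std_srvs/Trigger type; the node and service names, and the exchange of a target pose serialized into the response string, are hypothetical choices, not the excerpt's actual interface.

```python
import json
import rospy
from std_srvs.srv import Trigger, TriggerResponse

def handle_request(_req):
    # In the real system this would run the CNN on the current camera frame;
    # here we return a fixed, made-up target pose.
    pose = {"x": 0.42, "y": -0.10, "z": 0.25}
    return TriggerResponse(success=True, message=json.dumps(pose))

def run_server():
    # Server side: the perception node answering pose queries.
    rospy.init_node("cnn_pose_server")
    rospy.Service("get_target_pose", Trigger, handle_request)
    rospy.spin()

def query_once():
    # Client side: the DMP node asking for the current target pose.
    rospy.wait_for_service("get_target_pose")
    client = rospy.ServiceProxy("get_target_pose", Trigger)
    resp = client()
    return json.loads(resp.message)  # pose dict for the DMP to use
```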
“…The amount of resources needed and the learning time make such an approach infeasible for real-world scenarios. If an optimizer can provide trajectories for solving the manipulation task, then it can be used to guide the policy search of a CNN to a good local optimum and to speed up the learning procedure [1], [7]. A drawback of [7] is that the task must first be formulated as an optimization problem, and the user must define a cost function for the executed actions.…”
Section: Introduction
confidence: 99%
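The guiding idea in this excerpt (optimizer-produced trajectories supervising a network policy) can be sketched as plain regression from visited states to optimized actions. The one-hidden-layer network and squared-error objective below are simplifying assumptions, not the cited method's architecture or loss.

```python
import numpy as np

def train_global_policy(states, actions, hidden=64, lr=1e-2, epochs=200):
    """Fit a policy a = W2 tanh(W1 s + b1) + b2 by gradient descent on MSE."""
    n, ds = states.shape
    da = actions.shape[1]
    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 1 / np.sqrt(ds), (ds, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1 / np.sqrt(hidden), (hidden, da)); b2 = np.zeros(da)
    for _ in range(epochs):
        h = np.tanh(states @ W1 + b1)   # forward pass
        pred = h @ W2 + b2
        err = pred - actions            # regression toward guiding actions
        # Backpropagation of the mean squared error.
        gW2 = h.T @ err / n; gb2 = err.mean(0)
        dh = (err @ W2.T) * (1 - h ** 2)
        gW1 = states.T @ dh / n; gb1 = dh.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return lambda s: np.tanh(s @ W1 + b1) @ W2 + b2

# Toy usage: states and guiding actions from a pretend trajectory optimizer.
S = np.random.randn(256, 8)
A = S[:, :2] * 0.5                      # stand-in for optimized actions
policy = train_global_policy(S, A)
```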