2017 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2017.7989384
Path integral guided policy search

Abstract: We present a policy search method for learning complex feedback control policies that map from high-dimensional sensory inputs to motor torques, for manipulation tasks with discontinuous contact dynamics. We build on a prior technique called guided policy search (GPS), which iteratively optimizes a set of local policies for specific instances of a task, and uses these to train a complex, high-dimensional global policy that generalizes across task instances. We extend GPS in the following ways: (1) we p…

Cited by 120 publications (122 citation statements)
References 24 publications
“…Note that the two vision-based baselines AE and E2E share an identical model architecture for producing z, and differ only in the method used to train the parameters. The model is close to [1]–[3], with the key architectural traits of having a few convolutional layers followed by a channel-wise spatial expectation operation, which has been widely used [10], [11], [25], [29], [36]–[38]. Most methods we compare (AE, E2E, DD-2D) use only one RGB camera stream as input to learned policies; DD-3D additionally uses the depth image.…”
Section: A. Simulation Experimental Setup
confidence: 99%
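For context on the architecture this excerpt describes, below is a minimal sketch of a channel-wise spatial expectation ("spatial softmax") layer that reduces convolutional feature maps to 2-D feature points. The shapes, normalization to [-1, 1], and function name are illustrative assumptions, not the cited papers' exact implementations.

```python
import numpy as np

def spatial_expectation(features):
    """Reduce a (C, H, W) feature map to (C, 2) expected pixel coordinates.

    For each channel, a softmax over all H*W locations turns the activations
    into a probability map; the expected (x, y) position under that map
    becomes the channel's 2-D feature point.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    # Per-channel softmax over spatial locations (subtract max for stability).
    probs = np.exp(flat - flat.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Pixel coordinate grids, normalized to [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    exp_x = probs @ xs.ravel()  # (C,) expected x per channel
    exp_y = probs @ ys.ravel()  # (C,) expected y per channel
    return np.stack([exp_x, exp_y], axis=1)  # (C, 2) feature points z

# Example: 32 feature channels over a 109x109 activation map.
z = spatial_expectation(np.random.randn(32, 109, 109))
print(z.shape)  # (32, 2)
```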
“…Figure 1 illustrates our proposed GPS-based framework for sequential multi-task learning. The local policy p_i is generally optimized with iterative linear-quadratic regulators (iLQR) [5] or the path integral (PI²) method [11]. The global policy π_θi usually adopts a deep neural network to represent a broad range of behaviors.…”
Section: A. A GPS-Based Framework for Sequential Multi-Task Learning
confidence: 99%
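A minimal sketch of the PI²-style local-policy update mentioned above, under simplifying assumptions: an open-loop mean action sequence perturbed by Gaussian noise, with rollout costs supplied by a user-provided callback. The function and parameter names are illustrative, not the cited method's exact formulation.

```python
import numpy as np

def pi2_update(mean_actions, noise_std, rollout_cost, n_samples=20, temp=1.0):
    """One PI^2-style improvement step on a (T, dU) mean action sequence."""
    T, dU = mean_actions.shape
    noise = noise_std * np.random.randn(n_samples, T, dU)
    costs = np.array([rollout_cost(mean_actions + eps) for eps in noise])
    # Exponentiated, normalized costs act as soft-max weights over rollouts;
    # lower-cost samples get exponentially more influence.
    shifted = costs - costs.min()
    weights = np.exp(-shifted / temp)
    weights /= weights.sum()
    # The new mean is the probability-weighted average of the perturbations.
    return mean_actions + np.einsum("k,ktu->tu", weights, noise)

# Toy usage: drive a 1-D action sequence toward actions of 1.0.
cost = lambda u: np.sum((u - 1.0) ** 2)
u = np.zeros((10, 1))
for _ in range(50):
    u = pi2_update(u, noise_std=0.3, rollout_cost=cost)
```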
“…It provides a total bound on the global policy cost and an appropriate step size for improving the global policy. Recently, Chebotar et al. [11] extended this to a global policy sampling scheme and introduced a KL-constrained path integral (PI²) approach. This enhanced its generalization capability by increasing the diversity of the training data.…”
Section: Introduction
confidence: 99%
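One way a KL constraint on the PI² weighting can be enforced is by adjusting the softmax temperature. The sketch below does this by bisection against a KL bound measured relative to a uniform base distribution over rollouts; this is a deliberate simplification of the paper's dual formulation, and the bisection bounds are assumptions.

```python
import numpy as np

def kl_weights(costs, epsilon, lo=1e-3, hi=1e3, iters=50):
    """Find a temperature whose softmax weights keep KL(w || uniform) <= eps."""
    costs = costs - costs.min()
    n = len(costs)

    def kl_for(temp):
        w = np.exp(-costs / temp)
        w /= w.sum()
        # KL divergence from the uniform distribution over the n rollouts.
        return np.sum(w * np.log(np.maximum(w * n, 1e-12)))

    # Geometric bisection: KL shrinks monotonically as temperature grows.
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if kl_for(mid) > epsilon:
            lo = mid  # weighting too greedy, raise the temperature
        else:
            hi = mid
    w = np.exp(-costs / hi)
    return w / w.sum()
```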
“…For recording the position of the trash, a marker is attached to the object. The marker is tracked with a Kinect RGB-D camera using the Robot Operating System (ROS) wrapper for Alvar, an open-source augmented reality tag tracking library. Using ROS, a server-client communication interface is built between the CNN and the DMP, as shown in Fig.…”
Section: A. Experimental Setup
confidence: 99%
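For illustration, a minimal ROS service pair in the spirit of the server-client interface described above. It uses the standard std_srvs/Trigger type; the node and service names, and the exchange of a target pose serialized into the response string, are hypothetical choices, not the excerpt's actual interface.

```python
import json
import rospy
from std_srvs.srv import Trigger, TriggerResponse

def handle_request(_req):
    # In the real system this would run the CNN on the current camera frame;
    # here we return a fixed, made-up target pose.
    pose = {"x": 0.42, "y": -0.10, "z": 0.25}
    return TriggerResponse(success=True, message=json.dumps(pose))

def run_server():
    # Server side: the perception node answering pose queries.
    rospy.init_node("cnn_pose_server")
    rospy.Service("get_target_pose", Trigger, handle_request)
    rospy.spin()

def query_once():
    # Client side: the DMP node asking for the current target pose.
    rospy.wait_for_service("get_target_pose")
    client = rospy.ServiceProxy("get_target_pose", Trigger)
    resp = client()
    return json.loads(resp.message)  # pose dict for the DMP to use
```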
“…The amount of resources needed and the learning time make such an approach infeasible for real-world scenarios. If an optimizer can provide trajectories for solving the manipulation task, then it can be used to guide the policy search of a CNN to a good local optimum and to speed up the learning procedure [1], [7]. A drawback of [7] is that the task must first be formulated as an optimization problem, and the user must define a cost function for the executed actions.…”
Section: Introduction
confidence: 99%
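The guiding idea in this excerpt (optimizer-produced trajectories supervising a network policy) can be sketched as plain regression from visited states to optimized actions. The one-hidden-layer network and squared-error objective below are simplifying assumptions, not the cited method's architecture or loss.

```python
import numpy as np

def train_global_policy(states, actions, hidden=64, lr=1e-2, epochs=200):
    """Fit a policy a = W2 tanh(W1 s + b1) + b2 by gradient descent on MSE."""
    n, ds = states.shape
    da = actions.shape[1]
    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 1 / np.sqrt(ds), (ds, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1 / np.sqrt(hidden), (hidden, da)); b2 = np.zeros(da)
    for _ in range(epochs):
        h = np.tanh(states @ W1 + b1)   # forward pass
        pred = h @ W2 + b2
        err = pred - actions            # regression toward guiding actions
        # Backpropagation of the mean squared error.
        gW2 = h.T @ err / n; gb2 = err.mean(0)
        dh = (err @ W2.T) * (1 - h ** 2)
        gW1 = states.T @ dh / n; gb1 = dh.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return lambda s: np.tanh(s @ W1 + b1) @ W2 + b2

# Toy usage: states and guiding actions from a pretend trajectory optimizer.
S = np.random.randn(256, 8)
A = S[:, :2] * 0.5                      # stand-in for optimized actions
policy = train_global_policy(S, A)
```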