Path Integral Guided Policy Search

Chebotar, Yevgen; Kalakrishnan, Mrinal; Yahya, Ali Abdullah; Li, Adrian; Schaal, Stefan; Levine, Sergey

doi:10.48550/arxiv.1610.00529

Cited by 7 publications

(13 citation statements)

References 0 publications


“…We adopt the time-varying linear-Gaussian policies $\pi_{\theta_t} = \mathcal{N}(K_t s_t + k_t, \Sigma_t)$ (here $\theta_t = (k_t, \Sigma_t)$ for $t = 0, \dots, T$) and weighted maximum-likelihood estimation to update the policy parameters (feedback gain $K_t$ is kept fixed to reduce the dimension of the parameter space). This approach has been used in [3]. The difference is that [3] recomputes $p(\tau_i)$ at each step $t$ using cost-to-go before updating $\theta_i$.…”

Section: B Relative Entropy Policy Search (mentioning)

confidence: 99%
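The weighted maximum-likelihood update mentioned in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `weighted_ml_update`, the array shapes, and the assumption that nonnegative per-sample weights are already available are all hypothetical. With the gain $K_t$ held fixed, only the offset $k_t$ and covariance $\Sigma_t$ are re-estimated from the weighted residuals $u_i - K_t s_i$.

```python
import numpy as np

def weighted_ml_update(states, actions, K, weights):
    """Weighted ML fit of k and Sigma for u ~ N(K s + k, Sigma), with K fixed.

    states:  (N, ds) sampled states at one time step (hypothetical layout)
    actions: (N, da) sampled actions at the same time step
    K:       (da, ds) fixed feedback gain
    weights: (N,) nonnegative sample weights, not necessarily normalized
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to one
    residuals = actions - states @ K.T   # u_i - K s_i, shape (N, da)
    k = w @ residuals                    # weighted mean of the residuals
    centered = residuals - k
    Sigma = (centered * w[:, None]).T @ centered  # weighted covariance
    return k, Sigma
```

With uniform weights this reduces to the ordinary sample mean and (biased) sample covariance of the residuals.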

“…This approach has been used in [3]. The difference is that [3] recomputes $p(\tau_i)$ at each step $t$ using cost-to-go before updating $\theta_i$. Since a temporal logic reward (described in the next section) depends on the entire trajectory, it doesn't have the notion of cost-to-go and can only be evaluated as a terminal reward.…”

Section: B Relative Entropy Policy Search (mentioning)

confidence: 99%
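The distinction drawn in this statement, per-step weights from cost-to-go versus a single trajectory-level weight from a terminal reward, can be illustrated with a short sketch. The exponentiated (softmax) weighting, the temperature `eta`, the use of costs rather than rewards, and all function names are assumptions made for illustration; the statement itself does not specify how $p(\tau_i)$ is computed.

```python
import numpy as np

def softmax_weights(values, eta=1.0):
    """Normalized weights proportional to exp(-value / eta); lower cost, higher weight."""
    z = -np.asarray(values, dtype=float) / eta
    z -= z.max()                         # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def cost_to_go(costs):
    """costs: (N, T) per-step costs. Entry [i, t] of the result sums costs[i, t:]."""
    return np.cumsum(costs[:, ::-1], axis=1)[:, ::-1]

def per_step_weights(costs, eta=1.0):
    """Recompute trajectory weights at every time step from the cost-to-go."""
    ctg = cost_to_go(np.asarray(costs, dtype=float))
    cols = [softmax_weights(ctg[:, t], eta) for t in range(ctg.shape[1])]
    return np.stack(cols, axis=1)        # shape (N, T)

def terminal_weights(terminal_costs, T, eta=1.0):
    """One weight per trajectory, reused at every time step (terminal-only signal)."""
    w = softmax_weights(terminal_costs, eta)
    return np.repeat(w[:, None], T, axis=1)  # shape (N, T)
```

In the per-step variant the weight of a trajectory can change from one time step to the next; in the terminal variant every column is identical because only the whole-trajectory score is available.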

“…In our view, a formal language for RL task specification should have the following characteristics: (1) The language should be defined over predicates so tasks can be conveniently specified as functions of states. (2) The language should provide quantitative semantics as a continuous measure of its satisfaction. (3) The specification formula should be evaluated over finite sequences (state trajectories) of variable length, thus allowing for per-step evaluation on currently available data. (4) Temporal operators can have time bounds but should not require them.…”

Section: B Comparison With Existing Formal Languages (mentioning)

confidence: 99%
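To make the idea of quantitative semantics over a finite trajectory concrete, here is a minimal sketch using the max/min style of robustness that is common for temporal logics. It is not the TLTL definition from the citing paper, which is not reproduced on this page; the function names and the example predicate are hypothetical.

```python
def predicate_robustness(trajectory, f):
    """Per-step robustness of the predicate f(s) > 0: simply the value f(s)."""
    return [f(s) for s in trajectory]

def eventually(rho):
    """Robustness of 'eventually phi' over a finite trajectory: best step wins."""
    return max(rho)

def always(rho):
    """Robustness of 'always phi' over a finite trajectory: worst step decides."""
    return min(rho)

# Example: 1-D states and the predicate "s > 1", written as f(s) = s - 1.
trajectory = [0.0, 0.5, 1.2]
rho = predicate_robustness(trajectory, lambda s: s - 1.0)
reaches_goal = eventually(rho) > 0   # True: the last state exceeds 1
stays_in_goal = always(rho) > 0      # False: the first state does not
```

Because both operators reduce a list of any length, the same formula can be scored on trajectories of different lengths, and the sign of the result indicates satisfaction while its magnitude indicates the margin.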


Reinforcement learning with temporal logic rewards

Vasile

Belta

2017

2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

131

115


Abstract: Reinforcement learning (RL) depends critically on the choice of reward functions used to capture the desired behavior and constraints of a robot. Usually, these are handcrafted by an expert designer and represent heuristics for relatively simple tasks. Real-world applications typically involve more complex tasks with rich temporal and logical structure. In this paper we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow, and incorporate domain knowledge into learning. We propose Truncated Linear Temporal Logic (TLTL) as a specification language that is arguably well suited for robotics applications, together with quantitative semantics, i.e., robustness degree. We propose an RL approach to learn tasks expressed as TLTL formulae that uses their associated robustness degree as reward functions, instead of manually crafted heuristics trying to capture the same specifications. We show in simulated trials that learning is faster and that policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied. Furthermore, we demonstrate the proposed RL approach in a toast-placing task learned by a Baxter robot.


“…They exploited the spatial softmax layer, introduced in their earlier work [12], to convert the activation of the last layer of the convolutional filters into spatial image positions. This topology has been applied in a number of real robotic visuomotor learning tasks [12], [16]-[18].…”

Section: B Related Work (mentioning)

confidence: 99%
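The conversion from convolutional activations to image positions described in this statement can be sketched as below. This is a common formulation of a spatial softmax, written under assumptions: the `(C, H, W)` layout, the coordinate range of [-1, 1], and the function name `spatial_softmax` are illustrative choices rather than details taken from the cited works.

```python
import numpy as np

def spatial_softmax(features):
    """Map feature maps of shape (C, H, W) to expected (x, y) positions, shape (C, 2).

    Each channel is normalized with a softmax over all pixels, and the result is
    treated as a probability distribution over image locations.
    """
    C, H, W = features.shape
    flat = features.reshape(C, H * W)
    flat = flat - flat.max(axis=1, keepdims=True)  # shift for numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(C, H, W)

    xs = np.linspace(-1.0, 1.0, W)                 # column coordinates
    ys = np.linspace(-1.0, 1.0, H)                 # row coordinates
    exp_x = (probs.sum(axis=1) * xs).sum(axis=1)   # marginal over rows, then E[x]
    exp_y = (probs.sum(axis=2) * ys).sum(axis=1)   # marginal over columns, then E[y]
    return np.stack([exp_x, exp_y], axis=1)
```

A channel with one dominant activation yields coordinates close to that pixel's location, while a flat channel yields the image center, which is why the output behaves like a set of feature point positions.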

“…The input image is reconstructed based on this encoding i.e., based on the knowledge of where the relevant objects are located in the image. The encoding inherently preserves spatial distances in the input image and is therefore suitable for robotic manipulation tasks [6], [12], [16], [18].…”

Section: Representation Learning (mentioning)

confidence: 99%

Deep predictive policy training using reinforcement learning

Ghadirzadeh

Maki

Kragić

et al. 2017

2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

109


Skilled robot task learning is best implemented by predictive action policies due to the inherent latency of sensorimotor processes. However, training such predictive policies is challenging as it involves finding a trajectory of motor activations for the full duration of the action. We propose a data-efficient deep predictive policy training (DPPT) framework with a deep neural network policy architecture which maps an image observation to a sequence of motor activations. The architecture consists of three sub-networks referred to as the perception, policy and behavior super-layers. The perception and behavior super-layers force an abstraction of visual and motor data trained with synthetic and simulated training samples, respectively. The policy super-layer is a small subnetwork with fewer parameters that maps data in-between the abstracted manifolds. It is trained for each task using methods for policy search reinforcement learning. We demonstrate the suitability of the proposed architecture and learning framework by training predictive policies for skilled object grasping and ball throwing on a PR2 robot. The effectiveness of the method is illustrated by the fact that these tasks are trained using only about 180 real robot attempts with qualitative terminal rewards.


Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates

Holly

Lillicrap

et al. 2017

2017 IEEE International Conference on Robotics and Automation (ICRA)

Self Cite

1,279

793


Abstract: Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We demonstrate that the training times can be further reduced by parallelizing the algorithm across multiple robots which pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door opening skill on real robots without any prior demonstrations or manually designed representations.
