2016
DOI: 10.48550/arxiv.1610.00529
Preprint

Path Integral Guided Policy Search

Cited by 7 publications (13 citation statements)
References 0 publications
“…We adopt the time-varying linear-Gaussian policies $\pi_{\theta_t} = \mathcal{N}(K_t s_t + k_t, \Sigma_t)$ (here $\theta_t = (k_t, \Sigma_t)$ for $t = 0, \dots, T$) and weighted maximum-likelihood estimation to update the policy parameters (the feedback gain $K_t$ is kept fixed to reduce the dimension of the parameter space). This approach has been used in [3]. The difference is that [3] recomputes $p(\tau_i)$ at each step $t$ using the cost-to-go before updating $\theta_i$.…”
Section: B. Relative Entropy Policy Search (mentioning)
confidence: 99%
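To make the quoted update concrete, here is a minimal NumPy sketch of a weighted maximum-likelihood update for a single time step of a time-varying linear-Gaussian policy with the feedback gain $K_t$ held fixed. The function name, the array shapes, and the assumption that the weights come from exponentiated negative costs are illustrative choices on my part, not details taken from the cited papers.

```python
import numpy as np

def weighted_mle_update(states, actions, weights, K_t):
    """Weighted maximum-likelihood update of one step of a time-varying
    linear-Gaussian policy pi(a|s) = N(K_t s + k_t, Sigma_t), with the
    feedback gain K_t held fixed as in the quoted statement.

    states  : (N, ds) sampled states s_t^i at step t
    actions : (N, da) sampled actions a_t^i at step t
    weights : (N,)    per-trajectory weights (assumed, e.g., exp(-cost))
    K_t     : (da, ds) fixed feedback gain
    """
    w = weights / weights.sum()                     # normalize the weights
    resid = actions - states @ K_t.T                # a_t^i - K_t s_t^i
    k_t = w @ resid                                 # weighted mean offset
    centered = resid - k_t                          # residuals about k_t
    Sigma_t = (w[:, None] * centered).T @ centered  # weighted covariance
    return k_t, Sigma_t
```

Repeating this update for every $t = 0, \dots, T$ yields the full set of time-varying parameters $\theta_t = (k_t, \Sigma_t)$.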
“…This approach has been used in [3]. The difference is that [3] recomputes $p(\tau_i)$ at each step $t$ using the cost-to-go before updating $\theta_i$. Since a temporal logic reward (described in the next section) depends on the entire trajectory, it doesn't have the notion of cost-to-go and can only be evaluated as a terminal reward.…”
Section: B. Relative Entropy Policy Search (mentioning)
confidence: 99%
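The distinction the quote draws, per-step weights from the cost-to-go versus a single weight from a terminal, whole-trajectory reward, can be illustrated with a short sketch. The exponential weighting with a temperature `eta` is a common choice and an assumption here, not a detail from [3] or the quoted paper.

```python
import numpy as np

def cost_to_go_weights(step_costs, t, eta=1.0):
    """Weights recomputed at step t from the cost-to-go, as the quote
    attributes to [3]: trajectory i is weighted by the exponentiated,
    negated sum of its costs from step t to the horizon T.
    step_costs: (N, T) array of per-step costs for N trajectories."""
    cost_to_go = step_costs[:, t:].sum(axis=1)
    w = np.exp(-cost_to_go / eta)
    return w / w.sum()

def terminal_reward_weights(terminal_rewards, eta=1.0):
    """One weight per trajectory from a terminal reward (e.g. a temporal
    logic score that depends on the entire trajectory); the same weights
    are then reused for the policy update at every time step."""
    w = np.exp(terminal_rewards / eta)
    return w / w.sum()
```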
“…They exploited the spatial softmax layer, introduced in their earlier work [12], to convert the activations of the last layer of convolutional filters into spatial image positions. This topology has been applied in a number of real robotic visuomotor learning tasks [12], [16]–[18].…”
Section: B. Related Work (mentioning)
confidence: 99%
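As context for the quoted statement, here is a minimal NumPy sketch of a spatial softmax in the spirit of [12]: each channel's activations are normalized into a spatial probability distribution, and the distribution's expected (x, y) coordinate becomes one feature point. The normalized [-1, 1] pixel grid and the omission of a learned temperature are simplifying assumptions.

```python
import numpy as np

def spatial_softmax(features):
    """Convert conv activations (C, H, W) into (C, 2) expected image
    positions: per channel, a softmax over all pixel locations gives a
    spatial distribution, and its mean (x, y) is the feature point."""
    C, H, W = features.shape
    flat = features.reshape(C, -1)
    flat = flat - flat.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(flat)
    probs = probs / probs.sum(axis=1, keepdims=True)  # per-channel softmax
    probs = probs.reshape(C, H, W)
    xs = np.linspace(-1.0, 1.0, W)                  # normalized pixel grid
    ys = np.linspace(-1.0, 1.0, H)
    x_mean = (probs.sum(axis=1) * xs).sum(axis=1)   # E[x] per channel
    y_mean = (probs.sum(axis=2) * ys).sum(axis=1)   # E[y] per channel
    return np.stack([x_mean, y_mean], axis=1)       # (C, 2) feature points
```

Because the output is a set of image coordinates rather than an opaque embedding, distances between feature points track distances between objects in the image, which is what the next quoted statement relies on.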
“…The input image is reconstructed based on this encoding, i.e., based on the knowledge of where the relevant objects are located in the image. The encoding inherently preserves spatial distances in the input image and is therefore suitable for robotic manipulation tasks [6], [12], [16], [18].…”
Section: Representation Learning (mentioning)
confidence: 99%
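The reconstruction objective described in this quote can be sketched as an autoencoder loss whose bottleneck is the (C, 2) feature points produced by the spatial softmax above. The decoder interface and the pixel-wise mean squared error below are hypothetical stand-ins; the quoted papers do not fix these details here.

```python
import numpy as np

def reconstruction_loss(decoder, feature_points, target_image):
    """Autoencoder-style objective: the image must be reconstructed from
    the spatial feature points alone, so the bottleneck is forced to
    encode *where* the relevant objects are located in the image.
    decoder: any callable mapping a flat (2*C,) vector to an image array
             (hypothetical stand-in for a learned upsampling network)."""
    reconstruction = decoder(feature_points.ravel())
    return np.mean((reconstruction - target_image) ** 2)  # pixel-wise MSE
```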