2018 Annual American Control Conference (ACC)
DOI: 10.23919/acc.2018.8431181
A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

Abstract: Reward engineering is an important aspect of reinforcement learning. Whether the users' intentions are correctly encapsulated in the reward function can significantly affect the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This operation can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in re…

Cited by 45 publications (43 citation statements). References 16 publications.
“…where λ penalizes the trade-off between maximizing robustness to get the highest STL satisfaction and minimizing the associated cost. Assuming the cost function J is also smooth, a similar gradient ascent optimization can be used to solve the constrained nonlinear optimization problem (16).…”
Section: Control Using the AGM Robustness (mentioning)
confidence: 99%
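As a minimal sketch of the update this excerpt describes, the following performs gradient ascent on the smooth trade-off objective ρ(θ) − λJ(θ); the function names, step size, and iteration count are illustrative assumptions rather than details from the cited paper, and any projection needed to enforce the constraints of the nonlinear program is omitted:

```python
def ascend_tradeoff(theta, grad_rho, grad_J, lam=0.1, step=1e-2, iters=200):
    """Gradient ascent on rho(theta) - lam * J(theta): push up the STL
    robustness while lam weighs it against the associated cost
    (both functions assumed smooth, as in the excerpt)."""
    for _ in range(iters):
        # ascend the combined objective; lam trades robustness against cost
        theta = theta + step * (grad_rho(theta) - lam * grad_J(theta))
    return theta
```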
“…A similar problem has been formulated and examined for (a broader range of) completely unknown system dynamics in the context of truncated linear temporal logic (TLTL), a language comparable to STL, by [13]. Therein, the goal was to find a policy that maximizes the expected robustness measure corresponding to a general TLTL task specification.…”
Section: Problem Formulation (mentioning)
confidence: 99%
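The expected-robustness objective mentioned here can be illustrated with a generic score-function (REINFORCE-style) gradient estimator; this is a stand-in sketch, not the specific policy search algorithm of [13], and `sample_trajectory`, `grad_log_prob`, and `robustness` are assumed helper interfaces:

```python
import numpy as np

def policy_search_step(params, sample_trajectory, grad_log_prob, robustness,
                       n_samples=32, step=1e-2):
    """One score-function ascent step on E_tau[rho(tau)], the expected
    TLTL robustness of trajectories drawn from the current policy.

    sample_trajectory(params) -> tau          (rollout under the policy)
    grad_log_prob(params, tau) -> gradient of log p(tau | params)
    robustness(tau) -> scalar TLTL robustness of the trajectory
    """
    grads = [robustness(tau) * grad_log_prob(params, tau)
             for tau in (sample_trajectory(params) for _ in range(n_samples))]
    # Monte Carlo estimate of the policy gradient of the expected robustness
    return params + step * np.mean(grads, axis=0)
```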
“…Namely, this will be done by using the PPC control law introduced in Section II-C to guide PI² for an increased rate of convergence and robustness to process noise. We also extend the PI² framework to allow optimizing system trajectories subject to STL tasks for general C(τ) costs; task satisfaction is thus treated as a constraint instead of as the target of optimization, in contrast to [13]. So far, the approach applies to the range of system dynamics (1) and STL formulas (7) to which the discussed PPC control law is applicable.…”
Section: Problem Formulation (mentioning)
confidence: 99%
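A rough sketch of a PI²-style update in which STL satisfaction enters as a constraint rather than as the optimization target, as the excerpt proposes; encoding the constraint as a penalty on negative robustness is an assumption made here for illustration, and the PPC guidance of the sampled rollouts described above is omitted:

```python
import numpy as np

def pi2_step(theta, rollout, cost, robustness, n_rollouts=64,
             noise=0.1, temp=1.0, penalty=1e3):
    """One PI^2-style update: perturb the parameters, score each rollout by
    the general cost C(tau) plus a penalty whenever the STL robustness is
    negative (satisfaction treated as a constraint), then average the
    perturbations with exponentiated-cost weights."""
    eps = noise * np.random.randn(n_rollouts, theta.size)
    scores = np.empty(n_rollouts)
    for k in range(n_rollouts):
        tau = rollout(theta + eps[k])
        # violated STL constraint (rho < 0) is charged as a large penalty
        scores[k] = cost(tau) + penalty * max(0.0, -robustness(tau))
    weights = np.exp(-(scores - scores.min()) / temp)  # low cost -> high weight
    weights /= weights.sum()
    return theta + weights @ eps
```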