2021 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra48506.2021.9561903

Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction

Cited by 22 publications (13 citation statements)
References 19 publications

“…6). Unlike local optimization [24,25,26], learning-based [28,29], or CBF-based [30] approaches, our proposed approach provides completeness guarantees under modest assumptions.…”
Section: Discussion
confidence: 99%
“…Finally, we acknowledge the recent trend of attempting to avoid the NP-hardness of temporal logic motion planning altogether by providing approximate solutions via non-convex optimization [24,25,26,27], learning [28,29], or control barrier functions [30,31]. While such approaches can be extremely efficient and may be practical for some applications, they offer limited or no completeness guarantees and rarely scale to very complex specifications like that shown in Fig.…”
Section: Related Work
confidence: 99%
“…For example, some research works employed LTL formulas to specify the instructions for a control agent to learn optimal strategies, i.e., optimal policies. Specifically, many works [11][12][13][14][15] designed automaton-based rewards so that model-free RL agents could find the optimal policies satisfying LTL specifications with probabilistic guarantees. However, none of them addresses the critical safety issues during training.…”
Section: Safe Reinforcement Learning Under Temporal Logic With Reward...
confidence: 99%
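
A minimal sketch of the automaton-based reward idea described in the excerpt above, assuming a deterministic automaton given as a transition table; the class, labels, and reward value are hypothetical illustrations, not the constructions used in the cited works [11]-[15]:

# Hypothetical sketch: reward an RL agent only when the automaton tracking
# the LTL specification enters an accepting state.
class AutomatonReward:
    def __init__(self, transitions, initial_state, accepting_states, reward=1.0):
        self.transitions = transitions      # (automaton state, label) -> next automaton state
        self.state = initial_state
        self.accepting = accepting_states
        self.reward = reward

    def step(self, label):
        # Advance on the label of the current MDP state; stay put if no edge is defined.
        self.state = self.transitions.get((self.state, label), self.state)
        return self.reward if self.state in self.accepting else 0.0

# Toy usage for "eventually goal" (F goal).
reward_fn = AutomatonReward({("q0", "goal"): "q_acc"}, "q0", {"q_acc"})
print(reward_fn.step("obstacle"))  # 0.0 -- no progress toward the specification
print(reward_fn.step("goal"))      # 1.0 -- accepting state reached
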
“…Specifications have been shown to be effective at directing RL agents to learn desired policies. Typically within the literature, a discrete robot system is abstracted as a discrete Markov Decision Process (MDP) model and composed with an automaton representing the desired LTL formula to create a product automaton for learning or planning [5]- [8]. This approach has been extended to continuous systems [9], [10], where the LTL formulas are only defined over finite horizons.…”
Section: Introduction
confidence: 99%
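
The product construction described in this excerpt (composing a discrete MDP with an automaton for the LTL formula, so that learning runs on the paired state) can be sketched in a few lines; the function names and the toy grid world are assumptions for illustration, not the formulation of [5]-[10]:

# Hypothetical sketch: one transition of the product of an MDP and an LTL automaton.
# The learning or planning agent operates on the pair (MDP state, automaton state).
def product_step(mdp_step, automaton_delta, labeling, state, aut_state, action):
    next_state = mdp_step(state, action)                                # environment dynamics
    next_aut_state = automaton_delta(aut_state, labeling(next_state))   # specification progress
    return next_state, next_aut_state

# Toy usage on a 1-D grid with formula "eventually goal" (F goal).
mdp_step = lambda s, a: s + (1 if a == "right" else -1)
labeling = lambda s: "goal" if s == 3 else "none"
automaton_delta = lambda q, label: "q_acc" if (q == "q0" and label == "goal") else q
print(product_step(mdp_step, automaton_delta, labeling, 2, "q0", "right"))  # (3, 'q_acc')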