2023
DOI: 10.1038/s41598-023-28582-4

Safe reinforcement learning under temporal logic with reward design and quantum action selection

Abstract: This paper proposes an advanced Reinforcement Learning (RL) method, incorporating reward-shaping, safety value functions, and a quantum action selection algorithm. The method is model-free and can synthesize a finite policy that maximizes the probability of satisfying a complex task. Although RL is a promising approach, it suffers from unsafe traps and sparse rewards and becomes impractical when applied to real-world problems. To improve safety during training, we introduce a concept of safety values, which re…

Cited by 7 publications (1 citation statement)
References 39 publications
“…Lastly, Cai et al [23] presents a method for finding RL control policies that satisfy LTL specifications through safe reinforcement learning. The developed reward shaping process improves reward density and induces maximum probability of LTL satisfaction, while the safe padding technique maintains safe exploration without affecting the original probabilistic guarantees.…”
Section: Discussion (mentioning)
confidence: 99%