2023
DOI: 10.3233/faia230598

Reinforcement Learning by Guided Safe Exploration

Qisong Yang,
Thiago D. Simão,
Nils Jansen
et al.

Abstract: Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trai…

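The abstract sketches a two-phase recipe: pretrain a guide under a safety constraint but without the task reward, then transfer once the reward is revealed. A minimal tabular sketch of that first phase follows, assuming a toy chain environment, a count-based exploration bonus standing in for the reward-free objective, and a Lagrange multiplier enforcing a cost budget; all names, sizes, and rates here are illustrative, not the paper's implementation.

```python
# Sketch (not the paper's code): reward-free, cost-constrained exploration.
# The agent never sees a task reward, only an intrinsic visitation bonus
# and a safety cost, kept under a budget via a Lagrange multiplier.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 2           # chain MDP: action 0 = left, 1 = right
unsafe = {9}                          # entering this state incurs cost 1
budget, lam, lam_lr = 0.1, 0.0, 0.05  # cost budget and dual variable
Q = np.zeros((n_states, n_actions))
visits = np.zeros(n_states)

s = 0
for _ in range(20_000):
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
    s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    visits[s2] += 1
    bonus = 1.0 / np.sqrt(visits[s2])            # count-based intrinsic reward
    cost = 1.0 if s2 in unsafe else 0.0
    target = (bonus - lam * cost) + 0.95 * Q[s2].max()
    Q[s, a] += 0.1 * (target - Q[s, a])          # TD update on the Lagrangian
    lam = max(0.0, lam + lam_lr * (cost - budget))  # dual ascent on constraint
    s = s2
```

The dual variable rises whenever the realized cost exceeds the budget, so exploration pressure is traded off against safety automatically; an actual guide would of course use function approximation rather than a table.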
Cited by 5 publications (4 citation statements)
References 39 publications

“…In the target tasks, the extrinsic environment reward is revealed to the agent. We leverage the safe exploration policy to guide learning in the off-policy safe guide (SaGui; Yang et al 2022a) framework, which achieves safe transfer learning by two mechanisms: i) Adaptively regularize the student policy to the guide policy based on the student's safety; ii) Use the safe exploration policy as a recovery policy when the student starts to take unsafe actions.…”
Section: Evaluation of Safe Transfer Learning (mentioning)
confidence: 99%
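The two mechanisms quoted above can be illustrated with a small composition of action distributions: mix toward the guide in proportion to a risk estimate (a simplified stand-in for adaptive regularization) and fall back to the guide when the student's actions are flagged unsafe (recovery). This is a sketch, not the SaGui implementation; `risk` and `unsafe_actions` are assumed inputs from some safety estimator.

```python
# Illustrative sketch of the two SaGui-style mechanisms, not the actual API.
import numpy as np

def compose_behavior(student_probs, guide_probs, risk, unsafe_actions):
    # Mechanism i) adaptive regularization, here simplified to a mixture:
    # the riskier the student currently is, the more mass the guide gets.
    mixed = (1.0 - risk) * student_probs + risk * guide_probs
    # Mechanism ii) recovery: block actions flagged unsafe; if the student
    # is fully blocked, act with the safe exploration (guide) policy.
    mixed[unsafe_actions] = 0.0
    if mixed.sum() == 0.0:
        mixed = guide_probs.copy()
    return mixed / mixed.sum()

# Toy usage: a risky student is pulled toward a cautious guide.
student = np.array([0.7, 0.2, 0.1])
guide = np.array([0.1, 0.3, 0.6])
print(compose_behavior(student, guide, risk=0.5, unsafe_actions=[0]))
```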
“…By maximizing the policy entropy, the agent trained by SAC-λ tends to have diverse behaviors, but it does not imply efficient exploration of the environment. With an additional intrinsic reward, the exploration of SAC-λ can be enhanced (Yang et al 2022a), but the interpretability of the learned policy in exploration is not clear. Achiam et al (2017); Liu, Ding, and Liu (2020); Yang et al (2020) propose a series of constrained policy optimization methods, where the constraints are built on longterm costs instead of real costs within a finite horizon.…”
Section: Related Work (mentioning)
confidence: 99%
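For context on the SAC-λ baseline discussed here: Lagrangian methods in this family keep a multiplier λ that scales a cost penalty inside the actor objective (reward plus entropy minus λ-weighted cost) and update it by dual ascent on the constraint violation. A minimal illustrative form of that update, with a made-up budget and learning rate:

```python
# Illustrative dual-ascent update used by SAC-lambda-style methods;
# the budget and learning rate are placeholders, not values from the paper.
import numpy as np

def dual_step(episode_costs, lam, budget, lr=0.01):
    gap = np.mean(episode_costs) - budget   # estimated constraint violation
    return max(0.0, lam + lr * gap)         # lambda stays non-negative

# Toy usage: costs above the budget push lambda up, below pull it down.
lam = 0.0
for costs in ([1.2, 0.8, 1.0], [0.0, 0.1, 0.05]):
    lam = dual_step(np.array(costs), lam, budget=0.5)
    print(lam)
```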
“…Using formal methods to analyse complex systems carefully has become a trend in robotics. For example, Yang et al suggested a technique called guided and safe reinforcement learning [66], and Pek et al use something called Spatio-Temporal Logic to plan and keep an eye on complex robot tasks [51]. Our work takes a different approach to planning how autonomous systems move.…”
Section: Energy-efficient Motion Planning of Autonomous Vehicles (mentioning)
confidence: 99%