2020
DOI: 10.48550/arxiv.2003.01303
Preprint

Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization

Abstract: Reinforcement learning (RL) is attracting increasing interest in autonomous driving due to its potential to solve complex classification and control problems. However, existing RL algorithms are rarely applied to real vehicles because of two predominant problems: their behaviors are unexplainable, and they cannot guarantee safety under new scenarios. This paper presents a safe RL algorithm, called Parallel Constrained Policy Optimization (PCPO), for two autonomous driving tasks. PCPO extends today's common actor-critic ar…

Cited by 3 publications (3 citation statements); References 18 publications
“…For example, Joshua Achiam et al.'s CPO algorithm is specifically designed for handling constrained problems in RL, ensuring that the optimized policy adheres to a set of predefined safety or other types of constraints [26]. Lu Wen et al. [48] proposed Parallel Constrained Policy Optimization (PCPO), which uses synchronous parallel learners to explore different state spaces while ensuring safety, thereby accelerating learning and policy updates. Xu et al. [49] introduced a Constrained Penalty Q-learning (CPQ) algorithm that enforces constraints by penalizing the Q-function for violations, learning robust policies that outperform several baselines.…”
Section: Research On Safe Reinforcement Learning
confidence: 99%
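To make the constraint-penalty idea concrete, the following is a minimal, hypothetical sketch in the spirit of CPQ-style penalized Q-learning; the network sizes, cost threshold, and penalty weight are illustrative assumptions and are not taken from the cited papers.

```python
# Hedged sketch: the Q-learning target is penalized whenever a separate cost
# critic predicts that the safety constraint would be violated. All names,
# shapes, and thresholds are assumptions for illustration only.
import torch
import torch.nn as nn

q_net = nn.Linear(8, 4)      # toy Q-network: 8-dim state, 4 discrete actions
cost_net = nn.Linear(8, 4)   # toy cost critic predicting constraint cost per action
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def penalized_q_target(reward, next_state, gamma=0.99, cost_limit=0.1, penalty=10.0):
    """Bootstrap target that penalizes actions whose predicted cost exceeds the limit."""
    with torch.no_grad():
        next_q = q_net(next_state)                 # [batch, actions]
        next_cost = cost_net(next_state)           # predicted constraint cost
        unsafe = (next_cost > cost_limit).float()  # 1.0 where constraint would be violated
        safe_q = next_q - penalty * unsafe
        return reward + gamma * safe_q.max(dim=1).values

# One illustrative update step on random data.
state = torch.randn(32, 8)
action = torch.randint(0, 4, (32,))
reward = torch.randn(32)
next_state = torch.randn(32, 8)

target = penalized_q_target(reward, next_state)
q_pred = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_pred, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```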
“…The prediction model masks unsafe actions to improve the safety performance of an intelligent vehicle. [20] proposes a method that extends the actor-critic framework with an additional risk network to estimate the safety constraint of the current policy, which brings a substantial improvement in safety performance. [21] proposes a method to explicitly define a safety constraint in a certain RL environment, and uses a first-order model to estimate the constraint value under an action distribution.…”
Section: Safe Exploration
confidence: 99%
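A rough, hedged sketch of the risk-network extension described above for [20]: the actor-critic objective is augmented with a risk critic whose constraint estimate acts as a penalty. Architectures, the penalty weight, and all names are assumptions, not the authors' implementation.

```python
# Hedged sketch of an actor-critic extended with an additional risk network.
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))     # action mean
critic = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))    # value of return
risk_net = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))  # expected constraint cost

def actor_loss(states, advantages, risk_weight=1.0):
    """Policy-gradient surrogate penalized by the risk network's constraint estimate."""
    dist = torch.distributions.Normal(actor(states), torch.ones(2))
    actions = dist.rsample()
    log_prob = dist.log_prob(actions).sum(-1)
    risk = risk_net(states).squeeze(-1)  # estimated safety-constraint cost of current policy
    # Maximize advantage-weighted log-probability while penalizing predicted risk.
    return -(log_prob * advantages - risk_weight * risk).mean()

# Illustrative call on random data.
loss = actor_loss(torch.randn(32, 8), torch.randn(32))
loss.backward()
```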
“…The literature on the safe design of ML-based controllers for dynamical and hybrid systems can be classified according to three broad approaches, namely (i) incorporating safety in the training of ML-based controllers, (ii) post-training verification of ML-based controllers, and (iii) online validation of safety and control intervention. Representative examples of the first approach include reward-shaping [1], Bayesian and robust regression [2], [3], [4], and policy optimization with constraints [5], [6], [7], [8]. Unfortunately, this approach does not provide provable guarantees on the safety of the trained controller.…”
Section: Introduction
confidence: 99%
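As a minimal illustration of the reward-shaping approach listed under (i), assuming a single distance-based safety signal; the threshold and penalty weight are hypothetical and not drawn from the cited works.

```python
# Hedged sketch of reward shaping for safety: the task reward is reduced in
# proportion to how far the agent intrudes into an unsafe zone. The distance
# threshold and weight are illustrative assumptions.
def shaped_reward(task_reward: float, distance_to_obstacle: float,
                  unsafe_distance: float = 2.0, penalty_weight: float = 5.0) -> float:
    """Subtract a penalty proportional to the intrusion into the unsafe region."""
    violation = max(0.0, unsafe_distance - distance_to_obstacle)
    return task_reward - penalty_weight * violation

# Example: a reward of 1.0 earned while 1.5 m from an obstacle is reduced to -1.5.
print(shaped_reward(1.0, 1.5))
```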