2020
DOI: 10.48550/arxiv.2003.01303
Preprint

Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization

Abstract: Reinforcement learning (RL) is attracting increasing interest in autonomous driving due to its potential to solve complex classification and control problems. However, existing RL algorithms are rarely applied to real vehicles because of two predominant problems: their behaviors are unexplainable, and they cannot guarantee safety under new scenarios. This paper presents a safe RL algorithm, called Parallel Constrained Policy Optimization (PCPO), for two autonomous driving tasks. PCPO extends today's common actor-critic ar…

Cited by 3 publications (3 citation statements); References 18 publications
“…For example, Joshua Achiam et al.'s CPO algorithm is specifically designed for handling constrained problems in RL, ensuring that the optimized policy adheres to a set of predefined safety or other types of constraints [26]. Lu Wen et al. [48] proposed Parallel Constrained Policy Optimization (PCPO), which uses synchronous parallel learners to explore different state spaces while ensuring safety, thereby accelerating learning and policy updates. Xu et al. [49] introduced a Constrained Penalty Q-learning (CPQ) algorithm that enforces constraints by penalizing the Q-function for violations, learning robust policies that outperform several baselines.…”
Section: Research On Safe Reinforcement Learning
confidence: 99%
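To make the constraint-penalty idea concrete, the following is a minimal, hypothetical sketch in the spirit of CPQ-style penalized Q-learning; the network sizes, cost threshold, and penalty weight are illustrative assumptions and are not taken from the cited papers.

```python
# Hedged sketch: the Q-learning target is penalized whenever a separate cost
# critic predicts that the safety constraint would be violated. All names,
# shapes, and thresholds are assumptions for illustration only.
import torch
import torch.nn as nn

q_net = nn.Linear(8, 4)      # toy Q-network: 8-dim state, 4 discrete actions
cost_net = nn.Linear(8, 4)   # toy cost critic predicting constraint cost per action
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def penalized_q_target(reward, next_state, gamma=0.99, cost_limit=0.1, penalty=10.0):
    """Bootstrap target that penalizes actions whose predicted cost exceeds the limit."""
    with torch.no_grad():
        next_q = q_net(next_state)                 # [batch, actions]
        next_cost = cost_net(next_state)           # predicted constraint cost
        unsafe = (next_cost > cost_limit).float()  # 1.0 where constraint would be violated
        safe_q = next_q - penalty * unsafe
        return reward + gamma * safe_q.max(dim=1).values

# One illustrative update step on random data.
state = torch.randn(32, 8)
action = torch.randint(0, 4, (32,))
reward = torch.randn(32)
next_state = torch.randn(32, 8)

target = penalized_q_target(reward, next_state)
q_pred = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_pred, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```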
“…The prediction model masks unsafe actions to improve the safety performance of an intelligent vehicle. [20] proposes a method that extends the actor-critic framework with an additional risk network to estimate the safety constraint of the current policy, which brings a substantial improvement in safety performance. [21] proposes a method to explicitly define a safety constraint in a certain RL environment, and uses a first-order model to estimate the constraint value under an action distribution.…”
Section: Safe Exploration
confidence: 99%
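A rough, hedged sketch of the risk-network extension described above for [20]: the actor-critic objective is augmented with a risk critic whose constraint estimate acts as a penalty. Architectures, the penalty weight, and all names are assumptions, not the authors' implementation.

```python
# Hedged sketch of an actor-critic extended with an additional risk network.
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))     # action mean
critic = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))    # value of return
risk_net = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))  # expected constraint cost

def actor_loss(states, advantages, risk_weight=1.0):
    """Policy-gradient surrogate penalized by the risk network's constraint estimate."""
    dist = torch.distributions.Normal(actor(states), torch.ones(2))
    actions = dist.rsample()
    log_prob = dist.log_prob(actions).sum(-1)
    risk = risk_net(states).squeeze(-1)  # estimated safety-constraint cost of current policy
    # Maximize advantage-weighted log-probability while penalizing predicted risk.
    return -(log_prob * advantages - risk_weight * risk).mean()

# Illustrative call on random data.
loss = actor_loss(torch.randn(32, 8), torch.randn(32))
loss.backward()
```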
“…The literature on the safe design of ML-based controllers for dynamical and hybrid systems can be classified according to three broad approaches, namely (i) incorporating safety in the training of ML-based controllers, (ii) post-training verification of ML-based controllers, and (iii) online validation of safety and control intervention. Representative examples of the first approach include reward-shaping [1], Bayesian and robust regression [2], [3], [4], and policy optimization with constraints [5], [6], [7], [8]. Unfortunately, this approach does not provide provable guarantees on the safety of the trained controller.…”
Section: Introduction
confidence: 99%
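As a minimal illustration of the reward-shaping approach listed under (i), assuming a single distance-based safety signal; the threshold and penalty weight are hypothetical and not drawn from the cited works.

```python
# Hedged sketch of reward shaping for safety: the task reward is reduced in
# proportion to how far the agent intrudes into an unsafe zone. The distance
# threshold and weight are illustrative assumptions.
def shaped_reward(task_reward: float, distance_to_obstacle: float,
                  unsafe_distance: float = 2.0, penalty_weight: float = 5.0) -> float:
    """Subtract a penalty proportional to the intrusion into the unsafe region."""
    violation = max(0.0, unsafe_distance - distance_to_obstacle)
    return task_reward - penalty_weight * violation

# Example: a reward of 1.0 earned while 1.5 m from an obstacle is reduced to -1.5.
print(shaped_reward(1.0, 1.5))
```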