2019 IEEE Intelligent Vehicles Symposium (IV)
DOI: 10.1109/ivs.2019.8813903
Continuous Control for Automated Lane Change Behavior Based on Deep Deterministic Policy Gradient Algorithm

Abstract: Lane change is a challenging task that requires delicate actions to ensure safety and comfort. Some recent studies have attempted to solve the lane-change control problem with Reinforcement Learning (RL), yet the action is confined to a discrete action space. To overcome this limitation, we formulate the lane-change behavior with continuous actions in a model-free dynamic driving environment based on Deep Deterministic Policy Gradient (DDPG). The reward function, which is critical for learning the optimal policy…
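The abstract is truncated, but the core idea, learning a deterministic policy that outputs continuous control actions with DDPG, can be illustrated with a minimal sketch. The state/action dimensions, network widths, and hyper-parameters below are assumptions for illustration only and are not taken from the paper.

```python
# Minimal DDPG actor-critic sketch (PyTorch). All sizes and hyper-parameters
# are illustrative assumptions, not the values used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 10, 2   # e.g. ego/surrounding-vehicle features; [steer, accel]

class Actor(nn.Module):
    """Deterministic policy mu(s) -> continuous action in [-1, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Action-value Q(s, a)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_tgt, critic_tgt = Actor(), Critic()
actor_tgt.load_state_dict(actor.state_dict())
critic_tgt.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
GAMMA, TAU = 0.99, 0.005

def ddpg_update(s, a, r, s2, done):
    """One gradient step on a sampled mini-batch; r and done have shape [batch, 1]."""
    with torch.no_grad():
        q_target = r + GAMMA * (1 - done) * critic_tgt(s2, actor_tgt(s2))
    # Critic: regress Q(s, a) toward the bootstrapped target.
    critic_loss = F.mse_loss(critic(s, a), q_target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
    # Actor: deterministic policy gradient, maximize Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
    # Polyak-average the target networks.
    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```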

Cited by 63 publications (31 citation statements)
References 14 publications
“…Jaritz et al. [35] mapped the RGB images from the front camera to the output actions and trained the agent with the Asynchronous Advantage Actor-Critic [36] algorithm to achieve fast convergence and stable driving. Wang et al. [37] exploited DDPG to train the lane-changing behavior of the agent. In [16], deep reinforcement learning was applied for the first time to an actual full-size self-driving vehicle, where the DDPG network takes the images observed by the vehicle as input and is trained with a sparse reward.…”
Section: Related Work
confidence: 99%
“…A Deep Q-Network (DQN) is a well-known DRL algorithm that combines Q-learning with neural networks to improve training stability and convergence [36]. Another widely used algorithm is Deep Deterministic Policy Gradient (DDPG), which uses a network to approximate the policy and directly outputs actions, coping with continuous actions and large action spaces [37]. DRL algorithms are used to solve problems in various environments.…”
Section: Related Work
confidence: 99%
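The distinction drawn in this statement, DQN selecting from a finite action set versus DDPG emitting a continuous action, can be made concrete with a brief sketch. The discrete lane-change action set, the exploration noise, and the reuse of the actor network from the sketch above are illustrative assumptions.

```python
# Contrast of action selection: DQN takes the argmax of Q-values over a small
# discrete action set, while DDPG's actor outputs a continuous action directly.
import torch

DISCRETE_ACTIONS = ["keep_lane", "change_left", "change_right"]  # assumed set

def dqn_act(q_net, state):
    """DQN: evaluate Q(s, .) for each discrete action and take the argmax."""
    with torch.no_grad():
        q_values = q_net(state)                 # shape: [len(DISCRETE_ACTIONS)]
    return DISCRETE_ACTIONS[int(q_values.argmax())]

def ddpg_act(actor, state, noise_std=0.1):
    """DDPG: the actor emits a continuous action; Gaussian noise adds exploration."""
    with torch.no_grad():
        a = actor(state)                        # e.g. [steering, acceleration] in [-1, 1]
    a = a + noise_std * torch.randn_like(a)
    return a.clamp(-1.0, 1.0)
```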
“…Liu et al. employed data obtained in a simulation environment together with real driving data as the state input to train a neural network, updating the network parameters by introducing a supervisory loss so that the agent learns as much as possible from the real data and the training process improves [23]. In [24], Wang et al. used a continuous action space and the DDPG reinforcement learning algorithm to study the lateral control of lane-changing behavior. An intelligent driver model was used for longitudinal control, considering the relative speed and distance between the agent and the vehicle in front, and ultimately determining a suitable acceleration for following the vehicle ahead.…”
Section: Related Work
confidence: 99%
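The longitudinal controller mentioned here, an Intelligent Driver Model (IDM) that maps the gap and relative speed to the lead vehicle into a car-following acceleration, has a standard closed form. The sketch below uses common textbook default parameters, not the values from the cited work.

```python
# Intelligent Driver Model (IDM) sketch: a = a_max * (1 - (v/v0)^delta - (s*/gap)^2),
# with desired gap s* = s0 + max(0, v*T + v*dv / (2*sqrt(a_max*b))).
# Parameter defaults are common illustrative values, not the cited paper's.
import math

def idm_acceleration(v, gap, v_lead,
                     v0=30.0,      # desired speed [m/s]
                     T=1.5,        # desired time headway [s]
                     s0=2.0,       # minimum gap [m]
                     a_max=1.0,    # maximum acceleration [m/s^2]
                     b=1.5,        # comfortable deceleration [m/s^2]
                     delta=4.0):
    """Return the IDM car-following acceleration for the ego vehicle."""
    dv = v - v_lead                              # closing speed (> 0 when approaching)
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# Example: ego at 25 m/s, 40 m behind a lead vehicle travelling at 22 m/s.
print(idm_acceleration(v=25.0, gap=40.0, v_lead=22.0))
```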