An Improved Off-Policy Actor-Critic Algorithm with Historical Behaviors Reusing for Robotic Control

Zhang, Huaqing; Ma, Hui; Jin, Ying

doi:10.1007/978-3-031-13841-6_41

Cited by 1 publication

(1 citation statement)

References 7 publications


“…The goal of RL is to learn strategies to maximize expectations in the Markov decision-making process (Zhang et al 2022) [32]. Markov process consists of a quintuple 𝐴 𝜋 𝑡 = {𝑆, 𝐴, 𝑃, 𝑅, 𝛾}, where 𝑆, 𝐴 represent the state space and action space, 𝑃 represents the transition probability between different states, and 𝑅 represents the reward set.…”

Section: A. Introduction to the CQL Algorithm (mentioning)

confidence: 99%

Automatic Tracking Control Strategy of Autonomous Trains Considering Speed Restrictions: Using the Improved Offline Deep Reinforcement Learning Method

Liu, Feng, Xiao et al. 2024

IEEE Access


Previous research on automatic control of high-speed trains in speed limit sections is insufficient. This article proposes a new offline reinforcement learning strategy for automatic tracking of autonomous trains. First, the operating speed and deceleration starting point were determined for different speed limit scenarios. Then, a tracking controller based on the improved offline conservative Q-learning (CQL) algorithm was designed to avoid frequent interaction between the train and the environment. An appropriate policy was selected to implement the CQL algorithm. The data samples were reclassified to increase sample concentration. The value and strategy network structure was redesigned. The state space and action space of tracking trains were limited, and the dimension of the state space was increased. A multi-objective reward function was designed to distinguish the tracking process of trains in different sections. The simulation results show that the proposed high-speed railway tracking interval automatic control algorithm is superior to traditional online reinforcement learning methods in terms of safety, comfort, and convergence efficiency.

