Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning

Ohnishi, Shota; Uchibe, Eiji; Yamaguchi, Yotaro; Nakanishi, Kosuke; Yasui, Yoshiaki; Ishii, Shin

doi:10.3389/fnbot.2019.00103

Cited by 48 publications

(42 citation statements)

References 15 publications


“…Q-learning, as a very classical algorithm in RL, is a good example to understand the purpose of DRL. The big issue with Q-learning falls into the tabular method, which means that when state and action spaces are very large, it cannot build a very large Q table to store a large number of Q values [35]. Besides, it counts and iterates Q values based on past states.…”

Section: Deep Reinforcement Learning

mentioning

confidence: 99%

Federated reinforcement learning: techniques, applications, and open challenges

Qi, Zhou, Lei et al.

2021


This paper presents a comprehensive survey of Federated Reinforcement Learning (FRL), an emerging and promising field in Reinforcement Learning (RL). Starting with a tutorial of Federated Learning (FL) and RL, we then focus on the introduction of FRL as a new method with great potential by leveraging the basic idea of FL to improve the performance of RL while preserving data-privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, i.e., Horizontal Federated Reinforcement Learning (HFRL) and Vertical Federated Reinforcement Learning (VFRL). We provide the detailed definitions of each category by formulas, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. In addition, the existing works on FRL are summarized by application fields, including edge computing, communication, control optimization, and attack detection. Finally, we describe and discuss several key research directions that are crucial to solving the open problems within FRL.
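The tabular limitation raised in the citation statement above can be made concrete with a minimal sketch. It shows the standard tabular Q-learning update and how the number of stored Q values grows with the product of the state and action counts. The function names, the dictionary-backed table, and the values of `alpha` and `gamma` are illustrative assumptions, not taken from any of the cited papers.

```python
from collections import defaultdict


def q_table_entries(n_states, n_actions):
    """Number of Q values a dense table must hold (hypothetical helper)."""
    return n_states * n_actions


def q_learning_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step and return the updated Q value.

    q maps (state, action) pairs to values; alpha and gamma are assumed
    hyperparameters chosen only for illustration.
    """
    # Greedy bootstrap: best value currently stored for the next state.
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    # Move the stored value a fraction alpha toward the TD target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Unseen (state, action) pairs default to a Q value of 0.0.
q = defaultdict(float)
new_value = q_learning_update(q, state=0, action=1, reward=1.0,
                              next_state=2, n_actions=4)

# A dense table for one million states and 18 actions needs 18 million entries.
entries = q_table_entries(10**6, 18)
```

Because every state-action pair needs its own entry, the table becomes impractical for large spaces, which is the motivation the statement gives for replacing it with a function approximator.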


“…The Deep Q-Learning Network (DQN) is a way of modeling the environment and calculating the collision energy function, which is the main cause of a loss in functionality (Ohnishi et al., 2019). To realize the path planning process, the neural network is trained to minimize the loss function through the gradient descent method.…”

Section: Introduction

mentioning

confidence: 99%

The Path Planning of Mobile Robot by Neural Networks and Hierarchical Reinforcement Learning

Liao

2020

Front. Neurorobot.


Existing mobile robots cannot complete some functions. To solve these problems, which include autonomous learning in path planning, the slow convergence of path planning, and planned paths that are not smooth, it is possible to utilize neural networks to enable the robot to perceive the environment and perform feature extraction, which enables them to have a fitness of environment to state action function. By mapping the current state of these actions through Hierarchical Reinforcement Learning (HRL), the needs of mobile robots are met. It is possible to construct a path planning model for mobile robots based on neural networks and HRL. In this article, the proposed algorithm is compared with different algorithms in path planning. It underwent a performance evaluation to obtain an optimal learning algorithm system. The optimal algorithm system was tested in different environments and scenarios to obtain optimal learning conditions, thereby verifying the effectiveness of the proposed algorithm. Deep Deterministic Policy Gradient (DDPG), a path planning algorithm for mobile robots based on neural networks and hierarchical reinforcement learning, performed better in all aspects than other algorithms. Specifically, when compared with Double Deep Q-Learning (DDQN), DDPG has a shorter path planning time and a reduced number of path steps. When introducing an influence value, this algorithm shortens the convergence time by 91% compared with the Q-learning algorithm and improves the smoothness of the planned path by 79%. The algorithm has a good generalization effect in different scenarios. These results have significance for research on guiding, the precise positioning, and path planning of mobile robots.


“…Therefore, the study by Pohlen et al [15] was considered to alleviate the instability of the learning process. Ohnishi et al [16] proposed constrained DQN to behave in two different methods: when the difference between the maximum value of the Q-function and the value of the target network is large, constrained DQN updates the Q-function more conservatively, and when this difference is small, constrained DQN behaves similar to that of conventional standard Q-learning. Studies [14][15][16] provide a family of target-based TD-learning algorithms [17].…”

mentioning

confidence: 99%

“…Ohnishi et al [16] proposed constrained DQN to behave in two different methods: when the difference between the maximum value of the Q-function and the value of the target network is large, constrained DQN updates the Q-function more conservatively, and when this difference is small, constrained DQN behaves similar to that of conventional standard Q-learning. Studies [14][15][16] provide a family of target-based TD-learning algorithms [17]. Study [17] showed that the success of deep Q-learning is indispensable to use a separate target network to improve the performance of Q-learning, and provided insight into the theoretical approaches, and introduced three different update methods: averaging TD, double TD, and periodic TD, where the target network is updated in an averaging, symmetric, or periodic manner, respectively.…”

mentioning

confidence: 99%


Temporal Consistency-Based Loss Function for Both Deep Q-Networks and Deep Deterministic Policy Gradients for Continuous Actions

Kim

2021

Symmetry


Artificial intelligence (AI) techniques in power grid control and energy management in building automation require both deep Q-networks (DQNs) and deep deterministic policy gradients (DDPGs) in deep reinforcement learning (DRL) as off-policy algorithms. Most studies on improving the stability of DRL have addressed these with replay buffers and a target network using a delayed temporal difference (TD) backup, which is known for minimizing a loss function at every iteration. The loss functions were developed for DQN and DDPG, and it is well-known that there have been few studies on improving the techniques of the loss functions used in both DQN and DDPG. Therefore, we modified the loss function based on a temporal consistency (TC) loss and adapted the proposed TC loss function for the target network update in both DQN and DDPG. The proposed TC loss function showed effective results, particularly in a critic network in DDPG. In this work, we demonstrate that, in OpenAI Gym, both “cart-pole” and “pendulum”, the proposed TC loss function shows enormously improved convergence speed and performance, particularly in the critic network in DDPG.
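The behaviour attributed to constrained DQN in the citation statements above (conservative updates when the gap between the Q-function's maximum and the target network's value is large, ordinary Q-learning when it is small) can be sketched as a penalised loss. This is only an illustration of that description, not the loss defined by Ohnishi et al.: the hinge-style penalty, the function name `constrained_q_loss`, and the parameters `lam`, `eta`, and `gamma` are all assumptions.

```python
def constrained_q_loss(q_sa, reward, next_q, next_q_target,
                       lam=1.0, eta=0.01, gamma=0.99):
    """Hypothetical per-sample loss; returns (loss, penalty).

    q_sa          -- current estimate Q(s, a)
    next_q        -- Q(s', .) from the network being trained
    next_q_target -- Q(s', .) from the target network
    lam, eta      -- assumed penalty weight and tolerance
    """
    best_next = max(next_q)
    best_next_target = max(next_q_target)

    # Ordinary Q-learning TD error: bootstraps from the trained network itself.
    td_error = reward + gamma * best_next - q_sa

    # Squared gap between the two networks' best next-state values.
    gap = (best_next_target - best_next) ** 2

    # The penalty is active only when the gap exceeds the tolerance eta.
    penalty = max(0.0, gap - eta)
    return td_error ** 2 + lam * penalty, penalty


# Small gap: the penalty vanishes and the loss is the plain squared TD error.
small_loss, small_penalty = constrained_q_loss(
    q_sa=1.5, reward=0.5, next_q=[1.0, 2.0], next_q_target=[1.0, 2.05])

# Large gap: the penalty adds to the loss, discouraging drift from the target.
large_loss, large_penalty = constrained_q_loss(
    q_sa=1.5, reward=0.5, next_q=[1.0, 2.0], next_q_target=[1.0, 3.0])
```

In this sketch the penalty term is what makes the update conservative; once the two networks agree to within `eta`, the gradient comes from the ordinary Q-learning term alone.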
