2019 IEEE Bombay Section Signature Conference (IBSSC)
DOI: 10.1109/ibssc47189.2019.8973068
Performance Analysis of Deep Q Networks and Advantage Actor Critic Algorithms in Designing Reinforcement Learning-based Self-tuning PID Controllers

Cited by 5 publications (2 citation statements)
References 9 publications
“…These methods seek the optimal policy directly, without first computing a value function. Using gradient-based algorithms such as Proximal Policy Optimization (PPO) [45] and Advantage Actor-Critic (A2C) [46], they update the policy parameters based on their measured performance. Equation (3) depicts the gradient-ascent rule for updating the policy parameters, one of the most popular methods.…”
Section: ) Policy-based Methodsmentioning
confidence: 99%
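The gradient-ascent update referenced as equation (3) is not reproduced on this page, but the standard rule θ ← θ + α · A · ∇θ log π(a|s) can be sketched as follows. This is a minimal illustration with a linear softmax policy; the function and variable names are hypothetical, not from the cited paper:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a vector of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def policy_gradient_step(theta, state, action, advantage, lr=0.1):
    """One gradient-ascent update: theta <- theta + lr * advantage * grad log pi(a|s).

    theta: (n_actions, n_features) weights of a linear softmax policy.
    """
    probs = softmax(theta @ state)
    # For a linear softmax policy, grad log pi(a|s) = (one_hot(a) - probs) outer state.
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    grad_log_pi = np.outer(one_hot - probs, state)
    return theta + lr * advantage * grad_log_pi

# Toy usage: a positive advantage should raise the chosen action's probability.
theta = np.zeros((2, 3))
state = np.array([1.0, 0.5, -0.2])
p_before = softmax(theta @ state)[0]
theta = policy_gradient_step(theta, state, action=0, advantage=2.0)
p_after = softmax(theta @ state)[0]
```

Scaling the log-probability gradient by an advantage estimate (rather than the raw return) is what distinguishes A2C-style updates from plain REINFORCE.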
“…However, these methods require more computational power and memory because they use two separate networks. They also suffer from instability, delayed rewards, and extended training times due to correlation problems between the actor and the critic [46]. The most popular DRL agent types supported by Matlab [50] are listed in Table 2.…”
Section: A Selecting Agent Typementioning
confidence: 99%
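The "two separate networks" mentioned above are the actor (policy) and the critic (value estimator), each with its own parameters and its own update. A minimal linear sketch of one advantage actor-critic step, using a TD error as the advantage estimate (all names hypothetical, not from the cited paper):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a vector of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def a2c_step(actor_w, critic_w, s, a, r, s_next, gamma=0.99,
             lr_actor=0.05, lr_critic=0.1):
    """One advantage actor-critic update with two separate linear models.

    actor_w:  (n_actions, n_features) softmax policy weights.
    critic_w: (n_features,) linear state-value weights.
    """
    # Critic: TD error doubles as the advantage estimate.
    v, v_next = critic_w @ s, critic_w @ s_next
    advantage = r + gamma * v_next - v
    critic_w = critic_w + lr_critic * advantage * s          # TD(0) update
    # Actor: policy gradient scaled by the critic's advantage.
    probs = softmax(actor_w @ s)
    one_hot = np.zeros_like(probs)
    one_hot[a] = 1.0
    actor_w = actor_w + lr_actor * advantage * np.outer(one_hot - probs, s)
    return actor_w, critic_w, advantage

# Toy usage: a positive reward from zero-initialised models yields a positive
# advantage, pulling the critic's value estimate for s upward.
s = np.array([1.0, 0.5, -0.2])
actor_w, critic_w, adv = a2c_step(np.zeros((2, 3)), np.zeros(3),
                                  s, a=0, r=1.0, s_next=s)
```

Keeping `actor_w` and `critic_w` as distinct parameter sets is what drives the extra memory and compute cost the statement refers to, and the fact that each update depends on the other model's current estimate is the source of the actor-critic correlation problem.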