2022
DOI: 10.1016/j.compchemeng.2022.107658
Alleviating parameter-tuning burden in reinforcement learning for large-scale process control

Cited by 14 publications (3 citation statements) · References 24 publications
“…Theoretical research has demonstrated that the relative entropy term implicitly averages the error of the approximated value function with state-of-the-art error dependency [18], [19]. In terms of applications, this characteristic also contributes to superior sample efficiency and learning capability in a wide range of engineering tasks, from robot control [20], [21] to chemical platform optimization [22], [23], where the agents efficiently explore the target tasks with a limited number of interactions using smoothly updated policies. However, despite the promising practical results, current works are mainly limited to tasks with discrete actions, while directly extending the relative entropy regularization to DDPG-like RL approaches with continuous action spaces is tricky: the relative entropy regularization in DPP requires traversal softmax operations over the entire discrete action space, which is intractable for continuous actions under the actor-critic structure of DDPG.…”
Section: CDPP (Ours)
Mentioning confidence: 99%
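As a minimal sketch of the update this statement refers to (notation assumed here for illustration, not taken from the cited works), the relative-entropy-regularized policy improvement weights the previous policy by the exponentiated action values and normalizes over the whole action set:

$$
\pi_{k+1}(a \mid s) \;=\; \frac{\pi_k(a \mid s)\,\exp\!\big(\eta\, Q_k(s,a)\big)}{\sum_{a' \in \mathcal{A}} \pi_k(a' \mid s)\,\exp\!\big(\eta\, Q_k(s,a')\big)},
$$

where $\eta$ sets the strength of the KL penalty toward the previous policy $\pi_k$. The normalizing sum in the denominator is the "traversal softmax" over the discrete action set $\mathcal{A}$; for continuous actions it becomes an integral with no closed form in general, which is why a direct extension to DDPG-style actor-critic methods is nontrivial.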
“…Pan et al. 300 present a reinforcement learning control approach that can handle nonlinear stochastic optimal control problems and has the potential of meeting state constraints. Some recent advances in reinforcement learning have to do with boosting the performance of such algorithms, as discussed in Zhu et al., 301 and with leveraging reinforcement learning for the tuning of PID controllers, as shown in Dogru et al. 302 In Schwung et al., 303 the reinforcement learning task is sped up by deploying programmable logic controller information. Moreover, recent reinforcement learning control strategies aimed at batch control can be found elsewhere (Ma et al., 304 Kim et al., 305 Yoo et al., 306 Joshi et al., 307 and Mowbray et al. 308 ).…”
Section: Reinforcement Learning Algorithms
Mentioning confidence: 99%
“…Such an RL framework regularized by KL divergence has been theoretically proven to have state-of-the-art error dependency, as it implicitly averages over all previous action value functions and hence also averages their errors, according to [18], [19]. This characteristic contributed to great data efficiency in various challenging engineering applications, from robot manipulation [20], [21] to chemical plant control [22], [23], where the agents quickly explored the task within a limited number of interactions through smoothly updated policies.…”
Section: Approach
Mentioning confidence: 99%
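A hedged sketch of the error-averaging property mentioned above (symbols assumed for illustration only): if the approximate update at iteration $j$ incurs an error $\varepsilon_j$, KL-regularized iterations accumulate roughly the running average of those errors rather than the worst single one,

$$
\big\| Q^{*} - Q_{k} \big\| \;\lesssim\; \Big\| \tfrac{1}{k}\textstyle\sum_{j=1}^{k} \varepsilon_j \Big\| \;+\; \mathcal{O}\!\big(\tfrac{1}{k}\big),
$$

so zero-mean approximation errors tend to cancel over iterations, which is the intuition behind the sample efficiency reported in these statements.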