This paper presents a model predictive control (MPC)-based reinforcement learning (RL) approach for a home energy management system (HEMS). The house consists of an air-to-water heat pump connected to a hot water tank that supplies thermal energy to a water-based floor heating system; it also includes a photovoltaic (PV) array and a battery storage system. The HEMS is designed to exploit the thermal inertia of the house and the battery storage to shift demand from peak hours to off-peak periods, and to generate revenue by selling excess energy to the utility grid during periods of high electricity prices. Designing such a HEMS is challenging, however, because model mismatch leads to erroneous predictions of the system dynamics and, consequently, to suboptimal decision making. Moreover, uncertainties in the house thermodynamics and forecast errors in PV generation, outdoor temperature, and user load demand make the problem more difficult. We address this issue by approximating the optimal policy with a parameterized MPC scheme and updating its parameters via a compatible delayed deterministic actor-critic algorithm with a gradient Q-learning critic (CDDAC-GQ). Simulation results show that the proposed MPC-based RL HEMS can effectively deliver a policy that maintains indoor thermal comfort while reducing economic costs, even under model inaccuracies and system uncertainties. Furthermore, a thorough comparison between the CDDAC-GQ algorithm and the conventional twin delayed deep deterministic policy gradient (TD3) algorithm confirms the efficacy of the proposed method in addressing complex HEMS problems.

INDEX TERMS Model predictive control (MPC), reinforcement learning (RL), home energy management system (HEMS), inaccurate model, system uncertainties.
Nomenclature