Model predictive control (MPC) is an optimal control technique that minimizes the total operating cost of multi-energy systems while satisfying all system constraints. However, it presumes an adequate model of the underlying system dynamics; such a model is prone to modelling errors, is not necessarily adaptive, and carries an initial and ongoing project-specific engineering cost. In this paper, we present an on- and off-policy multi-objective reinforcement learning (RL) approach that does not assume a model a priori, and benchmark it against a linear MPC (LMPC, chosen to reflect current practice, although non-linear MPC performs better). Both are derived from the general optimal control problem, which highlights their differences and similarities. In a case study on a simple multi-energy system (MES) configuration, we show that a twin delayed deep deterministic policy gradient (TD3) RL agent has the potential to match and outperform the perfect-foresight LMPC benchmark (101.5%), whereas the realistic LMPC, i.e. with imperfect predictions, achieves only 98%. In a more complex MES configuration, the RL agent's performance is generally lower (94.6%), yet still better than that of the realistic LMPC (88.9%). In both case studies, the RL agents outperformed the realistic LMPC after a training period of 2 years using quarterly interactions with the environment. We conclude that reinforcement learning is a viable optimal control technique for multi-energy systems, given adequate constraint handling and pre-training to avoid unsafe interactions and long training periods, as proposed in fundamental future work.
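To make the TD3 agent mentioned above more concrete, the following is a minimal sketch of the bootstrapped target that TD3 generally uses to train its critics: the smaller of two target critics, evaluated at a smoothed target action. It is a generic illustration and not the authors' implementation. The function name `td3_target`, the callables `actor_target`, `q1_target` and `q2_target`, and all default hyperparameters are assumptions introduced here.

```python
import numpy as np


def td3_target(reward, next_state, done, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=None):
    """Clipped double-Q target with target policy smoothing (generic TD3).

    All callables are hypothetical stand-ins for target networks:
      actor_target(states) -> actions, shape (batch, action_dim)
      q*_target(states, actions) -> values, shape (batch,)
    """
    rng = np.random.default_rng() if rng is None else rng

    # Target policy smoothing: perturb the target action with clipped noise.
    next_action = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, action_low, action_high)

    # Clipped double-Q: take the more pessimistic of the two target critics.
    q_min = np.minimum(q1_target(next_state, next_action),
                       q2_target(next_state, next_action))

    # One-step bootstrapped return; no bootstrapping on terminal transitions.
    return reward + gamma * (1.0 - done) * q_min
```

In an MES setting the reward would typically be a negative operating cost, possibly with penalty terms for constraint violations; that mapping is an assumption here and is not specified in the abstract.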
Recently, multi-energy systems (MESs), in which different energy carriers are coupled, have become popular. To use MESs more efficiently, their optimal operation needs to be considered. This paper focuses on the day-ahead optimal scheduling of an MES that includes a combined heat and power (CHP) unit, a gas boiler, a PV system, and energy storage devices. Starting from a day-ahead PV point forecast, a non-parametric probabilistic forecast method is proposed to build the prediction interval and represent the uncertainty of PV generation. The MES is then formulated as a mixed-integer linear programming (MILP) problem, and the scheduling problem is solved by interval optimization. To demonstrate the effectiveness of the proposed method, a case study is performed on a real industrial MES. The simulation results show that, using only historical PV measurement data, the point forecaster reaches a normalized root-mean-square error (NRMSE) of 14.24%, and the calibration of the probabilistic forecast is improved by 10% compared to building distributions around the point forecast. Moreover, the results of the interval optimization show that the uncertainty of the PV system not only influences the electrical part of the MES, but also causes a shift in the behavior of the thermal system.
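As a rough illustration of the two forecast quantities named above, the sketch below computes an NRMSE and builds a simple non-parametric interval from empirical quantiles of past forecast errors. Both are generic constructions and not the paper's method: the abstract does not state how the RMSE is normalized (the range of the observations is assumed here), and the residual-quantile interval may be closer to the "distribution around the point forecast" baseline than to the proposed approach. The names `nrmse` and `residual_interval` are hypothetical.

```python
import numpy as np


def nrmse(actual, forecast, norm=None):
    """RMSE divided by a normalization constant.

    Assumption: if `norm` is not given, the range of `actual` is used.
    Other conventions (mean, installed PV capacity) give different values.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    rmse = np.sqrt(np.mean((forecast - actual) ** 2))
    if norm is None:
        norm = actual.max() - actual.min()
    return rmse / norm


def residual_interval(point_forecast, past_actual, past_forecast, coverage=0.9):
    """Interval from empirical quantiles of historical forecast errors."""
    residuals = (np.asarray(past_actual, dtype=float)
                 - np.asarray(past_forecast, dtype=float))
    alpha = 1.0 - coverage
    q_low, q_high = np.quantile(residuals, [alpha / 2.0, 1.0 - alpha / 2.0])
    point = np.asarray(point_forecast, dtype=float)
    # PV generation cannot be negative, so both bounds are floored at zero.
    lower = np.clip(point + q_low, 0.0, None)
    upper = np.clip(point + q_high, 0.0, None)
    return lower, upper
```

The resulting lower and upper bounds are the kind of input an interval optimization would consume in place of a single PV point forecast.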