This paper presents a model predictive control (MPC)-based reinforcement learning (RL) approach for a home energy management system (HEMS). The house consists of an air-to-water heat pump connected to a hot water tank that supplies thermal energy to a water-based floor heating system; it also includes a photovoltaic (PV) array and a battery storage system. The HEMS is designed to exploit the thermal inertia of the house and the battery storage to shift demand from peak hours to off-peak periods, and to generate revenue by selling excess energy to the utility grid during periods of high electricity prices. Designing such a HEMS is challenging, however, because model mismatch leads to erroneous predictions of the system dynamics and, consequently, to suboptimal decision making. Moreover, uncertainties in the house thermodynamics and forecast errors in PV generation, outdoor temperature, and user load demand make the problem more difficult. We address this issue by approximating the optimal policy with a parameterized MPC scheme and updating its parameters via a compatible delayed deterministic actor-critic algorithm with a gradient Q-learning critic (CDDAC-GQ). Simulation results show that the proposed MPC-based RL HEMS can effectively deliver a policy that maintains indoor thermal comfort while reducing economic costs, even under model inaccuracies and system uncertainties. Furthermore, a thorough comparison between the CDDAC-GQ algorithm and the conventional twin delayed deep deterministic policy gradient (TD3) algorithm confirms the efficacy of the proposed method in addressing complex HEMS problems.

INDEX TERMS Model predictive control (MPC), reinforcement learning (RL), home energy management system (HEMS), inaccurate model, system uncertainties.
Nomenclature