Reinforcement Learning Testbed for Power-Consumption Optimization

Moriyama, Takao; Magistris, Giovanni De; Tatsubori, Michiaki; Pham, Tu-Hoa; Munawar, Asim; Tachibana, Ryuki

doi:10.48550/arxiv.1808.10427

Cited by 2 publications

(6 citation statements)

References 10 publications


“…In [27] and [10], the author proposes to use Deep Q-Network [25] to control HVAC systems. In [15], the author proposes a EnergyPlus based research environment for developing reinforcement learning approaches for data-center HVAC control. In [7], the author shows promising results of approximating the Model Predictive Controller using neural networks.…”

Section: Related Work (mentioning)

confidence: 99%

“…Since MFRL cannot deal with constraints, reward shaping [17] is required to combine both cost and constraints into a single reward signal through penalty. Following [15], we define our reward function as follows…”

Section: Reinforcement Learning For Building HVAC Control 41 Model-fr... (mentioning)

confidence: 99%
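
The citation statement above describes folding cost and constraints into a single reward through a penalty, but the quoted formula itself is elided. The following is only a minimal sketch of that general idea, not the reward used in either paper: the function name, the temperature band, and both weights are hypothetical values chosen for illustration.

```python
def shaped_reward(power_kw, zone_temps_c, t_low=22.0, t_high=25.0,
                  power_weight=1e-3, penalty_weight=1.0):
    """Combine an energy cost and a temperature constraint into one scalar.

    All defaults are illustrative assumptions, not values from the source.
    """
    # Constraint term: total degrees by which each zone leaves [t_low, t_high].
    violation = sum(
        max(0.0, t_low - t) + max(0.0, t - t_high) for t in zone_temps_c
    )
    # Cost term plus penalty, negated so that higher reward is better.
    return -power_weight * power_kw - penalty_weight * violation
```

With these assumed weights, a step that keeps both zones inside the band is penalized only for its power draw, while each degree of violation costs as much as 1000 kW of consumption. Choosing that trade-off is the main design decision in any penalty-based shaping.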

“…As a case study, we evaluate our model-based reinforcement learning approach on a two-room data center proposed in [15]. The testbed is based on OpenAI Gym [6] and EnergyPlus [8] and open sourced at https://github.com/IBM/rl-testbed-for-energyplus.…”

Section: Case Study (mentioning)

confidence: 99%

“…The target system contains two zones (east zone and west zone), where the thermal load is IT Equipment (ITE) such as servers as shown in Figure 3. Each zone has a dedicated HVAC system similar to Figure 1 with the following components: outdoor air system (OA System), variable volume fan (VAV Fan), direct evaporative cooler (DEC), indirect evaporative cooler (IEC), direct expansion cooling coil (DX CC) and chilled water cooling coil (CW CC) [15]. For each zone, the temperature for all the components are specified by a common setpoint.…”

Section: System Modeling (mentioning)

confidence: 99%

“…Recent success of deep learning has led to the development of several deep reinforcement learning (DRL) based approaches for HVAC scheduling [15,27,29]. These data-driven approaches learn an agent to schedule the HVAC system by interacting with the environment.…”

Section: Introduction (mentioning)

confidence: 99%


Building HVAC Scheduling Using Reinforcement Learning via Neural Network Based Model Approximation

Zhang, Kuppannagari, Kannan, et al. 2019

Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation


The buildings sector is one of the major consumers of energy in the United States. Building HVAC (Heating, Ventilation, and Air Conditioning) systems, whose function is to maintain thermal comfort and indoor air quality (IAQ), account for almost half of the energy consumed by buildings. Thus, intelligent scheduling of the building HVAC system has the potential for tremendous energy and cost savings while ensuring that the control objectives (thermal comfort, air quality) are satisfied.

Traditionally, rule-based and model-based approaches such as the linear-quadratic regulator (LQR) have been used for scheduling HVAC. However, the system complexity of HVAC and the dynamism in the building environment limit the accuracy, efficiency and robustness of such methods. Recently, several works have focused on model-free deep reinforcement learning based techniques such as Deep Q-Network (DQN). Such methods require extensive interactions with the environment. Thus, they are impractical to implement in real systems due to low sample efficiency. Safety-aware exploration is another challenge in real systems, since certain actions at particular states may result in catastrophic outcomes.

To address these issues and challenges, we propose a model-based reinforcement learning approach that learns the system dynamics using a neural network. Then, we adopt Model Predictive Control (MPC) using the learned system dynamics to perform control with a random-sampling shooting method. To ensure safe exploration, we limit the actions to a safe range and limit the maximum absolute change of actions according to prior knowledge. We evaluate our ideas through simulation using the widely adopted EnergyPlus tool on a case study consisting of a two-zone data center. Experiments show that the average deviation of the trajectories sampled from the learned dynamics and the ground truth is below 20%. Compared with baseline approaches, we reduce the total energy consumption by 17.1% to 21.8%. Compared with the model-free reinforcement learning approach, we reduce the required number of training steps to converge by 10x.

Buildings account for 40% of the total energy and 70% of the total electricity consumed in the United States [18]. Of the total energy consumption of buildings, the Heating, Ventilation and Air-Conditioning (HVAC) system accounts for 50%, while the rest is used for lighting, electrical appliances, electric vehicles, etc. The main objective of the HVAC system is to maintain the indoor temperature and air quality. An intelligent HVAC scheduling system will, additionally, save energy and cost while satisfying the objective. The HVAC system is a nonlinear system and has complex system dynamics with a large number of subsystems including chillers, boilers, heat pumps, pipes, ducts, fans, pumps and heat exchangers [11]. In this paper, we assume the combination of equipment to operate by the HVAC system is fixed and focus on how to set the temperature point for local controllers to reduce the energy

arXiv:1910.05313v1 [eess.SY]
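
The abstract above describes MPC with a random-sampling shooting method, with actions kept in a safe range and their step-to-step change bounded. The sketch below illustrates that control loop in general terms and is not the authors' implementation. The `toy_dynamics` and `toy_cost` functions are hypothetical stand-ins for the learned neural-network model and the paper's cost, and the setpoint bounds, horizon, and sample count are assumed values.

```python
import numpy as np


def toy_dynamics(states, actions):
    """Hypothetical stand-in for a learned model: zones drift toward setpoints."""
    return states + 0.5 * (actions - states)


def toy_cost(states, actions):
    """Hypothetical cost: squared deviation of zone temperatures from 23 C."""
    return np.sum((states - 23.0) ** 2, axis=1)


def random_shooting_mpc(dynamics, cost, state, prev_action, horizon=5,
                        n_samples=256, a_min=18.0, a_max=27.0,
                        max_delta=1.0, rng=None):
    """Return the first action of the lowest-cost sampled action sequence.

    Assumes prev_action already lies inside [a_min, a_max].
    """
    rng = np.random.default_rng() if rng is None else rng
    act_dim = prev_action.shape[0]
    # Sample bounded per-step changes, so consecutive actions differ by
    # at most max_delta (the "maximum absolute change" safety limit).
    deltas = rng.uniform(-max_delta, max_delta,
                         size=(n_samples, horizon, act_dim))
    states = np.repeat(state[None, :], n_samples, axis=0)
    actions = np.repeat(prev_action[None, :], n_samples, axis=0)
    total_cost = np.zeros(n_samples)
    first_actions = None
    for t in range(horizon):
        # Clip to the safe range; clipping never enlarges the step taken.
        actions = np.clip(actions + deltas[:, t], a_min, a_max)
        if t == 0:
            first_actions = actions
        # Roll every candidate sequence forward through the model.
        states = dynamics(states, actions)
        total_cost += cost(states, actions)
    return first_actions[np.argmin(total_cost)]
```

In a receding-horizon loop, only the returned first action is applied; the controller then observes the new state and re-plans. For example, `random_shooting_mpc(toy_dynamics, toy_cost, np.array([26.0, 26.0]), np.array([25.0, 25.0]))` returns one setpoint per zone, each within 1 degree of the previous setpoint.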
