Abstract
Background: Peak periods arise because consumers tend to use electricity at similar times, for example switching lights on when returning home from work, or running air conditioners throughout the summer. Without peak shifting, the grid's system operators must rely on peaking plants to supply the additional energy, which is extremely expensive to operate and environmentally harmful due to their high carbon emissions.

Methods: A battery storage system (BSS) is used to purchase energy during off-peak periods and store it for later use, with the primary objective of achieving peak shifting. In addition, a reduction in overall energy consumption and a lowering of consumers' utility bills are also sought, making this a multi-objective optimization problem. Reinforcement learning methods are applied to find an optimal control policy that determines when it is best to purchase and store energy with these objectives in mind.

Results: The learned policy achieves over a 20% reduction in both energy consumption and consumers' energy bills, as well as perfect peak shifting, thereby removing peaking plants from the equation entirely. This result was obtained using a simulator that the author developed specifically for this task, which handles the model training, testing, and evaluation process. In addition, a novel technique, automatic penalty shaping, proved crucial to the success of the learned model. This technique automatically shapes the reward signal, forcing the agent to pay equal attention to multiple individual signals, a necessity when applying reinforcement learning to multi-objective optimization problems.
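The idea behind automatic penalty shaping can be illustrated with a minimal sketch. The abstract does not give the exact shaping rule, so the running-magnitude normalisation below, and all names in it, are assumptions: each reward component (bill cost, consumption, peak penalty) is divided by a running estimate of its typical magnitude so that no single objective dominates the combined reward.

```python
import numpy as np

class AutomaticPenaltyShaper:
    """Illustrative sketch only: rescale each reward component by a
    running mean of its absolute value so every objective contributes
    on a comparable scale. The paper's exact rule may differ."""

    def __init__(self, n_components, momentum=0.99, eps=1e-8):
        self.scale = np.ones(n_components)  # running mean of |component|
        self.momentum = momentum
        self.eps = eps

    def shape(self, components):
        components = np.asarray(components, dtype=float)
        # Update the running magnitude estimate for each signal.
        self.scale = (self.momentum * self.scale
                      + (1 - self.momentum) * np.abs(components))
        # Normalise, then sum into a single scalar reward.
        return float(np.sum(components / (self.scale + self.eps)))

# Usage: combine bill cost, energy use, and a peak penalty each step.
shaper = AutomaticPenaltyShaper(n_components=3)
reward = shaper.shape([-12.5, -0.8, -300.0])  # bill, kWh, peak penalty
```

Without such normalisation, a large-magnitude term (here the peak penalty) would swamp the gradient signal from the smaller objectives.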
The policy does, however, attempt to overcharge the battery about 7% of the time, and promising methods to address this have been proposed as directions for future research.

Conclusion: The aim of this work was to verify that reinforcement learning is a suitable solution method for the peak demand problem; that is, whether reinforcement learning can be used in conjunction with a BSS to purchase energy during off-peak periods and thereby flatten consumers' energy requirement profiles. Such an achievement would prevent the grid's system operators from needing peaking plants to supply additional energy during peak periods, lowering carbon emissions and energy prices for the consumer. Peak shifting would also allow system operators to predict electricity demand more easily, reducing their need to generate more energy than necessary and further lowering consumer tariffs. Secondary aims of directly reducing energy consumption and utility bills were also pursued, making this a multi-objective optimization problem. The data were used, in conjunction with the developed simulator that performs the full training and testing phases of the models, to find an optimal policy using the deep Q-network (DQN) and Proximal Policy Optimization (PPO) reinforcement learning algorithms. The proposed approach achieves perfect peak shifting, a 21% reduction in the monthly utility bill, and a 23% reduction in energy consumption, meeting all of the aims of the task.
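The control problem the DQN and PPO agents face can be sketched as a simple hourly environment. All numbers below (tariff values, battery capacity, charge rate, demand profile) are illustrative assumptions, not the paper's data; the sketch only shows the structure of the decision: charge when energy is cheap, discharge to cover peak demand.

```python
import numpy as np

class BatteryStorageEnv:
    """Minimal sketch of a BSS control environment of the kind the
    abstract describes. Parameters are illustrative assumptions."""

    ACTIONS = ("hold", "charge", "discharge")

    def __init__(self, capacity_kwh=10.0, charge_rate_kwh=2.0):
        self.capacity = capacity_kwh
        self.rate = charge_rate_kwh
        # 24 hourly tariffs: cheap off-peak, expensive 17:00-20:00.
        hours = np.arange(24)
        self.tariff = np.where((hours >= 17) & (hours <= 20), 0.35, 0.12)
        self.reset()

    def reset(self):
        self.hour = 0
        self.soc = 0.0  # battery state of charge (kWh)
        return self._obs()

    def _obs(self):
        return np.array([self.hour / 23.0, self.soc / self.capacity])

    def step(self, action):
        # Household demand: baseline 1 kWh, extra 1.5 kWh at peak.
        demand = 1.0 + (1.5 if 17 <= self.hour <= 20 else 0.0)
        grid = demand  # energy drawn from the grid this hour
        if action == 1:    # charge: buy extra energy into the battery
            buy = min(self.rate, self.capacity - self.soc)
            self.soc += buy
            grid += buy
        elif action == 2:  # discharge: cover demand from the battery
            used = min(self.rate, self.soc, demand)
            self.soc -= used
            grid -= used
        cost = grid * self.tariff[self.hour]
        self.hour = (self.hour + 1) % 24
        done = self.hour == 0
        return self._obs(), -cost, done  # reward = negative cost
```

Even a fixed heuristic (charge overnight, discharge at peak) beats buying all energy at the posted tariff in this sketch; the RL agents learn such a policy from the reward alone.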