2022
DOI: 10.48550/arxiv.2203.05434
Preprint

Near-optimal Deep Reinforcement Learning Policies from Data for Zone Temperature Control

Abstract: Replacing poorly performing existing controllers with smarter solutions will decrease the energy intensity of the building sector. Recently, controllers based on Deep Reinforcement Learning (DRL) have been shown to be more effective than conventional baselines. However, since the optimal solution is usually unknown, it is still unclear whether DRL agents attain near-optimal performance in general or whether there is still a large gap to bridge. In this paper, we investigate the performance of DRL agents compared t…

Cited by 1 publication (3 citation statements)
References 12 publications
“…Most of the recent successes of RL were obtained through NN-based policies, which is then referred to as DRL. A plethora of algorithms to solve such (D)RL optimization problems have been developed in the past few years, and we rely on Twin Delayed Deep Deterministic policy gradients (TD3) [33] for our experiments [27].…”
Section: Deep Reinforcement Learning (mentioning)
confidence: 99%
“…The reward function was designed as r(x, u) = −max{T − b_u, 0} − max{b_l − T, 0} − λu, where T is the indoor temperature, b_u and b_l represent the current upper and lower temperature bounds, and λ balances the comfort of the occupants and the energy consumption. More details on the setup and simulation results can be found in [27], where DRL were shown to clearly outperform industrial baselines and attain near-optimal performance.…”
Section: Umar (mentioning)
confidence: 99%
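
The reward quoted above can be sketched in a few lines of Python. This is only an illustration of the formula as stated in the citation, not the authors' implementation: the function name, argument names, and the numeric values in the usage note are hypothetical, and the units of the control input u are not given in the excerpt.

```python
def comfort_energy_reward(temp, lower_bound, upper_bound, control, lam):
    """Reward r(x, u) = -max{T - b_u, 0} - max{b_l - T, 0} - lam * u.

    temp        -- indoor temperature T
    lower_bound -- current lower comfort bound b_l
    upper_bound -- current upper comfort bound b_u
    control     -- control input u (assumed non-negative here)
    lam         -- weight trading off comfort against energy use
    """
    # Penalty for exceeding the upper bound (zero when T <= b_u).
    too_hot = max(temp - upper_bound, 0.0)
    # Penalty for falling below the lower bound (zero when T >= b_l).
    too_cold = max(lower_bound - temp, 0.0)
    # Energy term: larger control inputs cost more.
    return -too_hot - too_cold - lam * control
```

With hypothetical values, a temperature of 25 against bounds [21, 23], u = 2.0 and λ = 0.5 gives a comfort penalty of 2 and an energy penalty of 1, so the reward is −3. Inside the bounds, only the energy term remains.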