2020
DOI: 10.1016/j.buildenv.2019.106535
Towards optimal control of air handling units using deep reinforcement learning and recurrent neural network

Cited by 127 publications (46 citation statements)
References 42 publications
“…In model-based methods, DRL agents need to learn building environment models based on historical data, e.g., MuZero [45], Long Short-Term Memory-Deep Deterministic Policy Gradients (LSTM-DDPG) [46], differentiable MPC policy-Proximal Policy Optimization (differentiable MPC policy-PPO) [47]. … [A target network is used] when updating the weights, which can improve the stability of the training process. Furthermore, the weights of the target network are updated every … steps in Line 13, which makes the DQN algorithm more stable.…”
Section: DRL Classification (mentioning)
Confidence: 99%
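The target-network trick this excerpt describes is standard DQN practice: temporal-difference targets are computed with a frozen copy of the online Q-network, and that copy is synchronized only every C steps. A minimal sketch in PyTorch follows; the network architecture, dummy batch data, and the period C are illustrative assumptions, not the cited paper's implementation.

```python
# Sketch of the DQN target-network update: TD targets come from a frozen
# copy of the online network, synchronized only every C steps for stability.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

obs_dim, n_actions, gamma, C = 8, 4, 0.99, 100  # C is an assumed update period
online = QNet(obs_dim, n_actions)
target = QNet(obs_dim, n_actions)
target.load_state_dict(online.state_dict())  # start the two networks in sync
target.eval()
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

for step in range(1, 501):
    # Dummy transition batch (s, a, r, s'); a real agent samples a replay buffer.
    s = torch.randn(32, obs_dim)
    a = torch.randint(n_actions, (32, 1))
    r = torch.randn(32, 1)
    s2 = torch.randn(32, obs_dim)

    with torch.no_grad():  # TD target uses the frozen network, not the online one
        y = r + gamma * target(s2).max(dim=1, keepdim=True).values
    q = online(s).gather(1, a)
    loss = nn.functional.mse_loss(q, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    if step % C == 0:  # hard update every C steps (the "Line 13" the excerpt mentions)
        target.load_state_dict(online.state_dict())
```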
“…In model-based methods, DRL agents need to learn building environment models based on historical data, e.g., Long Short-Term Memory-Deep Deterministic Policy Gradients (LSTM-DDPG) [46], differentiable MPC policy-Proximal Policy Optimization (differentiable MPC policy-PPO) [47]. … (e.g., … [63], Advantage Actor-Critic (A2C) [64], Asynchronous Advantage Actor-Critic (A3C) [65]), and maximum entropy methods (e.g., Multi-Actor-Attention-Critic (MAAC) [17], Entropy-Based Collective Advantage Actor-Critic (EB-C-A2C) [27], Entropy-Based Collective Deep Q-Network (EB-C-DQN) [27]).…”
Section: Applications of DRL in a Single Building (mentioning)
Confidence: 99%
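The model-based approach this excerpt lists (e.g., LSTM-DDPG) rests on fitting a recurrent network to historical (state, action) → next-state data so it can stand in for the building environment. Below is a hedged sketch of that idea in PyTorch; the dimensions, trajectory data, and class names are illustrative assumptions, not the cited papers' implementations.

```python
# Sketch of learning an LSTM dynamics model from historical trajectories,
# as a stand-in building environment model for model-based DRL.
import torch
import torch.nn as nn

class LSTMDynamics(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim + action_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, state_dim)  # predicts the next state

    def forward(self, states, actions):
        x = torch.cat([states, actions], dim=-1)  # (batch, time, S + A)
        h, _ = self.lstm(x)
        return self.head(h)                       # (batch, time, S)

state_dim, action_dim = 6, 2
model = LSTMDynamics(state_dim, action_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy historical trajectories; real data would come from building operation logs.
states = torch.randn(16, 24, state_dim)     # 16 trajectories, 24 time steps
actions = torch.randn(16, 24, action_dim)
next_states = torch.randn(16, 24, state_dim)

for _ in range(200):
    pred = model(states, actions)
    loss = nn.functional.mse_loss(pred, next_states)  # one-step prediction error
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once trained, such a model can generate imagined rollouts for a policy-learning algorithm (DDPG in the LSTM-DDPG pairing), reducing the amount of interaction needed with the real building.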