“…This suits practical needs, especially when a detailed model is not available. In recent years, reinforcement learning algorithms such as Deep Q Network (Van Le et al., 2019), Trust Region Policy Optimisation (Moriyama et al., 2018), and Deep Deterministic Policy Gradient (Chi et al., 2020; Li, Wen, Tao, & Guan, 2019) have been studied and applied to the energy-saving and reliable control of data centre cooling systems, where they have achieved good energy-saving performance (Duan et al., 2020; Kumar, Khatri, & Diván, 2020; Linder, Van Gilder, Zhang, & Barrett, 2019; Liu, Wong, Ye, & Ma, 2020; Thein, Myo, Parvin, & Gawanmeh, 2020; Yang, Wang, He, Sun, & Zhang, 2019). Despite these advantages of model-free reinforcement learning algorithms, their implementation in practice still faces many challenges.…”