Intelligent and energy-efficient heating, ventilation, and air conditioning (HVAC) systems play an important role in reducing energy consumption and protecting the environment. In this work, we explore power optimization strategies based on reinforcement learning (RL) that do not rely on human prior knowledge. We propose a novel RL approach, the multi-stabilization network (MADDPG-MSN), to tackle the sample-efficiency issue of current RL-based approaches for HVAC systems. By employing the multi-stabilization network trick, MADDPG-MSN efficiently learns to balance temperature control and power consumption within a limited number of interactions. Evaluated in a simulated data center scenario, it reduced power usage by 28% compared with a traditional model-predictive controller, without compromising temperature control. In real-world air conditioner testing, after 72 hours of learning, it outperformed the built-in controller, with 35% lower power consumption and a 21% smaller standard deviation of the indoor temperature. These results demonstrate the effectiveness and practicality of MADDPG-MSN for HVAC power consumption optimization and expand the potential of RL as an emerging direction toward more energy-efficient systems.