Smart buildings are rapidly becoming more prevalent, aiming to provide energy-efficient and comfortable living spaces. Nevertheless, designing a smart building is a multifaceted task that faces numerous challenges, the primary one being the algorithm used for energy management. This paper addresses the design of a smart building, with particular emphasis on the algorithm for controlling the indoor environment. The implementation and evaluation of the Advantage-Weighted Actor-Critic algorithm are examined in a simulated four-unit residential building. Moreover, a novel Self-Adapted Advantage-Weighted Actor-Critic algorithm is proposed, tested, and evaluated in both the simulated building and a real building. The results underscore the effectiveness of the proposed control strategy compared to Rule-Based Controllers, Deep Deterministic Policy Gradient, and Advantage-Weighted Actor-Critic. Experimental results demonstrate a 34.91% improvement over Deep Deterministic Policy Gradient and a 2.50% improvement over the best Advantage-Weighted Actor-Critic method in the first epoch of a real-life scenario. These findings support the efficacy of the Self-Adapted Advantage-Weighted Actor-Critic algorithm and position it as a promising solution for smart building optimization.