Increasing workloads lead to a significant surge in power consumption and computing node failures in data centers. Existing workload distribution strategies focus on either thermal awareness or failure mitigation, overlooking the impact of node failures on the energy efficiency of cloud data centers. To address this issue, we build a new holistic model that characterizes the impacts of workloads, computing and cooling costs, heat recirculation, and node failure on the energy efficiency of cloud data centers. Leveraging this holistic model, we propose a novel thermal-aware workload distribution strategy called HGSA that takes node failure into account and improves the energy efficiency of cloud data centers. Our empirical findings confirm that (i) faulty nodes lead to a large rise in power consumption, and (ii) failure locations play a vital role in the power consumption of data centers. Experimental results show that HGSA makes near-optimal workload distribution decisions. In particular, compared to the existing solutions, HGSA cuts down the minimum inlet temperature by 5.2%-15%, improves the maximum air temperature of a Computer Room Air Conditioner (CRAC) model by 4.2%-26.5%, and lowers the cooling cost by 15.4%-50%. Furthermore, HGSA cuts back the total power consumption by 0.65%-78%.