Microprocessor voltage levels include substantial margin to cover process variation, system power-supply variation, workload-induced thermal and voltage variation, aging, random uncertainty, and test inaccuracy. This margin allows the microprocessor to operate correctly under worst-case conditions, but under typical conditions it is larger than necessary and wastes energy. We present a mechanism that reduces excess voltage margin by (1) introducing a critical path monitor (CPM) circuit that measures available timing margin in real time, (2) coupling the CPM output to the clock generation circuit so that clock frequency is adjusted within cycles in response to excess or inadequate timing margin, and (3) periodically adjusting the processor voltage level in firmware to achieve a specified average clock frequency target. We implemented this mechanism in a prototype IBM POWER7 server. Under better-than-worst-case conditions, our guardband management mechanism lowers the average voltage setting to 137-152 mV below nominal, reducing average processor power by 24% with no performance loss on industry-standard benchmarks.
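The three-part loop described above can be pictured with a minimal Python sketch. Everything numeric in it is an assumption for illustration only: the linear frequency-versus-voltage model, the 1200 mV nominal voltage, the 6.25 mV firmware step, the voltage clamp, and the helper names (`cpm_frequency_mhz`, `firmware_voltage_step`, `run`) are hypothetical and not taken from POWER7 firmware. The sketch also collapses the per-cycle CPM/clock interaction into one sampled frequency per assumed threshold value.

```python
from statistics import mean

# Hypothetical linear model (assumption): the CPM-coupled clock generator runs
# at a frequency proportional to the supply voltage above a threshold that
# rises under worse process, temperature, and aging conditions.
MHZ_PER_MV = 4.0


def cpm_frequency_mhz(voltage_mv: float, threshold_mv: float) -> float:
    """Frequency the CPM-driven clock would settle at for a given voltage (assumed model)."""
    return MHZ_PER_MV * max(voltage_mv - threshold_mv, 0.0)


def firmware_voltage_step(voltage_mv, freq_samples_mhz, target_mhz,
                          step_mv=6.25, v_min_mv=800.0, v_max_mv=1250.0):
    """One periodic firmware update: nudge voltage toward the average-frequency target."""
    avg_mhz = mean(freq_samples_mhz)
    if avg_mhz > target_mhz:
        voltage_mv -= step_mv  # excess timing margin: shave voltage
    elif avg_mhz < target_mhz:
        voltage_mv += step_mv  # inadequate margin: restore voltage
    # Keep the voltage inside an assumed safe operating window.
    return min(max(voltage_mv, v_min_mv), v_max_mv)


def run(nominal_mv, target_mhz, thresholds_mv, intervals):
    """Simulate several firmware intervals starting from the nominal voltage."""
    voltage = nominal_mv
    for _ in range(intervals):
        # Within an interval the real clock tracks timing margin cycle by cycle;
        # this sketch only samples the resulting frequency at a few thresholds.
        samples = [cpm_frequency_mhz(voltage, t) for t in thresholds_mv]
        voltage = firmware_voltage_step(voltage, samples, target_mhz)
    return voltage


if __name__ == "__main__":
    final_mv = run(1200.0, 1600.0, [745.0, 750.0, 755.0], intervals=20)
    print(final_mv, "mV, i.e.", 1200.0 - final_mv, "mV below nominal")
```

Because the firmware compares only the average frequency with its target, the voltage steps down while the CPM-driven clock reports spare margin and steps back up when conditions worsen; the clamp bounds how far either direction can go.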
A large portion of data center power consumption is attributable to cooling. In dynamic thermal management mechanisms for data centers and servers, thermal setpoints are typically chosen statically and conservatively, which leaves significant room for improving energy efficiency. In this paper, we propose two complementary hierarchical thermal-aware power optimization techniques that achieve (i) lower overall system power with no performance penalty or (ii) higher performance within the same power budget. At the data center level, we trade off facility Heating, Ventilation, and Air Conditioning (HVAC) power against server fan power by choosing between two thermal setpoints for the HVAC chiller based on cooling zone utilization levels. This optimization can reduce total data center power by as much as 12.4%-17% with no performance penalty. At the server level, we trade off fan power against circuit leakage power by dynamically adjusting the server thermal setpoint, allowing the system to heat up when doing so saves more fan power than it costs in leakage power. We evaluate this optimization on an IBM POWER 750 and find that it reduces total server power by up to 5.4% with no performance penalty for workloads that heavily exercise a server.
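Both trade-offs can be sketched as small selection rules. The Python below is a hypothetical illustration, not the paper's controller: the fan model (required fan speed inversely proportional to the setpoint's headroom over inlet air, with fan power cubic in speed per the fan affinity laws), the exponential leakage model, every coefficient, the utilization threshold, the two chiller setpoints, and the function names are assumptions chosen only to show how an interior optimum arises.

```python
import math

# Hypothetical coefficients (assumptions, not measured POWER 750 values).
T_INLET_C = 25.0    # server inlet air temperature
T_REF_C = 60.0      # reference setpoint for the power models below
FAN_W_REF = 40.0    # fan power at T_REF_C
LEAK_W_REF = 20.0   # leakage power at T_REF_C
LEAK_PER_C = 0.02   # exponential leakage growth per degree C


def fan_power_w(setpoint_c: float) -> float:
    # Assume required fan speed scales inversely with the headroom between the
    # setpoint and inlet air, and fan power scales with speed cubed.
    speed_ratio = (T_REF_C - T_INLET_C) / (setpoint_c - T_INLET_C)
    return FAN_W_REF * speed_ratio ** 3


def leakage_power_w(setpoint_c: float) -> float:
    # Assume leakage grows exponentially with die temperature.
    return LEAK_W_REF * math.exp(LEAK_PER_C * (setpoint_c - T_REF_C))


def best_server_setpoint(candidates_c):
    """Return the candidate setpoint with the lowest modeled fan + leakage power."""
    return min(candidates_c, key=lambda t: fan_power_w(t) + leakage_power_w(t))


def choose_chiller_setpoint(zone_utils, low_c=18.0, high_c=24.0, util_threshold=0.7):
    """Hypothetical data-center rule: run the chiller colder only when some cooling
    zone is busy enough that its server fans would otherwise spin up."""
    return low_c if max(zone_utils) > util_threshold else high_c


if __name__ == "__main__":
    best = best_server_setpoint(range(60, 86))
    total = fan_power_w(best) + leakage_power_w(best)
    print(f"server setpoint {best} C, fan+leakage {total:.1f} W")
    print("chiller setpoint", choose_chiller_setpoint([0.4, 0.85, 0.3]), "C")
```

With these assumed coefficients the server-level minimum falls strictly inside the 60-85 °C candidate range: at low setpoints fan power dominates, and at high setpoints leakage growth outweighs further fan savings.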