As the HPC community moves into the exascale computing era, application energy is becoming as large of a concern as performance. Optimizing for energy will be essential in the effort to overcome the limited power envelope. Existing efforts to optimize energy in applications employ Dynamic Frequency and Voltage Scaling (DVFS) to maximize energy savings in less compute-intensive regions or non-critical execution paths. However, we found that DVFS has high power state switching overhead, preventing its use when a more fine-grained technique is necessary.In this work, we take advantage of the low transition overhead of CPU clock modulation and apply it to fine-grained OpenMP parallel loops. The energy behavior of OpenMP parallel regions is first characterized by changing the effective frequency using clock modulation. The clock modulation setting that achieves the best energy efficiency is then determined for each region. Finally, different CPU clock modulation settings are applied to the different loops within the same application. The resulting multifrequency execution of OpenMP applications achieves better energy-delay trade-off than any single frequency setting. In the best case scenario, the multi-frequency approach achieved 8.6% energy savings with less than 1.5% execution time increase. Concurrency throttling (i.e., reducing the number of hardware threads used by an application) saves more energy and can be combined with CPU clock modulation. Using both, we see savings of 21% energy and improvement of energy-delay product (EDP) by 16%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.