Being on the verge of exascale performance has shifted the prioritization of performance in applications to the inclusion of power-performance efficiency as a primary objective in the High Performance Computing (HPC) community. Simultaneously, this has surfaced hardware and software efforts that employ techniques such as dynamic voltage and frequency scaling (DVFS) for core and uncore units or dynamic concurrency throttling (DCT) to exploit hardware resources efficiently, by saving energy while maintaining performance. These techniques are complementary, so they can be used together. However, employing them is not a straightforward task, as they have to be adjusted based on the workload, and it is even more complex to combine them properly. Thus, these techniques should be applied transparently by a runtime system, without relying on application developers. In this paper, we extend a task-based runtime system with an infrastructure that categorizes workloads based on their computational profile -memory-bounded, compute-bounded, or balanced. This categorization is done in an on-line manner and with a negligible overhead. With this additional information, we enhance the CPU-manager and scheduler of OmpSs-2, a taskbased parallel programming model, to automatically combine DVFS and DCT techniques based on workloads. Moreover, we show that our heuristics transparently improve energy efficiency on average by 15% with no significant performance loss and either equal or surpass the energy efficiency of the best static configuration available.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.