Efficient runtime task scheduling on complex memory hierarchy becomes increasingly important as modern and future High-Performance Computing (HPC) systems are progressively composed of multisocket and multi-chiplet nodes with nonuniform memory access latencies. Existing locality-aware scheduling schemes either require control of the data placement policy for memory-bound tasks or maximize locality for all classes of computations, resulting in a loss of potential performance. While such approaches are viable, an adaptive scheduling strategy is preferred to enhance locality and resource sharing efficiency using a portable programming scheme. In this paper, we propose the Adaptive Resource-Moldable Scheduler (ARMS) that dynamically maps a task at runtime to a partition spanning one or more threads, based on the task and DAG requirements. The scheduler builds an online platform-independent model for the local and non-local scheduling costs for each tuple consisting of task type (function) and task topology (task location within DAG). We evaluate ARMS using task-parallel versions of SparseLU, 2D Stencil, FMM and MatMul as examples. Compared to previous approaches, ARMS achieves up to 3.5× performance gain over state-of-the-art locality-aware scheduling schemes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.