Efficient management of shared resources is a critical problem in high-performance computing (HPC) environments. Existing workload management systems often avoid sharing resources among co-executing applications in order to achieve performance isolation. Such schemes lead to poor resource utilization and suboptimal process throughput, adversely affecting user productivity. Tackling this problem in a scalable fashion is extremely challenging, since it requires the workload scheduler to have in-depth knowledge of each application's resource requirements and of its runtime phases at fine granularity.
In this work, we show that applications’ resource requirements and execution phase behaviour can be captured in a scalable and lightweight manner at runtime by estimating important program artifacts termed “dynamic loop characteristics”. Specifically, we propose a solution to the problem of efficient workload scheduling by designing a compiler and runtime cooperative framework that leverages novel loop-based compiler analysis for resource allocation. We present the Beacons Framework, an end-to-end compiler and scheduling framework that estimates dynamic loop characteristics, encapsulates them in compiler-instrumented beacons in an application, and broadcasts them during application runtime for proactive workload scheduling. We focus on estimating four important loop characteristics: loop trip-count, loop timing, loop memory footprint, and loop data-reuse behaviour, through a combination of compiler analysis and machine learning. The novelty of the Beacons Framework also lies in its ability to tackle irregular loops that exhibit complex control flow with indeterminate loop bounds involving structure fields, aliased variables and function calls, which are highly prevalent in modern workloads. At the backend, the Beacons Framework entails a proactive workload scheduler that leverages the runtime information to orchestrate aggressive process co-locations, maximizing resource concurrency without causing cache thrashing. Our results show that the Beacons Framework can predict different loop characteristics with an accuracy of 85% to 95% on average, and the proactive scheduler obtains an average throughput improvement of 1.9x (up to 3.2x) over the state-of-the-art schedulers on an Amazon Graviton2 machine on consolidated workloads involving 1000-10000 co-executing processes, across 51 benchmarks.