Modern ARM-based servers such as ThunderX and Thun-derX2 offer a tremendous amount of parallelism by providing dozens or even hundreds of processors. However, exploiting these computing resources for reuse-heavy, data dependent workloads is a big challenge because of shared cache resources. In particular, schedulers have to conservatively co-locate processes to avoid cache conflicts since miss penalties are detrimental and conservative co-location decisions lead to lower resource utilization.To address these challenges, in this paper we explore the utility of predictive analysis of applications' execution to dynamically forecast resource-heavy workload regions, and to improve the efficiency of resource management through the use of new proactive methods. Our approach relies on the compiler to insert "beacons" in the application at strategic program points to periodically produce and/or update the attributes of anticipated resource-intense program region(s). The compiler classifies loops in programs based on predictability of their execution time and inserts different types of beacons at their entry/exit points. The precision of the information carried by beacons varies as per the analyzability of the loops, and the scheduler uses performance counters to fine tune co-location decisions. The information produced by beacons in multiple processes is aggregated and analyzed by the proactive scheduler to respond to the anticipated workload requirements. For throughput environments, we develop a framework that demonstrates high-quality predictions and improvements in throughput over CFS by 1.4x on an average and up to 4.7x on ThunderX and 1.9x on an average and up to 5.2x on ThunderX2 servers on consolidated workloads across 45 benchmarks.
No abstract
With a growing number of cores per socket in modern datacenters where multi-tenancy of a diverse set of applications must be efficiently supported, effective sharing of the last level cache is a very important problem. This is challenging because modern workloads exhibit dynamic phase behaviour -their cache requirements & sensitivity vary across different execution points. To tackle this problem, we propose Com-CAS, a compiler guided cache apportioning system that provides smart cache allocation to co-executing applications in a system. The front-end of Com-CAS is primarily a compilerframework equipped with learning mechanisms to predict cache requirements, while the backend consists of allocation framework with pro-active scheduler that apportions cache dynamically to co-executing applications. Our system improved average throughput by 21%, with a maximum of 54% while maintaining the worst individual application execution time degradation within 15% to meet SLA requirements.
In spite of tremendous advances in data dependence and dataflow analysis techniques, state-of-the-art optimizing compilers continue to suffer from imprecisions and miss potential optimization opportunities. These imprecisions result from statically unknown
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.