2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2015.7056039

Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting

Abstract: Reducing the long tail of the query latency distribution in modern warehouse-scale computers is critical for improving the performance and quality of service of workloads such as Web Search and Memcached. Traditional turbo boost increases a processor's voltage and frequency during a coarse-grain sliding window, boosting all queries that are processed during that window. However, the inability of such a technique to pinpoint tail queries for boosting limits its tail reduction benefit. In this work, we propose Adrenal…
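The contrast the abstract draws between coarse-grain turbo boost and per-query boosting can be illustrated with a toy simulation. Everything here is an illustrative assumption, not a figure from the paper: the bimodal service-time distribution, the 1.3x boost speedup, the 8 ms tail threshold, and the oracle-style tail detection.

```python
import random

random.seed(0)

BOOST_SPEEDUP = 1.3      # assumed speedup when running at boosted voltage/frequency
TAIL_THRESHOLD_MS = 8.0  # assumed cutoff: queries slower than this count as "tail"

def service_times(n):
    # Bimodal service times: most queries are short, a small fraction are long.
    return [random.expovariate(1 / 2.0) if random.random() < 0.9
            else random.expovariate(1 / 20.0)
            for _ in range(n)]

def coarse_grain_boost(times, window=100):
    # Traditional turbo: boost *every* query in every 5th window,
    # regardless of whether it is a tail query.
    return [t / BOOST_SPEEDUP if (i // window) % 5 == 0 else t
            for i, t in enumerate(times)]

def per_query_boost(times):
    # Adrenaline-style idea: boost only queries identified as tail queries
    # (idealized here with an oracle that knows each query's service time).
    return [t / BOOST_SPEEDUP if t > TAIL_THRESHOLD_MS else t for t in times]

def p99(xs):
    return sorted(xs)[int(0.99 * len(xs))]

times = service_times(10_000)
print(f"baseline p99:     {p99(times):.1f} ms")
print(f"coarse-grain p99: {p99(coarse_grain_boost(times)):.1f} ms")
print(f"per-query p99:    {p99(per_query_boost(times)):.1f} ms")
```

Because the coarse-grain scheme spends its boost budget mostly on queries that were never going to be slow, the per-query scheme cuts the 99th-percentile latency far more for the same fraction of boosted cycles.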

Cited by 96 publications (57 citation statements)
References 40 publications
“…Many of these studies use workloads internal to datacenter operators like Google or Facebook [32,33,36,38,55,56]. Academic studies use one or a few latency-critical benchmarks [25,48,54], which limits the range of behaviors and performance requirements across which their proposed techniques can be evaluated. Some work uses more readily-available sequential and parallel batch workloads (e.g., from SPEC CPU2006 or PARSEC) and treats them as latency-critical applications [15,57].…”
Section: A. Anatomy of Latency-Critical Applications
confidence: 99%
“…These techniques include new cluster managers that schedule and migrate applications across systems to reduce interference [18,32,36,54], fast dynamic voltage-frequency scaling (DVFS) techniques to improve power efficiency [25,29,32,48], hardware and software schemes to use low-power idle states [37,39,53], and hardware resource partitioning schemes that allow batch workloads to run alongside latency-critical ones, improving utilization [29,30,33,57].…”
Section: A. Anatomy of Latency-Critical Applications
confidence: 99%
“…We observe that the impact of DVFS-only controls differs noticeably between Linux and IX: with Linux, the DVFS-only alternate frontier is very close to the Pareto frontier, meaning that a DVFS-only approach such as Pegasus [29] or Adrenaline [15] would be adequate. This is due to Linux's idling behavior, which saves resources.…”
Section: Pareto-Optimal Static Configurations
confidence: 91%
“…The key challenge is coping with the inherent short-term variability of latency-critical applications: requests arrive at unpredictable times and are often bursty, causing short-term spikes and queuing delays that dominate tail latency [22,25]; and the amount of work per request often varies by an order of magnitude or more [16,25].…”
Section: Introduction
confidence: 99%
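The claim above, that unpredictable arrivals and queuing delays dominate tail latency, can be made concrete with a minimal single-server FIFO simulation. The arrival rate, service-time distribution, and 70% utilization are assumptions chosen for illustration.

```python
import random

random.seed(1)

def simulate(arrival_rate, service_mean, n=50_000):
    # Single FIFO server: a request's latency is its queuing delay
    # plus its own service time.
    t, server_free = 0.0, 0.0
    latencies = []
    for _ in range(n):
        t += random.expovariate(arrival_rate)            # Poisson arrivals
        start = max(t, server_free)                      # wait if server is busy
        service = random.expovariate(1 / service_mean)   # variable work per request
        server_free = start + service
        latencies.append(server_free - t)
    return sorted(latencies)

def pct(xs, p):
    return xs[int(p * len(xs))]

# 70% utilization: arrivals at rate 0.7, mean service time 1.0
lat = simulate(arrival_rate=0.7, service_mean=1.0)
print(f"median: {pct(lat, 0.50):.2f}  p99: {pct(lat, 0.99):.2f}")
```

Even at moderate utilization, bursts of closely spaced arrivals build up queues whose delay inflates the 99th percentile to several times the median, which is why techniques that target only the tail can be so effective.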