Characterizing Microservice Dependency and Performance

Luo, Shutian; Xu, Huanle; Lu, Chengzhi; Ye, Kejiang; Xu, Guoyao; Zhang, Liping; Ding, Yu; He, Junjia; Xu, Cheng‐Zhong

doi:10.1145/3472883.3487003

Cited by 151 publications

(43 citation statements)

References 20 publications


“…7.4. We tune the client to generate a wide range of request intensity, but focus on the lower end (approximately 5-20% processor utilization), which represents the typical operating range of servers running latency-critical applications [47,62,91-94]. For Memcached, this load range corresponds to a range of 4K-100K QPS (queries per second).…”

Section: Discussion (mentioning)

confidence: 99%

“…One widely used method to ensure that microservices, and hence overall applications, meet their performance target is to execute them on servers that have low average utilization (5-20%) [47,62,91-94], leading to a busy/idle execution pattern [16,17,65] where cores are frequently idle. Ideally, each core should enter a low-power core C-state whenever it is idle, and the entire system should transition to a low-power package C-state whenever all cores are idle.…”

Section: Introduction (mentioning)

confidence: 99%

“…A seminal work by Google that discusses latency-critical applications states [62]: "Modern servers are not energy proportional: they operate at peak energy efficiency when they are fully utilized but have much lower efficiencies at lower utilizations". The utilization of servers running latency-critical applications is typically 5%-20% to meet target tail latency requirements, as reported by multiple works from industry and academia [62,91-94]. For example, recently, Alibaba reported that the utilization of servers running latency-critical applications is typically 10% [94].…”

Section: Introduction (mentioning)

confidence: 99%

“…The utilization of servers running latency-critical applications is typically 5%-20% to meet target tail latency requirements, as reported by multiple works from industry and academia [62,91-94]. For example, recently, Alibaba reported that the utilization of servers running latency-critical applications is typically 10% [94]. Therefore, to improve the energy proportionality of servers running latency-critical microservice-based applications, it is crucial to address the more inefficient servers' operating points, namely the low utilization, which is the focus of our study.…”

Section: Introduction (mentioning)

confidence: 99%


AgilePkgC: An Agile System Idle State Architecture for Energy Proportional Datacenter Servers

Antoniou, Volos, Bartolini, et al. 2022

Preprint


Modern user-facing applications deployed in datacenters use a distributed system architecture that exacerbates the latency requirements of their constituent microservices (30-250µs). Existing CPU power-saving techniques degrade the performance of these applications due to the long transition latency (order of 100µs) to wake up from a deep CPU idle state (C-state). For this reason, server vendors recommend only enabling shallow core C-states (e.g., CC1) for idle CPU cores, thus preventing the system from entering deep package C-states (e.g., PC6) when all CPU cores are idle. This choice, however, impairs server energy proportionality since power-hungry resources (e.g., IOs, uncore, DRAM) remain active even when there is no active core to use them. As we show, it is common for all cores to be idle due to the low average utilization (e.g., 5-20%) of datacenter servers running user-facing applications.

We propose to reap this opportunity with AgilePkgC (APC), a new package C-state architecture that improves the energy proportionality of server processors running latency-critical applications. APC implements PC1A (package C1 agile), a new deep package C-state that a system can enter once all cores are in a shallow C-state (i.e., CC1) and has a nanosecond-scale transition latency. PC1A is based on four key techniques. First, a hardware-based agile power management unit (APMU) rapidly detects when all cores enter a shallow core C-state (CC1) and triggers the system-level power savings control flow. Second, an IO Standby Mode (IOSM) places IO interfaces (e.g., PCIe, DMI, UPI, DRAM) in shallow (nanosecond-scale transition latency) low-power modes. Third, a CLM Retention (CLMR) rapidly reduces the CLM (Cache-and-home-agent, Last-level-cache, and Mesh network-on-chip) domain's voltage to its retention level, drastically reducing its power consumption. Fourth, APC keeps all system PLLs active in PC1A to allow nanosecond-scale exit latency by avoiding PLLs' re-locking overhead.

Combining these techniques enables significant power savings while requiring less than 200ns transition latency, >250× faster than existing deep package C-states (e.g., PC6), making PC1A practical for datacenter servers. Our evaluation using Intel Skylake-based server shows that APC reduces the energy consumption of Memcached by up to 41% (25% on average) with <0.1% performance degradation. APC provides similar benefits for other representative workloads.

“…3 Exploratory questions: These questions focus on microservice-design features discussed in the literature [58,85,86,92] that are completely missing from all or most of the testbeds. For example, cyclic dependencies within requests (i.e., service A calling service B, which then calls A again) occur in Alibaba traces [61], but are only present in one of the testbeds. This mismatch led us to investigate if request-level cyclic dependencies occur in participants' organizations.…”

Section: Creating Interview Questions (mentioning)

confidence: 99%

[SoK] Identifying Mismatches Between Microservice Testbeds and Industrial Perceptions of Microservices

Seshagiri, Huye, Liu, et al. 2022

Journal of Systems Research


Microservices have increasingly become a popular way of designing and building large-scale distributed systems. The challenges in developing microservices-based distributed applications have given rise to much academic research. However, the benchmarks used in academia are far from real-world microservices-based applications. This paper fills this gap and proposes ways microservices benchmarks should evolve to be more realistic. The authors take the readers through the limitations of existing testbeds, interviews with industry participants focusing on understanding the distance between benchmarks and real-world microservice-based applications, and propose ways to improve existing testbeds. I believe that the paper has the potential to support and accelerate research in this important area.


A holistic evaluation methodology for configuring production data centers

Wen, Zhang, Cheng, et al. 2022

Concurrency and Computation


Performance evaluation is the basis for choosing appropriate system-level configurations for large-scale data centers. Because a change to a system-level configuration affects many jobs in the data center, traditional load-testing benchmarks are not sufficient to support the decision-making: they cannot accurately reproduce the complex behaviors of a large number of jobs. Therefore, we expect to further evaluate the system configuration based on the production environment. However, there are technical challenges, namely, the lack of a holistic evaluation method that can unite the evaluation results of various jobs, and the uninterruptable production environment that should not be affected by the evaluation procedure. To address these challenges, we propose a holistic performance evaluation methodology and design its implementation platform. We introduce a simple but powerful performance metric, ERU (effectiveness of resource usage), and combine the ERU of involved jobs into a summarized value to measure the effect of a configuration change. We validate our ERU metric by comparing it with the CPI (cycles per instruction) and QPS (queries per second) metrics, deploy the platform to production data centers and demonstrate the effectiveness for measuring system-level configurations of both software (JVM compiler update) and hardware (NUMA on/off) to save 14.
