Q-Zilla: A Scheduling Framework and Core Microarchitecture for Tail-Tolerant Microservices

Mirhosseini, Amirhossein; West, Brendan L.; Blake, Geoffrey; Wenisch, Thomas F.

doi:10.1109/hpca47549.2020.00026

Cited by 19 publications

(4 citation statements)

References 58 publications


“…The scheduler, however, is limited to improving the latency of networking tasks only. Q-Zilla [38] is a scheduling framework aimed at reducing queuing delays and thereby reducing tail latency. The queuing algorithm has also been implemented at the single-node level to create a microarchitecture (CoreZilla) that improves CPU access speeds.…”

Section: Related Work (mentioning)

confidence: 99%

Niyama: Node scheduling for cloud workloads with resource isolation

Thiyyakat

Kalambur

Sitaram

2022

Concurrency and Computation


Cloud providers place tasks from multiple applications on the same resource pool to improve the resource utilization of the infrastructure. The consequent resource contention has an undesirable effect on latency-sensitive tasks. In this article, we present Niyama, a resource isolation approach that uses a modified version of deadline scheduling to protect latency-sensitive tasks from CPU bandwidth contention. Conventionally, deadline scheduling has been used to schedule real-time tasks with well-defined deadlines; therefore, it cannot be used directly when the deadlines are unspecified. In Niyama, we estimate deadlines in intervals and secure the bandwidth required for each interval, thereby ensuring optimal job response times. We compare our approach with cgroups, Linux's default resource isolation mechanism used in containers today. Our experiments show that Niyama reduces the average delay in tasks by 3×-20× compared to cgroups. Since Linux's deadline scheduling policy is non-work-conserving, there is a small drop in server-level CPU utilization when Niyama is used naively. We demonstrate how core reservation and oversubscription in the inter-node scheduler can offset this drop; our experiments show a 1.3×-2.24× decrease in job response-time delay over cgroups while achieving high CPU utilization.
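Niyama builds on Linux deadline scheduling, in which each task reserves a runtime per period. The sketch below shows only the generic bandwidth admission test that such a reservation must pass. It is a simplified model of the kernel's check that assumes the default sched_rt_runtime_us/sched_rt_period_us ratio of 950000/1000000, and it does not reproduce Niyama's interval-based deadline estimator. The names `DeadlineReservation` and `admit` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeadlineReservation:
    runtime_us: int   # CPU time guaranteed in each period
    deadline_us: int  # relative deadline within the period
    period_us: int    # replenishment period


def admit(existing: List[DeadlineReservation], new: DeadlineReservation,
          num_cpus: int, rt_runtime_us: int = 950_000,
          rt_period_us: int = 1_000_000) -> bool:
    """Simplified SCHED_DEADLINE-style admission test (not Niyama's code).

    A reservation is well formed only if 0 < runtime <= deadline <= period.
    It is admitted if the summed bandwidth (runtime / period) of all
    reservations stays within num_cpus * rt_runtime_us / rt_period_us,
    i.e. the default 95% cap per CPU.
    """
    if not (0 < new.runtime_us <= new.deadline_us <= new.period_us):
        return False
    limit = num_cpus * rt_runtime_us / rt_period_us
    total = sum(r.runtime_us / r.period_us for r in existing + [new])
    return total <= limit


# Usage: a 50% reservation fits on one CPU; a second 50% one does not.
first = DeadlineReservation(runtime_us=500, deadline_us=1000, period_us=1000)
print(admit([], first, num_cpus=1))       # True
print(admit([first], first, num_cpus=1))  # False: 1.0 > 0.95
```

Going by the abstract, Niyama's contribution is choosing runtime and deadline values per interval when the application supplies none; a test like this would then determine whether the estimated bandwidth can actually be secured.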


Section: Related Work (mentioning)

confidence: 99%

Niyama: Node scheduling for cloud workloads with resource isolation (Thiyyakat, Kalambur, Sitaram; 2022; Concurrency and Computation)

“…In this work, they consider the energy consumption of all computations and transmissions, ignoring the scarcity differences among various devices/servers; for example, the transmission energy of sensing devices is much scarcer than the computation energy of cloud servers. In addition, optimizing the accumulated sum may lead to imbalanced performance among tasks, such as long tail latency [133].…”

Section: D: Multi-objective Optimization (mentioning)

confidence: 99%

A Survey and Taxonomy on Task Offloading for Edge-Cloud Computing

et al. 2020


Edge-cloud computing, combining the benefits of both edge computing and cloud computing, is one of the most promising ways to address the resource insufficiency of smart devices. Task offloading, which decides where and when each task is performed, is an important challenge that must be addressed for edge-cloud computing in practice. Even though existing research focuses on task offloading in edge-cloud computing, many problems remain to be solved before these offloading technologies can be applied. Thus, in this article, we first propose a taxonomy of task offloading in edge-cloud environments to investigate and classify related research articles, and then summarize several challenges that have not yet been addressed as future research directions in this area, to promote the development of the edge-cloud market.


“…Shinjuku [24] seeks to address this challenge by implementing a highly efficient preemption mechanism that enables processor sharing by eliminating operating-system threading overheads. RPCValet [12], Nebula [43], and Q-Zilla [35,36] make the observation that shared request queues are very costly for μs-scale microservices despite being imperative for achieving minimal tail latency. They seek to enable shared queues through specialized hardware support.…”
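Why shared queues matter for tail latency can be seen with a toy queueing simulation. This is not Q-Zilla's hardware design; it assumes Poisson arrivals, exponential service times with mean 1.0, and random dispatch in the per-core case. It compares the 99th-percentile latency of one shared FCFS queue feeding all cores against independent per-core queues at the same load. The function name `p99_latency` and all parameter values are illustrative.

```python
import heapq
import random


def p99_latency(num_cores: int, load: float, num_requests: int,
                shared_queue: bool, seed: int = 1) -> float:
    """Simulate FCFS queueing and return the 99th-percentile latency."""
    rng = random.Random(seed)
    arrival_rate = load * num_cores  # mean service time is 1.0 per request
    now = 0.0
    latencies = []
    free_at = [0.0] * num_cores  # time at which each core next becomes idle
    for _ in range(num_requests):
        now += rng.expovariate(arrival_rate)  # Poisson arrivals
        service = rng.expovariate(1.0)        # exponential service times
        if shared_queue:
            # One FCFS queue: the next request goes to whichever core frees
            # up first, so free_at is maintained as a min-heap.
            start = max(now, heapq.heappop(free_at))
            finish = start + service
            heapq.heappush(free_at, finish)
        else:
            # Per-core queues: dispatch blindly to a random core on arrival.
            core = rng.randrange(num_cores)
            start = max(now, free_at[core])
            finish = start + service
            free_at[core] = finish
        latencies.append(finish - now)
    latencies.sort()
    return latencies[int(0.99 * len(latencies)) - 1]


shared = p99_latency(num_cores=4, load=0.8, num_requests=20_000, shared_queue=True)
split = p99_latency(num_cores=4, load=0.8, num_requests=20_000, shared_queue=False)
print(f"shared p99={shared:.1f}, per-core p99={split:.1f}")
```

With per-core queues, a request can wait behind a long request while another core sits idle; a shared queue never leaves a core idle while work is waiting. That gap is what the cited systems pay for in hardware, because a software shared queue adds synchronization cost that is significant at microsecond scale.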

Section: Related Work (mentioning)

confidence: 99%

μSteal

Mirhosseini

Wenisch

2021

Proceedings of the ACM International Conference on Supercomputing

Self Cite


Modern internet services are moving towards distributed microservice architectures, wherein a complex application is decomposed into numerous discrete microservices to improve programmability, reliability, manageability, and scalability. A key property of microservice-based architectures is that common microservices may be shared by multiple end-to-end cloud services. For example, a speech-recognition microservice might serve as an early node in the microservice graphs of several end-to-end services. However, given the dissimilarities across microservice graphs and varying end-to-end latency constraints across services, shared microservices may need to operate under different latency constraints for each service. As a result, in existing systems, most providers either deploy a separate instance pool for each latency constraint or require all requests to needlessly meet the most stringent constraint. In this paper, we argue that sharing microservice instances across multiple services can significantly reduce the number of instances, especially under highly asymmetric latency constraints. We propose a request scheduling mechanism, called μSteal, which leverages preemptive work and resource stealing to schedule arriving requests to cores within a "mixed-criticality" microservice instance. μSteal provisions "core reservations" for each request class based on its latency requirements, but allows a class to steal cores from other classes if they would otherwise remain idle. When a class requires its full reservation, μSteal preempts the stolen cores, returning them to their reserved class. μSteal employs a runtime feedback controller augmented by a queuing-theory-based analytical model to tune core reservations across classes, seeking to maximize the request throughput within each instance while meeting all classes' latency constraints.
We show that μSteal reduces the required instances for several shared microservice deployments by 1.29× compared to deploying multiple, segregated instance pools.
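The core-reservation and stealing policy described above can be illustrated with a per-step allocation function. This is a minimal sketch under stated assumptions: demand is measured in cores, the class names and `allocate_cores` are hypothetical, the lending order is an arbitrary choice, and it omits μSteal's feedback controller and queuing model, which tune the reservations themselves.

```python
from typing import Dict, Tuple


def allocate_cores(reservations: Dict[str, int],
                   demand: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {class: (own_cores_used, stolen_cores)} for one scheduling step.

    Each class first uses up to its own reservation. Reserved cores that
    their owner leaves idle are lent to classes with unmet demand.
    Recomputing after demand changes models preemption: an owner whose
    demand rises reclaims its cores before any are lent out.
    """
    own = {c: min(r, demand.get(c, 0)) for c, r in reservations.items()}
    idle = sum(reservations[c] - own[c] for c in reservations)
    result = {}
    # Hypothetical lending order: classes listed earlier borrow first.
    for c in reservations:
        unmet = demand.get(c, 0) - own[c]
        stolen = min(unmet, idle)
        idle -= stolen
        result[c] = (own[c], stolen)
    return result


reservations = {"strict": 3, "relaxed": 1}
# Strict class is lightly loaded: relaxed borrows its 2 idle cores.
print(allocate_cores(reservations, {"strict": 1, "relaxed": 4}))
# → {'strict': (1, 0), 'relaxed': (1, 2)}
# Strict class needs its full reservation: the stolen cores are returned.
print(allocate_cores(reservations, {"strict": 3, "relaxed": 4}))
# → {'strict': (3, 0), 'relaxed': (1, 0)}
```

Recomputing the allocation on every demand change is the simplest way to express "preempt stolen cores"; a real runtime would also need a mechanism to interrupt the borrowed cores' in-flight work.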

