Modern internet services are moving towards distributed microservice architectures, wherein a complex application is decomposed into numerous discrete microservices to improve programmability, reliability, manageability, and scalability. A key property of microservice-based architectures is that common microservices may be shared by multiple end-to-end cloud services. As an example, a speech-recognition microservice might serve as an early node in the microservice graphs of several end-to-end services. However, given the dissimilarities across microservice graphs and the varying end-to-end latency constraints across services, a shared microservice may need to operate under a different latency constraint for each service. As a result, in existing systems, most providers either deploy a separate instance pool for each latency constraint, or force all requests to needlessly meet the most stringent constraint. In this paper, we argue that sharing microservice instances across multiple services can significantly reduce the number of required instances, especially under highly asymmetric latency constraints. We propose a request scheduling mechanism, called Steal, which leverages preemptive work and resource stealing to schedule arriving requests onto cores within a "mixed-criticality" microservice instance. Steal provisions a "core reservation" for each request class based on its latency requirements, but allows a class to steal cores from other classes if they would otherwise remain idle. However, when a class requires its full reservation, Steal preempts the stolen cores and returns them to their reserved class. Steal employs a runtime feedback controller, augmented by a queuing-theory-based analytical model, to tune core reservations across classes, seeking to maximize request throughput within each instance while meeting all classes' latency constraints. We show that Steal reduces the number of instances required for several shared microservice deployments by 1.29× compared to deploying multiple, segregated instance pools.
CCS CONCEPTS
• Computer systems organization → Multicore architectures; • Networks → Cloud computing.
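To make the abstract's scheduling policy concrete, the following is a minimal, illustrative sketch of per-class core reservations with stealing and preemption. It is not Steal's actual implementation or API: the class names ("strict", "relaxed"), the StealScheduler and Core types, and the reservation sizes are all invented for illustration, and the queuing-theory-driven feedback controller that tunes reservation sizes at runtime is not modeled here.

```python
# Illustrative sketch only (assumed names and structure, not Steal's real code):
# each request class reserves a set of cores; a class may steal another class's
# idle cores, and a class reclaims (preempts) its stolen cores when it needs
# its full reservation.
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Core:
    owner: str                           # class holding this core's reservation
    running_cls: Optional[str] = None    # class currently executing on the core
    running_req: Optional[int] = None    # request currently executing (None if idle)


class StealScheduler:
    """Toy multi-class scheduler with core reservations, stealing, and preemption."""

    def __init__(self, reservations):
        # Provision a fixed "core reservation" for each request class.
        self.cores = [Core(owner=cls)
                      for cls, count in reservations.items()
                      for _ in range(count)]
        self.queues = {cls: deque() for cls in reservations}

    def submit(self, cls, request_id):
        self.queues[cls].append(request_id)

    def dispatch(self):
        for cls, queue in self.queues.items():
            while queue:
                core = self._acquire_core(cls)
                if core is None:
                    break                # nothing idle or reclaimable: class must wait
                core.running_cls, core.running_req = cls, queue.popleft()
                print(f"{cls}: request {core.running_req} runs on core reserved for {core.owner}")

    def _acquire_core(self, cls) -> Optional[Core]:
        # 1. Use an idle core from this class's own reservation.
        for core in self.cores:
            if core.owner == cls and core.running_req is None:
                return core
        # 2. Otherwise, steal an idle core reserved for another class.
        for core in self.cores:
            if core.running_req is None:
                return core
        # 3. Otherwise, reclaim a core of this class's reservation that another
        #    class has stolen: preempt it and re-queue the displaced request.
        for core in self.cores:
            if core.owner == cls and core.running_cls != cls:
                self.queues[core.running_cls].appendleft(core.running_req)
                core.running_cls = core.running_req = None
                return core
        return None


if __name__ == "__main__":
    sched = StealScheduler({"strict": 2, "relaxed": 2})
    for i in range(4):              # relaxed load fills its reservation, then steals
        sched.submit("relaxed", i)
    sched.dispatch()
    sched.submit("strict", 100)     # strict arrival preempts one of its stolen cores
    sched.dispatch()
```

The acquisition order (own reservation, then stealing idle foreign cores, then preempting stolen cores back) mirrors the priority described in the abstract; in the real system, reservation sizes themselves would additionally be re-tuned by the feedback controller and analytical model rather than fixed at construction time.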