Modern internet services are moving towards distributed microservice architectures, wherein a complex application is decomposed into numerous discrete microservices to improve programmability, reliability, manageability, and scalability. A key property of microservice-based architectures is that common microservices may be shared by multiple end-to-end cloud services. As an example, a speech-recognition microservice might serve as an early node in the microservice graphs of several end-to-end services. However, given the dissimilarities across microservice graphs and the varying end-to-end latency constraints across services, a shared microservice may need to operate under a different latency constraint for each service. As a result, in existing systems, most providers either deploy a separate instance pool for each latency constraint, or force all requests to needlessly meet the most stringent constraint. In this paper, we argue that sharing microservice instances across multiple services can significantly reduce the number of required instances, especially under highly asymmetric latency constraints. We propose a request scheduling mechanism, called Steal, which leverages preemptive work and resource stealing to schedule arriving requests onto cores within a "mixed-criticality" microservice instance. Steal provisions a "core reservation" for each request class based on its latency requirements, but allows a class to steal cores from other classes if they would otherwise remain idle. However, when a class requires its full reservation, Steal preempts the stolen cores and returns them to their reserved class. Steal employs a runtime feedback controller, augmented by a queuing-theory-based analytical model, to tune core reservations across classes, seeking to maximize request throughput within each instance while meeting all classes' latency constraints. We show that Steal reduces the number of instances required for several shared microservice deployments by 1.29× compared to deploying multiple, segregated instance pools.
CCS CONCEPTS
• Computer systems organization → Multicore architectures; • Networks → Cloud computing.
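To make the abstract's scheduling policy concrete, the following is a minimal, illustrative sketch of per-class core reservations with stealing and preemption. It is not Steal's actual implementation or API: the class names ("strict", "relaxed"), the StealScheduler and Core types, and the reservation sizes are all invented for illustration, and the queuing-theory-driven feedback controller that tunes reservation sizes at runtime is not modeled here.

```python
# Illustrative sketch only (assumed names and structure, not Steal's real code):
# each request class reserves a set of cores; a class may steal another class's
# idle cores, and a class reclaims (preempts) its stolen cores when it needs
# its full reservation.
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Core:
    owner: str                           # class holding this core's reservation
    running_cls: Optional[str] = None    # class currently executing on the core
    running_req: Optional[int] = None    # request currently executing (None if idle)


class StealScheduler:
    """Toy multi-class scheduler with core reservations, stealing, and preemption."""

    def __init__(self, reservations):
        # Provision a fixed "core reservation" for each request class.
        self.cores = [Core(owner=cls)
                      for cls, count in reservations.items()
                      for _ in range(count)]
        self.queues = {cls: deque() for cls in reservations}

    def submit(self, cls, request_id):
        self.queues[cls].append(request_id)

    def dispatch(self):
        for cls, queue in self.queues.items():
            while queue:
                core = self._acquire_core(cls)
                if core is None:
                    break                # nothing idle or reclaimable: class must wait
                core.running_cls, core.running_req = cls, queue.popleft()
                print(f"{cls}: request {core.running_req} runs on core reserved for {core.owner}")

    def _acquire_core(self, cls) -> Optional[Core]:
        # 1. Use an idle core from this class's own reservation.
        for core in self.cores:
            if core.owner == cls and core.running_req is None:
                return core
        # 2. Otherwise, steal an idle core reserved for another class.
        for core in self.cores:
            if core.running_req is None:
                return core
        # 3. Otherwise, reclaim a core of this class's reservation that another
        #    class has stolen: preempt it and re-queue the displaced request.
        for core in self.cores:
            if core.owner == cls and core.running_cls != cls:
                self.queues[core.running_cls].appendleft(core.running_req)
                core.running_cls = core.running_req = None
                return core
        return None


if __name__ == "__main__":
    sched = StealScheduler({"strict": 2, "relaxed": 2})
    for i in range(4):              # relaxed load fills its reservation, then steals
        sched.submit("relaxed", i)
    sched.dispatch()
    sched.submit("strict", 100)     # strict arrival preempts one of its stolen cores
    sched.dispatch()
```

The acquisition order (own reservation, then stealing idle foreign cores, then preempting stolen cores back) mirrors the priority described in the abstract; in the real system, reservation sizes themselves would additionally be re-tuned by the feedback controller and analytical model rather than fixed at construction time.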