2013
DOI: 10.1145/2534169.2486028
Speeding up distributed request-response workflows

Abstract: We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across tens to thousands of servers, and aggregation of responses across the network. To improve the tail latency of such services, we use a few building blocks: reissuing laggards elsewhere in the cluster, new policies to return incomplete results, and speeding up laggards by giving them more resources. Combining these building blocks …
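The first building block the abstract names, reissuing laggards elsewhere in the cluster, can be sketched as a hedged request: send the query to one replica, and if no answer arrives within a small timeout, duplicate it to a second replica and take whichever finishes first. This is a minimal illustration of the idea, not the paper's implementation; `call_replica`, the simulated latencies, and the hedge timeout are all hypothetical.

```python
import concurrent.futures
import random
import time

def call_replica(replica_id, query):
    # Simulated server with a heavy latency tail: mostly fast, sometimes slow.
    time.sleep(random.choice([0.01, 0.01, 0.01, 0.2]))
    return f"result({query})@replica{replica_id}"

def hedged_request(query, replicas, hedge_after=0.05):
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(call_replica, replicas[0], query)
        done, _ = concurrent.futures.wait([first], timeout=hedge_after)
        if done:
            return first.result()
        # Laggard detected: reissue on another replica and race the two copies.
        second = pool.submit(call_replica, replicas[1], query)
        done, _ = concurrent.futures.wait(
            [first, second],
            return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()

print(hedged_request("q", [0, 1]))
```

The trade-off is extra load: every hedge costs a duplicate request, which is why such systems only reissue after the typical fast-path latency has clearly been exceeded.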

Cited by 57 publications (68 citation statements) · References 18 publications
“…The controller code developed in the simulator can be directly plugged into brownout-aware applications like RUBiS 3 and RUBBoS 4 . For the controller implementation, the adaptive PI controller in (14) was discretized with sample period h = 0.5 s using the method suggested in [3], and complemented by a tracking-based anti-windup solution. The parameter estimations that the feedback and feedforward schemes require (Ĝ_P(0), Ĝ_I(0), λ, α) are implemented as exponentially weighted moving averages according to …”
Section: A. The Simulator
confidence: 99%
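The quoted passage estimates its controller parameters with exponentially weighted moving averages. A minimal sketch of that estimator, with an illustrative forgetting factor `alpha` (not a value from the cited work):

```python
def ewma(samples, alpha=0.2, init=0.0):
    """Exponentially weighted moving average over a stream of samples."""
    est = init
    for x in samples:
        # Each new sample pulls the estimate toward x by a factor of alpha.
        est = alpha * x + (1 - alpha) * est
    return est

# A constant input converges to that constant as old history decays.
print(round(ewma([1.0] * 50, alpha=0.2), 3))  # → 1.0
```

Larger `alpha` tracks changes faster but passes more noise through; the cited controllers pick it to balance responsiveness against estimation jitter.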
“…Admission control means that some users would not receive any response at all, hence risking losing them to competitors, incurring long-term revenue loss. Another possibility is to assign a maximum time to each request and iteratively refine an answer until the time budget expires [9,14]. This strategy works well for pruning search queries of spurious results, but does not easily generalize to all types of cloud applications.…”
Section: Introduction
confidence: 99%
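The budget-based strategy this citation describes, iteratively refining an answer until the time budget expires, can be sketched as a loop against a deadline. `refine_step` is a hypothetical stand-in for application-specific work such as scanning more index shards:

```python
import time

def refine_step(partial):
    time.sleep(0.01)                 # simulated unit of refinement work
    return partial + [len(partial)]  # each step adds one more result

def answer_within_budget(budget_s):
    deadline = time.monotonic() + budget_s
    partial = []
    while time.monotonic() < deadline:
        partial = refine_step(partial)
    return partial  # possibly incomplete, but always delivered on time

print(len(answer_within_budget(0.05)) > 0)  # → True
```

This matches the citation's caveat: the pattern suits search-style workloads where a partial result set is still useful, but not applications whose answers are all-or-nothing.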
“…A measurement interval seeks to capture the latency of a maximum number n of packets, where n is a constant. Our passive latency measurement is based on the coordinated measurement scheme proposed by Kompella et al [19]: (i) Average, captures the central tendency of latency, which characterizes the long-term latency trend [11]; (ii) Variance, measures how far the latencies are spread out, which correlates with the latency tail: the higher the variance, the worse the long-tail problem [17], [30]. Estimating other metrics like the order statistics such as the maximum delay or the quantiles requires knowledge of the latency value of each packet, unfortunately, the coordinated measurement scheme does not fulfill this requirement as it mixes the latency values of different packets.…”
Section: A. Requirements
confidence: 99%
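The two metrics the passage tracks per measurement interval, the average (long-term trend) and the variance (tail indicator), can both be computed in one streaming pass without storing per-packet latencies, e.g. with Welford's online algorithm. The sample latencies below are illustrative:

```python
def mean_and_variance(latencies_ms):
    """One-pass (Welford) mean and population variance of a latency stream."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in latencies_ms:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # running sum of squared deviations
    return mean, (m2 / n if n else 0.0)

mean, var = mean_and_variance([1.0, 1.2, 0.9, 1.1, 9.0])  # one tail outlier
print(round(mean, 2), round(var, 2))  # → 2.64 10.12
```

A single tail outlier barely moves the mean relative to its magnitude but inflates the variance sharply, which is exactly why the passage uses variance as its long-tail signal; order statistics like quantiles would need the per-packet values the coordinated scheme discards.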
“…We now discuss relevant characteristics of SOAs using a combination of measurements from a large cloud provider and prior reports on systems from other environments [8, 37,47,48,53,64,70].…”
Section: SOAs in Production
confidence: 99%
“…First, request execution in SOAs spans tens to hundreds of services, forming a DAG across the service topology [37]. The exact structure of the DAG is often unknown when the request first enters the system, since it depends on multiple factors like the APIs invoked at each encountered service, the supplied arguments, the content of caches, as well as the use of load balancing along the service graph.…”
Section: Introduction
confidence: 99%
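The point this last citation makes, that the request DAG is only discovered at runtime, can be illustrated with a toy traversal in which a service fans out to its backends only on a cache miss, so two requests over the same topology produce different DAGs. All service names and the cache are hypothetical:

```python
def execute(service, topology, cache, visited):
    """Traverse the service graph a request actually touches."""
    visited.append(service)
    if service in cache:  # cache hit: the entire subtree is pruned
        return
    for backend in topology.get(service, []):
        execute(backend, topology, cache, visited)

topology = {"frontend": ["search", "ads"], "search": ["index", "spell"]}

cold, warm = [], []
execute("frontend", topology, cache=set(), visited=cold)
execute("frontend", topology, cache={"search"}, visited=warm)
print(cold)  # → ['frontend', 'search', 'index', 'spell', 'ads']
print(warm)  # → ['frontend', 'search', 'ads']
```

Because the shape depends on caches, arguments, and load balancing, any latency policy has to work without knowing the full DAG up front.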