Queueing with redundant requests: exact analysis

Gardner, Kristen; Zbarsky, Samuel; Doroudi, Sherwin; Harchol-Balter, Mor; Hyytiä, Esa; Scheller‐Wolf, Alan

doi:10.1007/s11134-016-9485-y

Cited by 114 publications

(113 citation statements)

References 37 publications

Supporting

Mentioning

106

Contrasting

“…This implies that each job is forked to the identical number of servers, and job is completed by joining identical number of service completions. Tight numerical bounds are provided in [6], analytical bounds are presented in [7], [15]- [17], analytical approximations appear in [18], exact analysis for small systems in [19], exact analysis for random independent scheduling for asymptotically large number of servers in [20], and an exact analysis of tail index for Pareto-distributed file sizes in [21].…”

Section: A Related Work (mentioning)

confidence: 99%

Optimal Server Selection for Straggler Mitigation

Badita

Parag

Aggarwal

2020

IEEE/ACM Trans. Networking

The performance of large-scale distributed compute systems is adversely impacted by stragglers when the execution time of a job is uncertain. To manage stragglers, we consider a multi-fork approach for job scheduling, where additional parallel servers are added at forking instants. In terms of the forking instants and the number of additional servers, we compute the job completion time and the cost of server utilization when the task processing times are assumed to have a shifted exponential distribution. We use this study to provide insights into the scheduling design of the forking instants and the associated number of additional servers to be started. Numerical results demonstrate orders of magnitude improvement in cost in the regime of low completion times as compared to the prior works.

“…The sojourn time for the 'N'-system under the Redundancy Service policy is derived in Theorems 2 and 3 of [2]. From this we obtain the expected sojourn times for type 1 and type 2 customers:…”

Section: A Comparison of FCFS-ALIS and Redundancy Service for the 'N' (mentioning)

confidence: 99%

“…The customer and all its copies leave the system when the first of its copies completes service. This model was studied by Gardner et al [2]. -A Parallel FCFS Matching Queue: There is an arrival stream of customers of types C, and an independent arrival stream of servers of types S. When a customer arrives he joins a queue of customers waiting for service.…”

Section: Introduction (mentioning)

confidence: 99%

FCFS parallel service systems and matching models

Adan

Kleiner

Righter

et al. 2018

Performance Evaluation

We consider three parallel service models in which customers of several types are served by several types of servers subject to a bipartite compatibility graph, and the service policy is first come first served. Two of the models have a fixed set of servers. The first is a queueing model in which arriving customers are assigned to the longest idling compatible server if available, or else queue up in a single queue, and servers that become available pick the longest waiting compatible customer, as studied by Adan and Weiss, 2014. The second is a redundancy service model where arriving customers split into copies that queue up at all the compatible servers, and are served in each queue on FCFS basis, and leave the system when the first copy completes service, as studied by Gardner et al., 2016. The third model is a matching queueing model with a random stream of arriving servers. Arriving customers queue in a single queue and arriving servers match with the first compatible customer and leave immediately with the customer, or they leave without a customer. The last model is relevant to organ transplants, to housing assignments, to adoptions and many other situations. We study the relations between these models, and show that they are closely related to the FCFS infinite bipartite matching model, in which two infinite sequences of customers and servers of several types are matched FCFS according to a bipartite compatibility graph, as studied by Adan et al., 2017. We also introduce a directed bipartite matching model in which we embed the queueing systems. This leads to a generalization of Burke's theorem to parallel service systems.

“…We omit the details due to space considerations; for the full proof, see the associated technical report [15].…”

Section: Proofs For N Model (mentioning)

confidence: 99%

Reducing Latency via Redundant Requests

Gardner

Zbarsky

Doroudi

et al. 2015

Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems

Self Cite

Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to run a request on multiple servers and wait for the first completion (discarding all remaining copies of the request). However there is no exact analysis of systems with redundancy. This paper presents the first exact analysis of systems with redundancy. We allow for any number of classes of redundant requests, any number of classes of non-redundant requests, any degree of redundancy, and any number of heterogeneous servers. In all cases we derive the limiting distribution on the state of the system. In small (two or three server) systems, we derive simple forms for the distribution of response time of both the redundant classes and non-redundant classes, and we quantify the "gain" to redundant classes and "pain" to non-redundant classes caused by redundancy. We find some surprising results. First, the response time of a fully redundant class follows a simple Exponential distribution and that of the non-redundant class follows a Generalized Hyperexponential. Second, fully redundant classes are "immune" to any pain caused by other classes becoming redundant. We also compare redundancy with other approaches for reducing latency, such as optimal probabilistic splitting of a class among servers (Opt-Split) and Join-the-Shortest-Queue (JSQ) routing of a class. We find that, in many cases, redundancy outperforms JSQ and Opt-Split with respect to overall response time, making it an attractive solution.
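
The "wait for the first completion" idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's analysis: it assumes idle servers with independent exponentially distributed service times and ignores queueing entirely, so it only shows the standard fact that the minimum of independent exponentials with rates μ_1, ..., μ_k is exponential with rate μ_1 + ... + μ_k. The function names and the example rates are hypothetical and chosen for illustration.

```python
import random


def expected_first_completion(rates):
    """Mean of the minimum of independent Exponential(rate) service times.

    Assumption: all servers are idle, so there is no queueing delay.
    """
    return 1.0 / sum(rates)


def simulate_first_completion(rates, trials=200_000, seed=0):
    """Monte Carlo estimate of the same mean.

    Each trial draws one service time per server and keeps the smallest,
    mimicking a request that is cancelled everywhere once one copy finishes.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(rate) for rate in rates)
    return total / trials


if __name__ == "__main__":
    example_rates = [1.0, 2.0]  # hypothetical service rates of two servers
    print("analytical mean:", expected_first_completion(example_rates))
    print("simulated mean: ", simulate_first_completion(example_rates))
```

With two servers of rates 1 and 2, the analytical mean is 1/3, and the simulated estimate should land close to it. Capturing the "pain" that redundancy causes to non-redundant classes would require modelling the queues themselves, which this sketch deliberately leaves out.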
