Job replication on multiserver systems

Kim, Yusik; Righter, Rhonda; Wolff, Ronald W.

doi:10.1017/s0001867800003414

Cited by 10 publications

(11 citation statements)

References 6 publications


“…For service time distributions that are 'new-worse-than-used' [15], it is shown in [16] that it is optimal to fork a job to maximum number of servers. The choice of scheduling policy for new-worse-than-used (NWU) and new-better-than-used (NBU) distributions is also studied in [17]-[19]. The NBU and NWU notions are closely related to the log-concavity of service time studied in this work.…”

Section: Cancel Early Keep Redundancy Keep Redundancy

mentioning

confidence: 99%

Efficient replication of queued tasks for latency reduction in cloud systems

Joshi

Soljanin

Wornell

2015

2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton)


Abstract: In cloud computing systems, assigning a job to multiple servers and waiting for the earliest copy to finish is an effective method to combat the variability in response time of individual servers. Although adding redundant replicas always reduces service time, the total computing time spent per job may be higher, thus increasing waiting time in queue. The total time spent per job is also proportional to the cost of computing resources. We analyze how different redundancy strategies, e.g., the number of replicas and the times at which they are issued and canceled, affect the latency and computing cost. We get the insight that the log-concavity of the service time distribution is a key factor in determining whether adding redundancy reduces latency and cost. If the service distribution is log-convex, then adding maximum redundancy reduces both latency and cost. If it is log-concave, then having fewer replicas and canceling the redundant requests early is more effective.
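The contrast between log-convex and log-concave service times described in this abstract can be illustrated with a small closed-form calculation. The sketch below is not taken from the cited paper: it ignores queueing entirely, assumes all n replicas start at the same time on idle servers and are canceled at no cost when the first one finishes, and uses two illustrative distributions (a hyperexponential, whose tail is log-convex, and a shifted exponential, whose tail is log-concave). All function names and parameter values are hypothetical.

```python
from math import comb


def shifted_exp_min_mean(n, delta, mu):
    """Mean of the minimum of n i.i.d. copies of delta + Exp(mu).

    The constant shift delta is unaffected by replication; the
    exponential part behaves like Exp(n * mu).
    """
    return delta + 1.0 / (n * mu)


def hyperexp_min_mean(n, p, mu1, mu2):
    """Mean of the minimum of n i.i.d. hyperexponential copies.

    Each copy is Exp(mu1) with probability p and Exp(mu2) otherwise.
    Integrating the n-th power of the tail term by term gives a
    binomial sum over how many copies drew rate mu1.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) / (k * mu1 + (n - k) * mu2)
        for k in range(n + 1)
    )


def latency_and_cost(min_mean, n):
    """Return (expected service time, expected total server time).

    Assumes all n replicas run until the first one completes.
    """
    return min_mean, n * min_mean


if __name__ == "__main__":
    # Illustrative parameters only (assumptions, not from the source).
    for n in range(1, 5):
        hyper = latency_and_cost(hyperexp_min_mean(n, 0.5, 1.0, 10.0), n)
        shifted = latency_and_cost(shifted_exp_min_mean(n, 1.0, 1.0), n)
        print(n, "hyperexp (latency, cost):", hyper,
              "shifted exp (latency, cost):", shifted)
```

With these illustrative parameters, going from one replica to two lowers the hyperexponential cost from 0.55 to about 0.366, while the shifted exponential cost rises from 2 to 3 even though its service time falls from 2 to 1.5. This is only the per-job service side of the picture; the queueing effects analyzed in the cited work are not modeled here.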



“…No replication is shown to be optimal for two servers and NBU job size distributions. In [4] these results are generalized and it is proved that no replication and full replication give the largest stability region for NBU and NWU job sizes, respectively. In [3] these results are extended to log-concave and respectively, log-convex complementary cumulative distribution functions.…”

Section: Introduction

mentioning

confidence: 95%

Achievable Stability in Redundancy Systems

Raaijmakers

Borst

2021

Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems


We investigate the achievable stability region for redundancy systems and a quite general workload model with different job types and heterogeneous servers, reflecting job-server affinity relations which may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand, we establish for New-Better-than-Used (NBU) distributed speed variations that no replication gives a strictly larger stability region than replication. Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. In case of non-observable job types we show that for New-Worse-than-Used (NWU) distributed speed variations full replication gives a larger stability region than no replication.

CCS Concepts: Computer systems organization → Redundancy.
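For readers unfamiliar with the terminology, the NBU and NWU classes and their link to log-concave and log-convex tails can be written out as follows. This is a sketch of the usual textbook definitions, not notation taken from the cited papers; the symbols $X$ and $\bar F$ are introduced here for illustration.

```latex
% X is a nonnegative random variable with tail \bar F(t) = P(X > t), \bar F(0) = 1.
\begin{align*}
\text{NBU:}\quad & \bar F(s+t) \le \bar F(s)\,\bar F(t) \quad \text{for all } s,t \ge 0,\\
\text{NWU:}\quad & \bar F(s+t) \ge \bar F(s)\,\bar F(t) \quad \text{for all } s,t \ge 0.
\end{align*}
% If g = \log \bar F is concave with g(0) = 0, then g is subadditive:
%   g(s+t) \le g(s) + g(t),
% which is exactly the NBU inequality. Symmetrically, a log-convex tail gives NWU.
% The exponential distribution satisfies both with equality.
```

In words, a log-concave tail is a sufficient condition for NBU and a log-convex tail is a sufficient condition for NWU, which is the sense in which the two pairs of notions are related.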


“…For service time distributions that are 'new-worse-than-used' [Cao and Wang 1991], it is shown in [Koole and Righter 2008] that it is optimal to replicate a job at all servers in the system. The choice of scheduling policy for new-worse-than-used (NWU) and new-better-than-used (NBU) distributions is studied in [Kim et al. 2009; Shah et al. 2013; Sun et al. 2015]. The NBU and NWU notions are closely related to the log-concavity of service time studied in this work. The Cost of Redundancy: If we assume exponential service time then redundancy does not cause any increase in cost of server time.…”

mentioning

confidence: 99%

Efficient Redundancy Techniques for Latency Reduction in Cloud Systems

Joshi

Soljanin

Wornell

2017

ACM Trans. Model. Perform. Eval. Comput. Syst.



In computing-as-a-service frameworks, the computing cost is proportional to money spent on renting machines to run a job on the cloud.

2. PREVIOUS WORK AND MAIN CONTRIBUTIONS

Related Previous Work

Systems Work: The use of redundancy to reduce latency is not new. One of the earliest instances is the use of multiple routing paths [Maxemchuk 1975] to send packets in networks; see [Kabatiansky et al. 2005, Chapter 7] for a detailed survey of other related work. A similar idea has been studied [Vulimiri et al. 2013] in the context of DNS queries. In large-scale cloud computing frameworks, several recent works in systems [Dean and Ghemawat 2008; Ananthanarayanan et al. 2013; Ousterhout et al. 2013] explore straggler mitigation techniques where redundant replicas of straggling tasks are launched to reduce latency. Although the use of redundancy has been explored in systems literature, there is little work on the rigorous analysis of how it affects latency, and in particular the cost of resources. Next we review some of that work.

Exponential Service Time: The (n, k) fork-join system was first proposed in [Joshi et al. 2012; Joshi et al. 2014] to analyze content download latency from erasure coded distributed storage. These works consider that a content file coded into n chunks can be recovered by accessing any k out of the n chunks, where the service time X of each chunk is exponential. Even with the exponential assumption analyzing the (n, k) fork-join system is a hard problem. It is a generalization of the (n, n) fork-join system, which was actively studied in queueing literature [Flatto and Hahn 1984; Nelson and Tantawi 1988; Varki et al. 2008] around two decades ago.

Recently, an analysis of latency with heterogeneous job classes for the replicated (k = 1) case with distributed queues is presented in [Gardner et al. 2015]. Other related works include [Shah et al. 2014; Kumar et al. 2014; Xiang et al. 2014; Kadhe et al. 2015]. A common thread in all these works is that they also assume exponential service time.

General Service Time: Few practical systems have exponentially distributed service time. For example, studies of download time traces from Amazon S3 [Chen et al. 2014] indicate that the service time is not exponential in practice, but instead a shifted exponential. For service time distributions that are 'new-worse-than-used' [Cao and Wang 1991], it is shown in [Koole and Righter 2008] that it is optimal to replicate a job at all servers in the system. The choice of scheduling policy for new-worse-than-used (NWU) and new-better-than-used (NBU) distributions is studied in [Kim et al. 2009; Shah et al. 2013; Sun et al. 2015]. The NBU and NWU notions are closely related to the log-concavity of service time studied in this work.

The Cost of Redundancy: If we assume exponential service time then redundancy does not cause any increase in cost of server time. But since this is not true in practice, it is important to determine the cost of using redundancy. Simulation results with non-zero fixed cost of removal of redundant requests...
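The statement that redundancy adds no server-time cost under exponential service can be checked with a short calculation. The sketch below assumes a job replicated at $n$ idle servers with i.i.d. $\mathrm{Exp}(\mu)$ service times, with all remaining replicas canceled instantly and at no cost when the first one finishes; the symbols $n$, $\mu$, $X_i$, and $C$ are introduced here for illustration and are not taken from the excerpt.

```latex
\begin{align*}
X_{1:n} &= \min(X_1,\dots,X_n) \sim \mathrm{Exp}(n\mu),
  \qquad \mathbb{E}[X_{1:n}] = \frac{1}{n\mu},\\
\mathbb{E}[C] &= n\,\mathbb{E}[X_{1:n}] = \frac{1}{\mu}
  \qquad \text{(total server time, independent of } n\text{)}.
\end{align*}
```

Under these assumptions the expected service time shrinks by a factor of $n$ while the expected total server time stays at $1/\mu$, which is why the cost question only becomes interesting for non-exponential service times.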

