Redundancy scheduling has emerged as a powerful strategy for improving response times in parallel-server systems. The key feature in redundancy scheduling is replication of a job upon arrival by dispatching replicas to different servers. Redundant copies are abandoned as soon as the first of these replicas finishes service. By creating multiple service opportunities, redundancy scheduling increases the chance of a fast response from a server that is quick to provide service, and mitigates the risk of a long delay incurred when a single selected server turns out to be slow.The diversity enabled by redundant requests has been found to strongly improve the response time performance, especially in case of highly variable service requirements. Analytical results for redundancy scheduling are unfortunately scarce however, and even the stability condition has largely remained elusive so far, except for exponentially distributed service requirements. In order to gain further insight in the role of the service requirement distribution, we explore the behavior of redundancy scheduling for scaled Bernoulli service requirements. We establish a sufficient stability condition for generally distributed service requirements and we show that, for scaled Bernoulli service requirements, this condition is also asymptotically nearly necessary. This stability condition differs drastically from the exponential case, indicating that the stability condition depends on the service requirements in a sensitive and intricate manner.
We consider job dispatching in systems with N parallel servers, where jobs arrive according to a Poisson process of rate λ. In redundancy-d policies, replicas of an arriving job are assigned to d≤N servers selected uniformly at random (without replacement) with the objective to reduce the delay. We introduce a quite general workload model, in which job sizes have some probability distribution while the speeds (slowdown factors) of the various servers for a given job are allowed to be inter-dependent and non-identically distributed. This allows not only for inherent speed differences among different servers, but also for affinity relations. We further propose two novel redundancy policies, so-called deltaprobe- d policies, where d probes of a fixed, small, size are created for each incoming job, and assigned to d servers selected uniformly at random. As soon as the first of these d probe tasks finishes, the actual job is assigned for execution with the same speed - to the corresponding server and the other probe tasks are abandoned. We also consider a delta-probe-d policy in which the probes receive preemptiveresume priority over regular jobs. The aim of these policies is to retain the benefits of redundancy-d policies while accounting for systematic speed differences and mitigating the risks of running replicas of the full job simultaneously for long periods of time.
We investigate the achievable stability region for redundancy systems and a quite general workload model with different job types and heterogeneous servers, reflecting job-server affinity relations which may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand we establish for New-Better-than-Used (NBU) distributed speed variations that no replication gives a strictly larger stability region than replication. Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. In case of non-observable job types we show that for New-Worse-than-Used (NWU) distributed speed variations full replication gives a larger stability region than no replication. CCS CONCEPTS• Computer systems organization → Redundancy.
We consider a system with N~parallel servers where incoming jobs are immediately replicated to, say, d~servers. Each of the N servers has its own queue and follows a FCFS discipline. As soon as the first job replica is completed, the remaining replicas are abandoned. We investigate the achievable stability region for a quite general workload model with different job types and heterogeneous servers, reflecting job-server affinity relations which may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand we show for New-Better-than-Used (NBU) distributed speed variations that no replication $(d=1)$ gives a strictly larger stability region than replication $(d>1)$. Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. In case of non-observable job types we show that for New-Worse-than-Used (NWU) distributed speed variations full replication ($d=N$) gives a larger stability region than no replication $(d=1)$.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.