In computing-as-a-service frameworks, the computing cost is proportional to money spent on renting machines to run a job on the cloud.
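As a rough illustration of this cost notion, the following minimal sketch computes the expected latency and the expected total server time for a single job replicated at r servers, where the remaining replicas are cancelled as soon as the first one finishes. It assumes no queueing, i.i.d. service times, and zero cancellation cost. The function names, the price parameter, and the shifted-exponential parameterization (delta, mu) are illustrative assumptions and not notation taken from the paper.

```python
def replicated_job(r, mu, delta=0.0):
    """Expected latency and total server time for one job run on r servers.

    Assumptions (illustrative only): each replica's service time is
    X = delta + Exp(mu), i.i.d. across servers (delta = 0 gives a plain
    exponential); there is no queueing; the other r - 1 replicas are
    cancelled instantly and for free once the first replica finishes.
    """
    if r < 1 or mu <= 0 or delta < 0:
        raise ValueError("need r >= 1, mu > 0, delta >= 0")
    # The minimum of r i.i.d. Exp(mu) variables is Exp(r * mu);
    # the constant shift delta is unaffected by taking the minimum.
    latency = delta + 1.0 / (r * mu)
    # All r servers stay busy until the first replica completes.
    server_time = r * latency
    return latency, server_time


def rental_cost(server_time, price_per_unit_time):
    """Money spent, taken as proportional to total server time (hypothetical price)."""
    return server_time * price_per_unit_time


if __name__ == "__main__":
    for r in (1, 2, 4):
        print(r, replicated_job(r, mu=2.0), replicated_job(r, mu=2.0, delta=1.0))
```

Under these assumptions, with delta = 0 the expected server time equals 1/mu for every r, whereas a positive delta makes it grow as r * delta + 1/mu, so the same replication that lowers latency also raises the rental cost.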
PREVIOUS WORK AND MAIN CONTRIBUTIONS
Related Previous Work

Systems Work: The use of redundancy to reduce latency is not new. One of the earliest instances is the use of multiple routing paths [Maxemchuk 1975] to send packets in networks; see [Kabatiansky et al. 2005, Chapter 7] for a detailed survey of other related work. A similar idea has been studied [Vulimiri et al. 2013] in the context of DNS queries. In large-scale cloud computing frameworks, several recent works in systems [Dean and Ghemawat 2008; Ananthanarayanan et al. 2013; Ousterhout et al. 2013] explore straggler mitigation techniques where redundant replicas of straggling tasks are launched to reduce latency. Although the use of redundancy has been explored in systems literature, there is little work on the rigorous analysis of how it affects latency, and in particular the cost of resources. Next we review some of that work.

Exponential Service Time: The (n, k) fork-join system was first proposed in [Joshi et al. 2012; Joshi et al. 2014] to analyze content download latency from erasure coded distributed storage. These works consider that a content file coded into n chunks can be recovered by accessing any k out of the n chunks, where the service time X of each chunk is exponential. Even with the exponential assumption, analyzing the (n, k) fork-join system is a hard problem. It is a generalization of the (n, n) fork-join system, which was actively studied in queueing literature [Flatto and Hahn 1984; Nelson and Tantawi 1988; Varki et al. 2008] around two decades ago.

Recently, an analysis of latency with heterogeneous job classes for the replicated (k = 1) case with distributed queues is presented in [Gardner et al. 2015]. Other related works include [Shah et al. 2014; Kumar et al. 2014; Xiang et al. 2014; Kadhe et al. 2015]. A common thread in all these works is that they also assume exponential service time.

General Service Time: Few practical systems have exponentially distributed service time.
For example, studies of download time traces from Amazon S3 [Chen et al. 2014] indicate that the service time is not exponential in practice, but instead follows a shifted exponential distribution. For service time distributions that are 'new-worse-than-used' [Cao and Wang 1991], it is shown in [Koole and Righter 2008] that it is optimal to replicate a job at all servers in the system. The choice of scheduling policy for new-worse-than-used (NWU) and new-better-than-used (NBU) distributions is studied in [Kim et al. 2009; Shah et al. 2013; Sun et al. 2015]. The NBU and NWU notions are closely related to the log-concavity of service time studied in this work.

The Cost of Redundancy: If we assume exponential service time, then redundancy does not cause any increase in the cost of server time. But since this assumption does not hold in practice, it is important to determine the cost of using redundancy. Simulation results with non-zero fixed cost of removal of redundant requests...