Communication in datacenter jobs (such as the shuffle operations in MapReduce applications) often involves many parallel flows that may be processed simultaneously. This highly parallel structure presents new scheduling challenges for optimizing job-level performance objectives in data centers. Chowdhury and Stoica [11] introduced the coflow abstraction to capture these communication patterns, and recently Chowdhury et al. [13] developed effective heuristics to schedule coflows. In this paper, we consider the problem of efficiently scheduling coflows with release dates so as to minimize the total weighted completion time, which has been shown to be strongly NP-hard [13]. Our main result is the first polynomial-time deterministic approximation algorithm for this problem, with an approximation ratio of 67/3, together with a randomized version of the algorithm with a ratio of 9 + 16√2/3. Our results use techniques from both combinatorial scheduling and matching theory, and rely on a clever grouping of coflows. We also run experiments on a Facebook trace to test the practical performance of several algorithms, including our deterministic algorithm. Our experiments suggest that simple algorithms provide effective approximations of the optimal, and that our deterministic algorithm performs near-optimally.
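The objective these algorithms minimize is worth pinning down. Below is a minimal, hypothetical Python sketch (not the paper's algorithm) that evaluates the total weighted completion time of coflows served one at a time on a single bottleneck port; the tuple layout and function name are illustrative only.

```python
# Illustrative sketch: computing the objective sum_j w_j * C_j for coflows
# served sequentially on one bottleneck port. All names are hypothetical.

def total_weighted_completion_time(coflows, order):
    """Return sum of w_j * C_j when coflows run in the given order.
    Each coflow is a (release_date, size, weight) tuple."""
    t = 0.0
    objective = 0.0
    for j in order:
        release, size, weight = coflows[j]
        t = max(t, release) + size   # cannot start before the release date
        objective += weight * t      # C_j is the finish time of coflow j
    return objective

coflows = [(0.0, 4.0, 1.0), (1.0, 2.0, 3.0)]
# Serving the short, heavy coflow first lowers the weighted objective:
print(total_weighted_completion_time(coflows, [0, 1]))  # 1*4 + 3*6 = 22
print(total_weighted_completion_time(coflows, [1, 0]))  # 3*3 + 1*7 = 16
```

The example shows why ordering matters: the same two coflows give objective 22 or 16 depending on the schedule, and the approximation algorithms bound how far a polynomial-time schedule can be from the best ordering.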
The need for real-time processing of "big data" has led to the development of frameworks for distributed stream processing in clusters. It is important for such frameworks to be robust against variable operating conditions such as server failures, changes in data ingestion rates, and workload characteristics. To provide fault tolerance and efficient stream processing at scale, recent stream processing frameworks have proposed treating streaming workloads as a series of batch jobs on small batches of streaming data. However, the robustness of such frameworks against variable operating conditions has not been explored. In this paper, we explore the effect of batch size on the performance of streaming workloads. The throughput and end-to-end latency of the system can have complicated relationships with batch sizes, data ingestion rates, variations in available resources, workload characteristics, and so on. We propose a simple yet robust control algorithm that automatically adapts batch sizes as the situation requires. We show through extensive experiments that this algorithm is powerful enough to ensure system stability and low end-to-end latency for a wide class of workloads, despite large variations in data rates and operating conditions.
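The abstract does not spell out the control law. One simple controller consistent with the stated goal (stability plus low latency under varying data rates) adjusts the batch interval so that measured processing time stays safely below the interval, which is a standard stability condition for batched stream processing. A hypothetical sketch, not the paper's actual algorithm:

```python
# Hypothetical batch-size control loop: grow batches when processing falls
# behind (stability), shrink them when there is headroom (latency).

def adapt_batch_interval(interval, processing_time,
                         target_ratio=0.8, step=0.1,
                         min_interval=0.1, max_interval=10.0):
    """Return the next batch interval (seconds) given the last batch's
    measured processing time."""
    ratio = processing_time / interval
    if ratio > target_ratio:          # falling behind: larger batches amortize overhead
        interval *= 1.0 + step
    else:                             # headroom: smaller batches cut end-to-end latency
        interval *= 1.0 - step
    return min(max(interval, min_interval), max_interval)

# Example: a load spike pushes processing time up, so the interval grows back.
interval = 1.0
for measured in [0.5, 0.6, 0.95, 1.1, 0.9, 0.7]:
    interval = adapt_batch_interval(interval, measured)
    print(f"next interval = {interval:.2f}s")
```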
We consider a service system model primarily motivated by the problem of efficient assignment of virtual machines to physical host machines in a network cloud, so that the number of occupied hosts is minimized. There are multiple types of arriving customers, where a customer's mean service time depends on its type. There is an infinite number of servers. Multiple customers can be placed for service into one server, subject to general "packing" constraints. Service times of different customers are independent, even if they are served simultaneously by the same server. Each new arriving customer is placed for service immediately, either into a server already serving other customers (as long as packing constraints are not violated) or into an idle server. After a service completion, each customer leaves its server and the system. We propose an extremely simple and easily implementable customer placement algorithm, called Greedy-Random (GRAND). It places each arriving customer uniformly at random into either one of the already occupied servers (subject to packing constraints) or one of the so-called zero-servers, which are empty servers designated to be available to new arrivals. One instance of GRAND, called GRAND(aZ), where a ≥ 0 is a parameter, is such that the number of zero-servers at any given time t is aZ(t), where Z(t) is the current total number of customers in the system. We prove that GRAND(aZ) with a > 0 is asymptotically optimal, as the customer arrival rates grow to infinity and a → 0, in the sense of minimizing the total number of occupied servers in steady state. In addition, we study various versions of GRAND by simulation and observe how convergence speed and steady-state performance depend on the number of zero-servers.
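Because the placement rule is fully specified in the abstract, it is easy to sketch. The following hypothetical Python fragment implements the GRAND(aZ) choice for one arrival; the packing-feasibility test is left abstract, rounding of aZ(t) is an assumption, and all names are illustrative.

```python
import math
import random

def grand_place(occupied_servers, num_customers, a, fits):
    """Place one arriving customer per GRAND(aZ).

    occupied_servers: list of server states (one entry per occupied server)
    num_customers:    Z(t), total customers currently in the system
    a:                GRAND parameter; assume ceil(a * Z(t)) zero-servers
    fits:             callable(server_state) -> bool, the packing constraint
    Returns the index of the chosen occupied server, or None to open a
    fresh (zero-) server.
    """
    feasible = [i for i, s in enumerate(occupied_servers) if fits(s)]
    num_zero = math.ceil(a * num_customers)   # designated empty servers
    # Choose uniformly at random among feasible occupied servers and zero-servers.
    pick = random.randrange(len(feasible) + num_zero) if feasible or num_zero else None
    if pick is None or pick >= len(feasible):
        return None                            # arrival goes to a zero-server
    return feasible[pick]
```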
We study the optimal scaling of the expected total queue size in an n × n input-queued switch, as a function of the number of ports n and the load factor ρ, which has been conjectured to be Θ(n/(1 − ρ)) (cf. [13]). In a recent work [14], the validity of this conjecture was established for the regime where 1 − ρ = O(1/n^2). In this paper, we make further progress in the direction of this conjecture. We provide a new class of scheduling policies under which the expected total queue size scales as O((n^1.5/(1 − ρ)) log(1/(1 − ρ))). This is an improvement over the state of the art; for example, for ρ = 1 − 1/n the best known bound was O(n^3), while ours is O(n^2.5 log n).
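The displayed bound was reconstructed from the closing example (the formula was lost in extraction), so the following is a consistency check rather than a statement from the paper. Substituting ρ = 1 − 1/n:

```latex
% Consistency check (assuming the reconstructed bound): with \rho = 1 - 1/n,
% we have 1 - \rho = 1/n, so
\[
\frac{n^{1.5}}{1-\rho}\,\log\frac{1}{1-\rho}
   \;=\; n^{1.5}\cdot n\cdot\log n
   \;=\; n^{2.5}\log n,
\]
% which matches the abstract's O(n^{2.5} \log n), against the prior O(n^{3}) bound.
```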
Communication in data-parallel applications often involves a collection of parallel flows. Traditional techniques that optimize flow-level metrics do not perform well on such collections, because the network is largely agnostic to application-level requirements. The recently proposed coflow abstraction bridges this gap and creates new opportunities for network scheduling. In this paper, we address inter-coflow scheduling for two different objectives: decreasing the communication time of data-intensive jobs and guaranteeing predictable communication time. We introduce the concurrent open shop scheduling with coupled resources problem, analyze its complexity, and propose effective heuristics to optimize either objective. We present Varys, a system that enables data-intensive frameworks to use coflows and the proposed algorithms while maintaining high network utilization and guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that, compared to per-flow mechanisms, communication stages complete up to 3.16× faster on average and up to 2× more coflows meet their deadlines when using Varys. Moreover, Varys outperforms non-preemptive coflow schedulers by more than 5×.
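The abstract does not name its heuristics, but a coflow's completion time is governed by its most loaded ("bottleneck") port, which suggests ordering coflows by bottleneck completion time. Here is a minimal, hypothetical sketch of such an ordering; Varys's actual algorithms are more involved (preemption, starvation freedom, deadline admission control).

```python
# Hypothetical bottleneck-first coflow ordering. Each coflow is modeled as a
# dict mapping port -> bytes to move through that port; names are illustrative.

def bottleneck_time(coflow, port_bandwidth):
    """A coflow's completion time is set by its most loaded port."""
    return max(load / port_bandwidth[port] for port, load in coflow.items())

def order_coflows(coflows, port_bandwidth):
    """Schedule the coflow with the smallest bottleneck completion time first,
    so small coflows are not stuck behind large ones."""
    return sorted(coflows, key=lambda c: bottleneck_time(c, port_bandwidth))

# Example with unit-bandwidth ports: the small coflow is scheduled first.
bw = {"in1": 1.0, "in2": 1.0, "out1": 1.0}
small = {"in1": 2.0, "out1": 2.0}
large = {"in1": 8.0, "in2": 8.0, "out1": 16.0}
for c in order_coflows([large, small], bw):
    print(bottleneck_time(c, bw), c)
```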