Scheduling Parallel Computations by Work Stealing: A Survey

Yang, Jixiang; He, Qinming

doi:10.1007/s10766-016-0484-8

Cited by 37 publications

(20 citation statements)

References 76 publications

“…In WS, independent scheduling agents manage resources in a parallel system, and may take roles of: (i) thieves, which attempt to dynamically remap work to underloaded (or idle) resources, trying to manage workload so tasks are constantly available to be computed; or (ii) victims, which are targets chosen by thieves to have their tasks stolen. WS schedulers are commonly applied to dynamic and imbalanced applications [9,35] that cannot afford a stable work decomposition, but may be applied to any parallel application decomposable as a Direct Acyclic Graph (DAG). This way, applications decomposed in models like fork/join [36], general task parallelism, and parallel loops in shared memory [37] have also benefited from WS.…”

Section: Work Stealing Schedulers (mentioning)

confidence: 99%

PackStealLB: A scalable distributed load balancer based on work stealing and workload discretization

Freitas, Pilla, Santana, et al. 2021

Journal of Parallel and Distributed Computing

The scalability of high-performance, parallel iterative applications is directly affected by how well they use the available computing resources. These applications are subject to load imbalance due to the nature and dynamics of their computations. It is common that high performance systems employ periodic load balancing to tackle this issue. Dynamic load balancing algorithms redistribute the application's workload using heuristics to circumvent the NP-hard complexity of the problem. However, scheduling heuristics must be fast to avoid hindering application performance when distributing the workload on large and distributed environments. In this work, we present a technique for low overhead, high quality scheduling decisions for parallel iterative applications. The technique relies on combined application workload information paired with distributed scheduling algorithms. An initial distributed step among scheduling agents groups application tasks in packs of similar load to minimize messages among them. This information is used by our scheduling algorithm, PackStealLB, for its distributed-memory work stealing heuristic. Experimental results showed that PackStealLB is able to improve the performance of a molecular dynamics benchmark by up to 41%, outperforming other scheduling algorithms in most scenarios over almost one thousand cores.
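The thief and victim roles described in the citation statement above can be illustrated with a small single-threaded simulation. This is only a minimal sketch of generic work stealing, not the PackStealLB algorithm: the function name `run_work_stealing`, the random victim selection, and the choice to steal from the opposite end of the victim's queue are all assumptions made for illustration.

```python
import random
from collections import deque


def run_work_stealing(queues, seed=0):
    """Simulate work stealing over per-worker deques of tasks.

    Hypothetical helper: returns how many tasks each worker executed.
    """
    rng = random.Random(seed)
    executed = [0] * len(queues)
    while any(queues):  # an empty deque is falsy
        for worker, own in enumerate(queues):
            if own:
                own.pop()  # the owner takes work from the tail of its own deque
                executed[worker] += 1
                continue
            # An idle worker acts as a thief and probes one random victim.
            victim = rng.choice([w for w in range(len(queues)) if w != worker])
            if queues[victim]:
                queues[victim].popleft()  # steal from the head and run it at once
                executed[worker] += 1
    return executed
```

Starting from an imbalanced state such as `[deque(range(8)), deque(), deque(), deque()]`, every task is executed exactly once, and idle workers pick up some of the load whenever their random probe hits a non-empty victim.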

“…Dynamic Load Balancing. Dynamic load balancing strategies use work-stealing or work-shedding to redistribute load from heavily-loaded workers to lightly-loaded ones at runtime [12,13,17,34,82]. They are mostly developed for traditional task-based parallel programming models in multicore or HPC systems.…”

Section: Eliminating Stragglers By Load Balancing (mentioning)

confidence: 99%

“…Yet, different from those traditional workloads such as HPC processing [17,34] or MapReduce [31], ML computations are highly structured (with thousands of short iterations) and also tensor-based (samples in each iteration are packaged into a non-divisible matrix for fast processing). Static load balancing approaches [51,73,78] are not aware of runtime resource variations, and dynamic approaches [13,41,82], which are mainly based on work stealing [12,17,34], are also deficient for ML workloads. First, work stealing usually requires fine-grained worker progress monitoring and runtime load migration, which is inefficient for an iterative model training process.…”

Section: Introduction (mentioning)

confidence: 99%

Semi-dynamic load balancing

Chen, Weng, Wang, et al. 2020

Proceedings of the 11th ACM Symposium on Cloud Computing

Machine learning (ML) models are increasingly trained in clusters with non-dedicated workers possessing heterogeneous resources. In such scenarios, model training efficiency can be negatively affected by stragglers, that is, workers that run much slower than others. Efficient model training requires eliminating such stragglers, yet for modern ML workloads, existing load balancing strategies are inefficient and even infeasible. In this paper, we propose a novel strategy called semi-dynamic load balancing to eliminate stragglers of distributed ML workloads. The key insight is that ML workers shall be load-balanced at iteration boundaries, being nonintrusive to intra-iteration execution. We develop LB-BSP based on such an insight, which is an integrated worker coordination mechanism that adapts workers' load to their instantaneous processing capabilities by right-sizing the sample batches at the synchronization barriers. We have custom-designed the batch sizing algorithm for CPU and GPU clusters respectively, based on their own characteristics. LB-BSP has been implemented as a Python module for ML frameworks like TensorFlow and PyTorch. Our EC2 deployment confirms that LB-BSP is practical, effective and lightweight, and is able to accelerate distributed training by up to 54%.
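The idea of right-sizing sample batches to each worker's processing capability can be sketched as a proportional split. This is an illustrative assumption, not the batch sizing algorithm of LB-BSP (the abstract does not give it): the function name `proportional_batch_sizes`, the use of a measured per-worker speed, and the largest-remainder rounding are all hypothetical.

```python
def proportional_batch_sizes(speeds, total_batch):
    """Split total_batch across workers in proportion to their speeds.

    Hypothetical helper: speeds are positive throughputs (e.g. samples/s).
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")
    total_speed = sum(speeds)
    exact = [total_batch * s / total_speed for s in speeds]
    sizes = [int(x) for x in exact]  # floor, since every share is non-negative
    leftover = total_batch - sum(sizes)
    # Give the remaining samples to the workers with the largest fractional parts,
    # so that the sizes still add up to total_batch.
    by_fraction = sorted(
        range(len(speeds)), key=lambda i: exact[i] - sizes[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes
```

For example, `proportional_batch_sizes([1.0, 1.0, 2.0], 128)` returns `[32, 32, 64]`, so the worker that is twice as fast receives twice as many samples while the global batch size stays fixed.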

“…Additionally, since this is a pull-based scheduler, a stop criteria must be added as a maximum number of requests in line 5 of Algorithm 3. This kind of criteria is important to guarantee that the strategy will finish in a timely fashion, and not get into request-donation (or stealing, in Work-Stealing schedulers) cycles [24].…”

Section: Distributed Edge Migration (mentioning)

confidence: 99%

Distributed Memory Graph Representation for Load Balancing Data: Accelerating Data Structure Generation for Decentralized Scheduling

Freitas, Santana, Castro, et al. 2019

2019 International Conference on High Performance Computing & Simulation (HPCS)

In this paper, we propose a Distributed Graph Model (DGM) and data structure to enable communication-aware heuristics in distributed load balancers (LBs). DGM is motivated by the desire to maintain and use information related to the affinity between tasks (their communication) in order to improve data locality while scheduling tasks in a distributed fashion to avoid the centralization overhead. Results show that DGM is able to achieve speedups of up to 50.4x with 40 virtual cores, when compared to a centralized graph representation with the same purpose. Additionally, we propose a proof-of-concept distributed scheduler that uses DGM, named Edge Migration, and its implementation in the Charm++ parallel programming model. These results show that, although the communication analysis is much faster with DGM, it is still the most relevant overhead in distributed LBs. We also observe that Edge Migration has a decision time in the same order of magnitude as other communication-unaware decentralized algorithms. Thus, DGM can be used in communication-aware distributed LBs to improve load balancing decisions with a small impact in the overall LB performance.
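The stop criterion mentioned in the citation statement above (a maximum number of requests in a pull-based scheduler) can be sketched as follows. This is not Algorithm 3 of the cited paper, which is not reproduced here: the function `pull_work`, the integer load units, and the `threshold` rule for declining a request are assumptions chosen only to show how a bound on requests guarantees termination.

```python
def pull_work(requester, loads, neighbors, max_requests=3, threshold=1):
    """Ask neighbors for one unit of load, giving up after max_requests tries.

    Hypothetical helper: returns the donor's index, or None if every
    request was declined or the request budget ran out.
    """
    for attempt, donor in enumerate(neighbors):
        if attempt >= max_requests:
            break  # stop criterion: bound the number of requests
        # A donor only gives work when it is more than `threshold` units heavier,
        # which keeps a unit from bouncing back and forth between two workers.
        if loads[donor] - loads[requester] > threshold:
            loads[donor] -= 1
            loads[requester] += 1
            return donor
    return None
```

With `loads = [0, 1, 5]`, the call `pull_work(0, loads, [1, 2])` is declined by worker 1 and served by worker 2, whereas the same call with `max_requests=1` gives up after the first refusal and leaves the loads unchanged.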
