Predictive, reactive and replication-based load balancing of tasks in Chameleon and sam(oa)²

Samfass, Philipp; Klinkenberg, Jannis; Chung, Minh Thanh; Bader, Michael

doi:10.1145/3468267.3470574

Cited by 3 publications

(6 citation statements)

References 26 publications


“…Because of reactive task migration, we do call "offloading" tasks instead of "stealing," and reactive action is taken from the overloaded/slow processes [16]. As shown in Figure 4B, the decision time of offloading tasks (t_k) is earlier than work stealing in Figure 4A. A detailed example of how tasks are offloaded reactively can be found in Appendix A.…”

Section: Reactive Load Balancing (mentioning)

confidence: 98%

“…[11,15] The following idea is task replication that aims at tackling unexpected performance variability [16]. However, this is difficult to know how many tasks should be offloaded at once and which processes are truly underloaded/fast in a short period. Without prior load knowledge, replication strategies need to fix the target process for replicas, such as left/right neighbor ranks.…”

Section: Related Work (mentioning)

confidence: 99%

“…Instead of waiting for a process queue empty, the reactive approach relies on monitoring the queue status to offload tasks from an overloaded process to underloaded targets [11,15]. The following idea is task replication that aims at tackling unexpected performance variability [16]. However, this is difficult to know how many tasks should be offloaded at once and which processes are truly underloaded/fast in a short period.…”

Section: Related Work (mentioning)

confidence: 99%

“…When the imbalance condition is met, tasks can be offloaded in advance. This is why we can reduce the impact of migration overhead better than work stealing [11,16]. However, reactive operations might be settled wrong in the cases of high imbalance because the most current status only reflects as well as conjectures an unbalanced situation for a short period.…”

Section: Introduction (mentioning)

confidence: 99%

“…This is why we can reduce the impact of migration overhead better than work stealing [11,16]. However, reactive operations might be settled wrong in the cases of high imbalance because the most current status only reflects as well as conjectures an unbalanced situation for a short period. We still lack information about load and which process is a potential victim to offload tasks.…”

mentioning

confidence: 99%


From reactive to proactive load balancing for task‐based parallel applications in distributed memory machines

Chung, Weidendorfer, Fürlinger et al., 2023. Concurrency and Computation

Summary: Load balancing is often a challenge in task‐parallel applications. Balancing problems are divided into static and dynamic. “Static” means that we have some prior knowledge about load information and perform balancing before execution, while “dynamic” balancing must rely on partial information about the execution status to balance the load at runtime. Conventionally, work stealing is a practical approach used in almost all shared memory systems. In distributed memory systems, the communication overhead can make stealing tasks too late. To improve on this, a reactive approach has been proposed that relaxes communication in load balancing. The approach leaves one dedicated thread per process to monitor the queue status and offload tasks reactively from a slow to a fast process. However, reactive decisions might be mistaken in high imbalance cases. First, this article proposes a performance model to analyze reactive balancing behaviors and understand the bound leading to incorrect decisions. Second, we introduce a proactive approach to further improve task balancing at runtime. The approach also exploits task‐based programming models with a dedicated thread. The main idea, however, is that this thread not only monitors load but also characterizes tasks and trains load prediction models by online learning. “Proactive” indicates offloading tasks before each execution phase, with an appropriate number of tasks at once, to a potential victim (denoted by an underloaded/fast process). The experimental results confirm speedup improvements in important use cases compared to the previous solutions. Furthermore, this approach can support co‐scheduling tasks across multiple applications.
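The reactive step described in this summary (a monitoring thread inspects queue status and offloads tasks from a slow to a fast process) might be sketched as follows. This is only an illustration under stated assumptions: the function `reactive_offload`, the `Offload` record, the imbalance metric `(max - avg) / avg`, and the default threshold are hypothetical and are not taken from the cited article or from the Chameleon library.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Offload:
    source: int  # rank of the overloaded process
    target: int  # rank of the underloaded process
    count: int   # number of tasks to migrate


def reactive_offload(queue_lengths: Sequence[int],
                     imbalance_threshold: float = 0.2,
                     max_tasks: int = 1) -> Optional[Offload]:
    """Return an offload decision, or None when the load looks balanced.

    Assumption: imbalance is measured as (max - avg) / avg over the
    current queue lengths; the metric and threshold are illustrative.
    """
    if not queue_lengths:
        return None
    avg = sum(queue_lengths) / len(queue_lengths)
    if avg == 0:
        return None  # nothing queued anywhere

    ranks = range(len(queue_lengths))
    source = max(ranks, key=lambda r: queue_lengths[r])
    target = min(ranks, key=lambda r: queue_lengths[r])

    imbalance = (queue_lengths[source] - avg) / avg
    if imbalance <= imbalance_threshold:
        return None

    # Move at most half of the gap so the target does not become
    # the new bottleneck, and at most max_tasks per decision.
    gap = queue_lengths[source] - queue_lengths[target]
    count = min(max_tasks, gap // 2)
    if count < 1:
        return None
    return Offload(source, target, count)
```

Because the decision uses only the most recent queue snapshot, a small `max_tasks` keeps each step cheap but may react too slowly under high imbalance, which is the limitation the summary attributes to reactive balancing.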


Proactive Task Offloading for Load Balancing in Iterative Applications

Chung, Weidendorfer, Fürlinger et al., 2023. Lecture Notes in Computer Science

Load imbalance is often a challenge for applications in parallel systems. Static cost models and pre-partitioning algorithms distribute the load at the beginning. Nevertheless, dynamic changes during execution or inaccurate cost indicators may lead to imbalance at runtime. Reactive work-stealing strategies can help monitor the execution and perform task migration to balance the load. However, the benefits depend on migration overhead and assumptions about future execution. Our proactive approach further improves existing solutions by applying machine learning to online load prediction. Following that, we propose a fully distributed algorithm for adapting the prediction result to guide task offloading. The experiments are performed with an artificial test case and a realistic application named Sam(oa)² on three systems with different communication overhead. Our results confirm improvements for important use cases compared to previous solutions. Furthermore, this approach can support co-scheduling tasks across multiple applications.


Asynchronous Workload Balancing through Persistent Work-Stealing and Offloading for a Distributed Actor Model Library

Budanaz, Wille, Bader, 2022. 2022 IEEE/ACM Parallel Applications Workshop: Alternatives to MPI+X (PAW-ATM)
