Hypergraph-based Dynamic Load Balancing for Adaptive Scientific Computations

Çatalyürek, Ümit V.; Boman, Erik G.; Devine, Karen Dragon; Bozdağ, Doruk; Heaphy, Robert; Riesen, Lee Ann

doi:10.1109/ipdps.2007.370258

Cited by 112 publications

(94 citation statements)

References 25 publications


“…They do scheduling of file sharing tasks in three phases by using hypergraph partitioning method. Giersch et al (2006;Fujimoto and Hagihara, 2004;Karypis and Kumar, 1998;Catalyurek et al, 2007) proposed several different heuristics which reduce the time complexity while preserving the quality of schedules. This scheduling decision is based on the greedy choices that depend on the momentary completion time of tasks.…”

Section: Related Work (mentioning)

confidence: 99%

Balanced Scheduling of Independent File-Sharing Tasks in Heterogenous Environment

Ponsy¹

2011

Journal of Computer Science


Problem statement: To examine the strategies for scheduling of independent file-sharing tasks in a heterogeneous environment and the concept of load balancing. Approach: We propose hypergraph partitioning based strategy for the scheduling of non-critical jobs. This is done by scheduling the tasks that share tasks among them to the same processor. The tasks thus scheduled are employed to a load balancing scheme for balancing the load on the processors by considering the average load on all processors. Results: This strategy reduces the input output overheads among the tasks thus reducing the end-point contention. Conclusion: Thus the batch execution time on the processors is reduced.
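The idea in this abstract, placing tasks that share inputs on the same processor while keeping loads near the average, can be illustrated with a small sketch. It assumes the shared items are files and models them as hyperedges over the tasks that read them; the connectivity-minus-one cost used here is a common hypergraph partitioning metric, not necessarily the one used in the cited paper. All names (`file_transfer_cost`, `load_imbalance`, the task and file labels) are hypothetical.

```python
def file_transfer_cost(file_users, task_to_proc, file_sizes=None):
    """Connectivity-minus-one cost: each file costs (number of distinct
    processors that need it - 1) extra copies, scaled by its size."""
    cost = 0
    for f, tasks in file_users.items():
        procs = {task_to_proc[t] for t in tasks}
        size = 1 if file_sizes is None else file_sizes[f]
        cost += size * max(len(procs) - 1, 0)
    return cost


def load_imbalance(task_costs, task_to_proc, num_procs):
    """Ratio of the heaviest processor load to the average load (1.0 = perfect)."""
    loads = [0.0] * num_procs
    for t, c in task_costs.items():
        loads[task_to_proc[t]] += c
    avg = sum(loads) / num_procs
    return max(loads) / avg


# Hypothetical instance: four unit-cost tasks, three shared files.
file_users = {"A": ["t1", "t2"], "B": ["t2", "t3"], "C": ["t3", "t4"]}
task_costs = {"t1": 1, "t2": 1, "t3": 1, "t4": 1}

colocated = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}  # sharers kept together
scattered = {"t1": 0, "t2": 1, "t3": 0, "t4": 1}  # sharers split apart

colocated_cost = file_transfer_cost(file_users, colocated)
scattered_cost = file_transfer_cost(file_users, scattered)
colocated_balance = load_imbalance(task_costs, colocated, 2)
```

Both assignments are equally balanced, but the co-located one needs fewer extra file copies, which is the effect the abstract attributes to grouping file-sharing tasks.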


“…In a graph partitioning problem, the goal is, informally, to partition a set of nodes in a graph such that the weighted sum of nodes in each partition is bounded by some positive integer K and the weighted sum of edges between partitions is bounded by another positive integer J. This problem is known to be NP-complete, although algorithms to solve it exist, including in the Zoltan library [4]. In our case, all weights would be 1.…”
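The decision version of graph partitioning quoted above can be made concrete with a small feasibility check. This is a minimal sketch that only verifies a given assignment against the two bounds K and J; it does not search for a partition and does not use the Zoltan library. The function name `check_partition` and the example graph are hypothetical.

```python
def check_partition(node_weights, edges, assignment, K, J):
    """Check a node-to-part assignment against the two bounds.

    node_weights: {node: weight}
    edges:        iterable of (u, v, weight)
    assignment:   {node: part id}
    Returns (feasible, per-part loads, total weight of cut edges).
    """
    loads = {}
    for node, w in node_weights.items():
        part = assignment[node]
        loads[part] = loads.get(part, 0) + w
    # An edge is cut when its endpoints lie in different parts.
    cut = sum(w for u, v, w in edges if assignment[u] != assignment[v])
    feasible = all(load <= K for load in loads.values()) and cut <= J
    return feasible, loads, cut


# Path graph 0-1-2-3 with all weights equal to 1, as in the quoted setting.
node_weights = {0: 1, 1: 1, 2: 1, 3: 1}
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
assignment = {0: 0, 1: 0, 2: 1, 3: 1}

feasible, loads, cut = check_partition(node_weights, edges, assignment, K=2, J=1)
```

Checking a candidate is cheap; the hardness lies in finding an assignment that satisfies both bounds.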

Section: Allocating Tasks To Processors (mentioning)

confidence: 99%

Peer-to-peer architectures for exascale computing : LDRD final report.

Report¹,

Mayo²,

Vorobeychik³

et al. 2010


The goal of this research was to investigate the potential for employing dynamic, decentralized software architectures to achieve reliability in future high-performance computing platforms. These architectures, inspired by peer-to-peer networks such as botnets that already scale to millions of unreliable nodes, hold promise for enabling scientific applications to run usefully on next-generation exascale platforms (~10^18 operations per second). Traditional parallel programming techniques suffer rapid deterioration of performance scaling with growing platform size, as the work of coping with increasingly frequent failures dominates over useful computation. Our studies suggest that new architectures, in which failures are treated as ubiquitous and their effects are considered as simply another controllable source of error in a scientific computation, can remove such obstacles to exascale computing for certain applications. We have developed a simulation framework, as well as a preliminary implementation in a large-scale emulation environment, for exploration of these "fault-oblivious computing" approaches.


“…adaptive codes dynamically redistribute the work generated by their refinement techniques among processors in a parallel system [3,4]. Future exascale machines are expected to require adaptivity at even lower levels of the software stack, to distribute computational tasks among potentially heterogeneous compute resources, to balance power requirements and to adjust to hardware faults [15].…”

Section: Introduction (mentioning)

confidence: 99%

Simplifying Performance Analysis of Large-scale Adaptive Scientific Applications

Bhatelé¹,

Gamblin²,

Gunney³

et al. 2012


Performance analysis of parallel scientific codes is becoming increasingly difficult due to the rapidly growing complexity of applications and architectures. Existing tools fall short in providing intuitive views that facilitate the process of performance debugging and tuning. In this paper, we exploit a recent idea of projecting and visualizing performance data on the communication and hardware domain for faster, more intuitive analysis of applications. We leverage several performance analysis and visualization tools to showcase the discovery of scalability bottlenecks in a structured AMR library. Using novel techniques to project per-phase timing data, application data, and communication data on a communication graph, we identify a previously elusive scaling bottleneck in the library. We present solutions that mitigate this problem, resulting in 22% improvement in the performance for a 65,536-core run on an IBM Blue Gene/P system.
