Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

Sankaran, Ramanan; Angel, Jordan B.; Brown, William M.

doi:10.1002/cpe.3457

Cited by 4 publications (3 citation statements)

References 29 publications

“…However, we believe code modifications are possible which would make it unnecessary to use a random permutation of nodes: by an adjustment of the code it should be possible to recast the communication pattern as a nearest neighbor communication and then use known methods to map the communication pattern optimally to the network, see, e.g. [33]; this will be a topic of future study. In any case, as with other parallel applications, optimizing communications in a multiuser environment is challenging insofar as the network bandwidth is shared by other users and furthermore it is not always possible for a user to reserve a communication-optimal subset of nodes for job execution.…”

Section: -Way Weak Scaling Results (mentioning; confidence: 99%)

Parallel accelerated vector similarity calculations for genomics applications

et al. 2018

The surge in availability of genomic data holds promise for enabling determination of genetic causes of observed individual traits, with applications to problems such as discovery of the genetic roots of phenotypes, be they molecular phenotypes such as gene expression or metabolite concentrations, or complex phenotypes such as diseases. However, the growing sizes of these datasets and the quadratic, cubic or higher scaling characteristics of the relevant algorithms pose a serious computational challenge necessitating use of leadership scale computing. In this paper we describe a new approach to performing vector similarity metrics calculations, suitable for parallel systems equipped with graphics processing units (GPUs) or Intel Xeon Phi processors. Our primary focus is the Proportional Similarity metric applied to Genome Wide Association Studies (GWAS) and Phenome Wide Association Studies (PheWAS). We describe the implementation of the algorithms on accelerated processors, methods used for eliminating redundant calculations due to symmetries, and techniques for efficient mapping of the calculations to many-node parallel systems. Results are presented demonstrating high per-node performance and parallel scalability with rates of more than five quadrillion (5 × 10¹⁵) elementwise comparisons achieved per second on the ORNL Titan system. In a companion paper we describe corresponding techniques applied to calculations of the Custom Correlation Coefficient for comparative genomics applications.
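The abstract above names the Proportional Similarity metric and mentions eliminating redundant calculations due to symmetries, but does not give the formula. The sketch below assumes the common Czekanowski form PS(u, v) = 2 Σ min(u_i, v_i) / Σ (u_i + v_i) for nonnegative vectors and shows the symmetry saving in its simplest form: because PS(u, v) = PS(v, u), only the n(n − 1)/2 pairs with i < j are evaluated and the result is mirrored. The function names are hypothetical, and this is a serial illustration, not the GPU or Xeon Phi implementation the paper describes.

```python
from typing import List, Sequence


def proportional_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed Czekanowski form: 2 * sum(min(u_i, v_i)) / sum(u_i + v_i)."""
    num = 2.0 * sum(min(a, b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den  # requires nonnegative entries and a nonzero denominator


def all_pairs_ps(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """PS for every pair of vectors, evaluating only the pairs with i < j."""
    n = len(vectors)
    out = [[1.0] * n for _ in range(n)]  # PS(v, v) == 1 for any nonzero v
    for i in range(n):
        for j in range(i + 1, n):
            s = proportional_similarity(vectors[i], vectors[j])
            out[i][j] = out[j][i] = s  # symmetry: mirror instead of recomputing
    return out


print(all_pairs_ps([[1, 2, 3], [1, 2, 3], [0, 0, 6]]))
```

For the vectors [1, 2, 3], [1, 2, 3] and [0, 0, 6], the two identical rows score 1.0 against each other and each scores 0.5 against the third, so the call prints `[[1.0, 1.0, 0.5], [1.0, 1.0, 0.5], [0.5, 0.5, 1.0]]`.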

“…Sankaran et al [33] used a genetic algorithm for optimizing the mapping of two large-scale parallel applications, S3D and LAMMPS, on the Cray XK7 machine. Bhanot et al [13] used simulated annealing to optimize the task layout of the parallel applications SAGE and UMT2000 on the BlueGene/L machine.…”

Section: Related Work (mentioning; confidence: 99%)

Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers

Sreepathi; D’Azevedo; Philip; et al. 2016

Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering

On large supercomputers, the job scheduling systems may assign a non-contiguous node allocation for user applications depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of physical network topology and impacts communication performance of the application. In order to mitigate such performance penalties, this work describes techniques to identify a suitable task mapping that takes the layout of the allocated nodes as well as the application's communication behavior into account. During the first phase of this research, we instrumented and collected performance data to characterize communication behavior of critical US DOE (United States Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor join tree, etc.) to combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership class supercomputer at Oak Ridge National Laboratory. Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
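Both the excerpt citing this paper and its abstract concern how MPI ranks are placed on a non-contiguous node allocation. The metric used to score a placement is not stated here; a common choice, assumed in the sketch below, is hop-bytes: the sum over communicating rank pairs of bytes exchanged times the network hops between their nodes. The sketch further assumes one rank per node and a simplified 3D torus with wraparound links (Titan's Gemini network is a 3D torus, but router sharing and link bandwidths are ignored). `torus_hops` and `hop_bytes` are hypothetical helpers, not part of mpiP or mpiAproxy.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count between two nodes on a 3D torus with wraparound links."""
    hops = 0
    for x, y, d in zip(a, b, dims):
        delta = abs(x - y)
        hops += min(delta, d - delta)  # go the short way around each ring
    return hops


def hop_bytes(comm: Dict[Tuple[int, int], int], placement: List[Coord], dims: Coord) -> int:
    """Sum of bytes * hops over communicating rank pairs; placement[r] hosts rank r."""
    return sum(vol * torus_hops(placement[src], placement[dst], dims)
               for (src, dst), vol in comm.items())


dims = (8, 1, 1)
comm = {(0, 1): 100, (1, 2): 100, (2, 3): 100}            # ranks form a 1D chain
default = [(0, 0, 0), (4, 0, 0), (1, 0, 0), (5, 0, 0)]    # allocation order
reordered = [(0, 0, 0), (1, 0, 0), (4, 0, 0), (5, 0, 0)]  # same nodes, new rank order
print(hop_bytes(comm, default, dims), hop_bytes(comm, reordered, dims))  # 1100 500
```

In this example a four-rank chain placed in allocation order on the non-contiguous nodes x = 0, 4, 1, 5 costs 1100 hop-bytes; reassigning ranks so that chain neighbors sit on nearby nodes (x = 0, 1, 4, 5) lowers this to 500 without changing the allocation, which is the kind of gain task reordering targets.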

“…The search space is reduced by generating a full-active schedule that satisfies precedence constraints, and a neighborhood search is applied to exploit the search space for better solutions and to enhance the GA. Simulation suggests sustainability of this hybrid GA in solving JSSP. Sankaran et al [21] proposed a GA based parallel optimization technique aiming to improve the performance of two batch-scheduled massively parallel application codes: a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Experiments have shown a significant deviation from ideal weak scaling and variability in performance.…”

Section: Previous Work (mentioning; confidence: 99%)

A Generic Adaptive Multi-Gene-Set Genetic Algorithm (AMGA)

Maaita; Zraqou; Hamad; et al. 2015

IJACSA

Genetic algorithms have been used extensively in solving complex solution-space search problems. However, certain problems can include multiple sub-problems in which multiple searches through distinct solution-spaces are required before the final solution combining all the sub-solutions is found. This paper presents a generic design of genetic algorithms which can be used for solving complex solution-space search problems that involve multiple sub-solutions. Such problems are very difficult to solve using basic genetic algorithm designs that utilize a single gene-set per chromosome. The suggested algorithm presents a generic solution which utilizes both multi-gene-set chromosomes, and an adaptive gene mutation rate scheme. The results presented from experiments done using an automatic graphical user interface generation case study, show that the suggested algorithm is capable of producing successful solutions where the common single-gene-set design fails.
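The excerpt in this entry describes a GA hybridized with a neighborhood search and summarizes Sankaran et al.'s GA-based task reordering. The sketch below combines those two ideas for the task-reordering problem in its simplest form. It is a generic illustration under stated assumptions, not the operators or parameters Sankaran et al. used. A chromosome is a permutation `perm` in which `perm[r]` is the allocated node hosting rank r. Fitness is the hop-bytes cost on an assumed 3D torus. The GA uses size-3 tournament selection, order crossover (OX), swap mutation and elitism, and the best individual is finally polished with a pairwise-swap local search. All function names and default parameters are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Coord = Tuple[int, int, int]
Comm = Dict[Tuple[int, int], int]  # (src rank, dst rank) -> bytes exchanged


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    # Minimal hop count on a 3D torus with wraparound links in every dimension
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def cost(perm: Sequence[int], comm: Comm, nodes: Sequence[Coord], dims: Coord) -> int:
    # Hop-bytes of the placement: perm[r] indexes the node that hosts rank r
    return sum(v * torus_hops(nodes[perm[s]], nodes[perm[t]], dims)
               for (s, t), v in comm.items())


def order_crossover(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    # OX: keep a random slice of p1, fill the other positions in p2's order
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child: List[int] = [-1] * n
    child[i:j + 1] = p1[i:j + 1]
    taken = set(child[i:j + 1])
    fill = iter(g for g in p2 if g not in taken)
    return [g if g != -1 else next(fill) for g in child]


def swap_mutation(perm: List[int], rate: float, rng: random.Random) -> None:
    # With probability `rate` per position, swap it with a random position
    for a in range(len(perm)):
        if rng.random() < rate:
            b = rng.randrange(len(perm))
            perm[a], perm[b] = perm[b], perm[a]


def swap_local_search(perm: List[int],
                      f: Callable[[List[int]], int]) -> Tuple[List[int], int]:
    # Neighborhood search: accept any pairwise swap that lowers the cost,
    # and repeat until no swap improves (a local optimum)
    best, best_cost = list(perm), f(perm)
    improved = True
    while improved:
        improved = False
        for a in range(len(best) - 1):
            for b in range(a + 1, len(best)):
                best[a], best[b] = best[b], best[a]
                c = f(best)
                if c < best_cost:
                    best_cost, improved = c, True
                else:
                    best[a], best[b] = best[b], best[a]  # undo the swap
    return best, best_cost


def ga_reorder(comm: Comm, nodes: Sequence[Coord], dims: Coord, pop_size: int = 20,
               generations: int = 50, mut_rate: float = 0.1,
               seed: int = 0) -> Tuple[List[int], int]:
    rng = random.Random(seed)
    n = len(nodes)

    def f(p: List[int]) -> int:
        return cost(p, comm, nodes, dims)

    # Seed with the default rank order so the result is never worse than it
    pop = [list(range(n))]
    while len(pop) < pop_size:
        p = list(range(n))
        rng.shuffle(p)
        pop.append(p)
    for _ in range(generations):
        pop.sort(key=f)
        nxt = [pop[0]]  # elitism: carry the best individual over unchanged
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=f)  # size-3 tournament selection
            p2 = min(rng.sample(pop, 3), key=f)
            child = order_crossover(p1, p2, rng)
            swap_mutation(child, mut_rate, rng)
            nxt.append(child)
        pop = nxt
    return swap_local_search(min(pop, key=f), f)


if __name__ == "__main__":
    nodes = [(0, 0, 0), (4, 0, 0), (1, 0, 0), (5, 0, 0)]  # non-contiguous allocation
    comm = {(0, 1): 100, (1, 2): 100, (2, 3): 100}         # 1D chain of ranks
    print(ga_reorder(comm, nodes, (8, 1, 1)))
```

Seeding the population with the default ordering and keeping the elite individual guarantees that the returned mapping is never worse than the scheduler's default rank order. This is a useful property when each fitness evaluation is cheap but a poor mapping on a production run is expensive.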
