Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime

Mei, Chao; Sun, Youwen; Zheng, Gengbin; Bohm, Eric J.; Kalé, Laxmikant V.; Phillips, J. C.; Harrison, Christopher B.

doi:10.1145/2063384.2063466

Cited by 45 publications

(70 citation statements)

References 26 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…It simulates the behavior of atoms based on the Lennard-Jones potential, which is an effective potential that describes the interaction between two uncharged molecules or atoms. The computation performed in this code mimics the short-range non-bonded force calculation in NAMD [7], an application widely used by biophysicists, that won the Gordon Bell award.…”

Section: A Leanmdmentioning

confidence: 99%

See 1 more Smart Citation

Automated Load Balancing Invocation Based on Application Characteristics

Menon

Jain

Zheng

et al. 2012

2012 IEEE International Conference on Cluster Computing

Self Cite

View full text Add to dashboard Cite

Abstract-Performance of applications executed on large parallel systems suffer due to load imbalance. Load balancing is required to scale such applications to large systems. However, performing load balancing incurs a cost which may not be known a priori. In addition, application characteristics may change due to its dynamic nature and the parallel system used for execution. As a result, deciding when to balance the load to obtain the best performance is challenging. Existing approaches put this burden on the users, who rely on educated guess and extrapolation techniques to decide on a reasonable load balancing period, which may not be feasible and efficient.In this paper, we propose the Meta-Balancer framework which relieves the application programmers of deciding when to balance load. By continuously monitoring the application characteristics and using a set of guiding principles, MetaBalancer invokes load balancing on its own without any prior application knowledge. We demonstrate that Meta-Balancer improves or matches the best performance that can be obtained by fine tuning periodic load balancing. We also show that in some cases Meta-Balancer improves performance by 18% whereas periodic load balancing gives only a 1.5% benefit.

show abstract

Section: A Leanmdmentioning

confidence: 99%

“…Therefore, load balancer can use the instrumented load information to make load balancing decisions. The key advantage of this approach is that it is application independent, and it has been shown to be effective for a large class of applications, such as NAMD [7], ChaNGa [8] and Fractography3D [9].…”

Section: Introductionmentioning

confidence: 99%

Automated Load Balancing Invocation Based on Application Characteristics

Menon

Jain

Zheng

et al. 2012

2012 IEEE International Conference on Cluster Computing

Self Cite

View full text Add to dashboard Cite

show abstract

“…We leverage CHARM++'s SMP machine layer [9], which instead of creating one OS process per core of an n core node, the runtime creates k OS processes per node, such that k<n and n k is an integer. This allows chares within a process to leverage more efficient intra-node communication via shared memory for the following benefits: (i) inter-thread communication can be implemented with direct memory copy, (ii) the communication thread minimizes the interference between application compute functions and communication, (iii) sharing of read-only data across all threads reduces memory consumption.…”

Section: A Charm++ Smp Modementioning

confidence: 99%

Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters

Yeom

Bhatele

Bisset

et al. 2014

2014 IEEE 28th International Parallel and Distributed Processing Symposium

Self Cite

View full text Add to dashboard Cite

“…In this paper, we extended the work to support the SMP (multi-threaded) execution mode [13], [12], which is tuned for running on multicore-based parallel machines.…”

Section: B Charm++ Runtime Systemmentioning

confidence: 99%

“…In our earlier work, we have shown how NAMD scales to 64K cores of BG/P, and 224k cores on Jaguar XT5. Table I presents the performance of NAMD running a 100-million-atom system on Jaguar XT5 in 2011 [12]. It can be seen that the speedup starts to falter beyond 64K cores.…”

Section: Introductionmentioning

confidence: 99%

Optimizing fine-grained communication in a biomolecular simulation application on Cray XK6

Sun

Zheng

Mei

et al. 2012

2012 International Conference for High Performance Computing, Networking, Storage and Analysis

Self Cite

View full text Add to dashboard Cite

Abstract-Achieving good scaling for fine-grained communication intensive applications on modern supercomputers remains challenging. In our previous work, we have shown that such an application -NAMD -scales well on the full Jaguar XT5 without long-range interactions; Yet, with them, the speedup falters beyond 64K cores. Although the new Gemini interconnect on Cray XK6 has improved network performance, the challenges remain, and are likely to remain for other such networks as well. We analyze communication bottlenecks in NAMD and its CHARM++ runtime, using the Projections performance analysis tool. Based on the analysis, we optimize the runtime, built on the uGNI library for Gemini. We present several techniques to improve the fine-grained communication. Consequently, the performance of running 92224-atom Apoa1 with GPUs on TitanDev is improved by 36%. For 100-million-atom STMV, we improve upon the prior Jaguar XT5 result of 26 ms/step to 13 ms/step using 298,992 cores on Jaguar XK6.

show abstract

Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime

Cited by 45 publications

References 26 publications

Automated Load Balancing Invocation Based on Application Characteristics

Automated Load Balancing Invocation Based on Application Characteristics

Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters

Optimizing fine-grained communication in a biomolecular simulation application on Cray XK6

Contact Info

Product

Resources

About