Charles J Archer scite author profile

Optimization of MPI collective communication on BlueGene/L systems

BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of low-power dual-processor compute nodes interconnected by high-speed torus and collective networks. Because compute nodes do not have shared memory, MPI is the natural programming model for this machine. The BlueGene/L MPI library is a port of MPICH2. In this paper we discuss the implementation of MPI collectives on BlueGene/L. The MPICH2 implementation of MPI collectives is based on point-to-point communication primitives. This turns out to be suboptimal for a number of reasons. Machine-optimized MPI collectives are necessary to harness the performance of BlueGene/L. We discuss these optimized MPI collectives, describing the algorithms and presenting performance results measured with targeted micro-benchmarks on real BlueGene/L hardware with up to 4096 compute nodes.

Design and implementation of message-passing services for the Blue Gene/L supercomputer

Almási, Archer, Castaños, et al. 2005

IBM J. Res. & Dev.

The Blue Gene/L (BG/L) supercomputer, with 65,536 dual-processor compute nodes, was designed from the ground up to support efficient execution of massively parallel message-passing programs. Part of this support is an optimized implementation of the Message Passing Interface (MPI), which leverages the hardware features of BG/L. MPI for BG/L is implemented on top of a more basic message-passing infrastructure called the message layer. This message layer can be used both to implement other higher-level libraries and directly by applications. MPI and the message layer are used in the two BG/L modes of operation: the coprocessor mode and the virtual node mode. Performance measurements show that our message-passing services deliver performance close to the hardware limits of the machine. They also show that dedicating one of the processors of a node to communication functions (coprocessor mode) greatly improves the message-passing bandwidth, whereas running two processes per compute node (virtual node mode) can have a positive impact on application performance.

[...] job of porting it to different architectures. With this design, we could focus on optimizing the constructs that were of importance to BG/L. BG/L is a feature-rich machine. A good implementation of message-passing services in BG/L must leverage those features to deliver high-performance communication services to applications. Its compute nodes are interconnected by two high-speed networks: a three-dimensional (3D) torus network that supports direct point-to-point communication [6] and a collective network to support broadcast and reduction operations. Those networks are mapped to the address space of user processes and can be used directly by a message-passing library. We show how we designed our message-passing implementation to take advantage of both types of memory-mapped networks.

Another important architectural feature of BG/L is its dual-processor compute nodes. A compute node can operate in one of two modes. In coprocessor mode, a single process, spanning the entire memory of the node, can use both processors by running one thread on each processor. In virtual node mode, two single-threaded [...]

©Copyright 2005 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems.

Blue Gene/L programming and operating environment

et al. 2005

With up to 65,536 compute nodes and a peak performance of more than 360 teraflops, the Blue Gene/L (BG/L) supercomputer represents a new level of massively parallel systems. The system software stack for BG/L creates a programming and operating environment that harnesses the raw power of this architecture with great effectiveness. The design and implementation of this environment followed three major principles: simplicity, performance, and familiarity. By specializing the services provided by each component of the system architecture, we were able to keep each one simple and leverage the BG/L hardware features to deliver high performance to applications. We also implemented standard programming interfaces and programming languages that greatly simplified the job of porting applications to BG/L. The effectiveness of our approach has been demonstrated by the operational success of several prototype and production machines, which have already been scaled to 16,384 nodes.

EUDOC on the IBM Blue Gene/L system: Accelerating the transfer of drug discoveries from laboratory to patient

et al. 2008

EUDOC is a molecular docking program that has successfully helped to identify new drug leads. This virtual screening (VS) tool identifies drug candidates by computationally testing the binding of these drugs to biologically important protein targets. This approach can reduce the research time required of biochemists, accelerating the identification of therapeutically useful drugs and helping to transfer discoveries from the laboratory to the patient. Migration of the EUDOC application code to the IBM Blue Gene/L (BG/L) supercomputer has been highly successful. This migration led to a 200-fold improvement in elapsed time for a representative VS application benchmark. Three focus areas provided benefits. First, we enhanced the performance of serial code through application redesign, hand-tuning, and increased usage of SIMD (single-instruction, multiple-data) floating-point unit operations. Second, we studied computational load-balancing schemes to maximize processor utilization and application scalability for the massively parallel architecture of the BG/L system. Third, we greatly enhanced system I/O interaction design. We also identified and resolved severe performance bottlenecks, allowing for efficient performance on more than 4,000 processors. This paper describes specific improvements in each of the areas of focus.

Implementing MPI on the BlueGene/L Supercomputer

Almási, Archer, Castaños, et al. 2004

Charles J Archer

Optimization of MPI collective communication on BlueGene/L systems

Design and implementation of message-passing services for the Blue Gene/L supercomputer

Blue Gene/L programming and operating environment

EUDOC on the IBM Blue Gene/L system: Accelerating the transfer of drug discoveries from laboratory to patient

Implementing MPI on the BlueGene/L Supercomputer