The article proposes a fine-grained all-to-all communication operation that implements flexible data redistribution patterns for irregular applications, such as particle codes. The flexibility is achieved by user-defined distribution functions, which specify how data elements are redistributed among parallel processes on a distributed memory platform. The usage is illustrated for the particle data redistribution step of a grid-based particle code, in which the destination processes of particles are calculated from the particle positions by a specific distribution function. In addition, the proposed fine-grained all-to-all communication operation allows data elements to be duplicated and modified during the redistribution. This functionality is useful for automatically creating ghost particles for the domain decomposition of the particle code during the particle data redistribution step. The interface of the fine-grained all-to-all communication operation is described, and several algorithms for implementing the operation on top of existing MPI operations are presented. Performance results on an IBM Blue Gene/Q platform demonstrate the efficiency of the proposed communication operation with synthetic benchmark data as well as with a parallel particle code.
KEYWORDS: all-to-all communication, data redistribution, distributed memory, message passing, particle simulations
INTRODUCTION

Particle simulation methods are popular approaches for the numerical simulation of complex physical problems. A major computational part of particle codes usually consists of the calculation of the pair-wise interactions between the particles of a given particle system. Long-range interactions, such as Coulomb or gravitational interactions, can contribute significantly to the results even for particles that are far away from each other in the particle system. Thus, all pair-wise interactions need to be considered, which leads to algorithmic and computational challenges, especially for large particle systems. Solver methods for long-range interactions 1 have been developed for a highly scalable parallel library within the ScaFaCoS project. 2 This library includes parallel implementations of tree-based methods, such as the Fast Multipole Method (FMM) 3 or the Barnes-Hut algorithm, 4 as well as grid-based methods, such as Particle-Particle-Particle Mesh (P3M) 5 or fast summations based on nonequispaced fast Fourier transforms (P2NFFT). 6 All these parallel solver methods include a solver-specific distribution of the particle data among the parallel processes executed on a distributed memory platform. Applying such a parallel solver method to a specific particle application code therefore requires data redistribution steps between the particle application code and the parallel solver method of the ScaFaCoS library.

Data redistribution in particle codes has to be performed efficiently such that its runtime is negligible in comparison to the computational costs of the particle interactions. Message passing li...