All-to-all communication patterns occur in many important parallel algorithms. This paper presents new algorithms for all-to-all communication patterns (all-to-all broadcast and all-to-all personalized exchange) for wormhole-switched 2D/3D torus- and mesh-connected multiprocessors. The algorithms use message combining to minimize message start-ups at the expense of larger message sizes. The unique feature of these algorithms is that they are the first algorithms we know of that operate in a bottom-up fashion rather than a recursive, top-down manner. For a 2^d x 2^d torus or mesh, the algorithms for all-to-all personalized exchange have time complexity O(2^(3d)). An important property of the algorithms is the O(d) time due to message start-ups, compared with O(2^d) for current algorithms [15], [18]. This is particularly important for modern parallel architectures, where the start-up cost of message transmission still dominates except for very large block sizes. Finally, the 2D algorithms for all-to-all personalized exchange are extended to O(2^(4d)) algorithms in a 2^d x 2^d x 2^d 3D torus or mesh. These algorithms also retain the important property of O(d) time due to message start-ups.

Index Terms: Interprocessor communication, parallel algorithms, collective communication, all-to-all communication, all-to-all broadcast, all-to-all personalized exchange, complete exchange.
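The trade-off claimed above can be sketched with the standard linear communication-cost model: a fixed start-up latency t_s per message plus a per-unit transfer time t_w. The parameter values below are illustrative assumptions, not measurements from the paper, and for simplicity the same total data volume is charged to both schedules.

```python
# Linear cost model: time = (number of message start-ups) * t_s
#                         + (total data transmitted)      * t_w
# Illustrative parameters (assumed, not taken from the paper):
T_S = 100.0   # start-up latency per message (e.g., microseconds)
T_W = 0.01    # transfer time per data unit (e.g., microseconds)

def exchange_time(startups, volume, t_s=T_S, t_w=T_W):
    """Estimated time for an exchange under the linear cost model."""
    return startups * t_s + volume * t_w

d = 5
p = (2 ** d) ** 2          # p processors in a 2^d x 2^d network
volume = p * p             # each processor sends one unit to every other

# O(d) start-ups with message combining vs. O(2^d) start-ups without:
combined = exchange_time(startups=d, volume=volume)
classic = exchange_time(startups=2 ** d, volume=volume)
print(combined < classic)  # True: fewer start-ups win at this block size
```

Because t_s typically exceeds t_w by orders of magnitude on real machines, reducing start-ups from O(2^d) to O(d) dominates the total time until message blocks become very large.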
INTRODUCTION

Distributed memory multiprocessors are an attractive candidate architecture for scalable, massively parallel applications. Since memory is distributed among the processors, interprocessor communication is realized by passing messages through an interconnection network. However, interprocessor communication overhead is one of the major factors that limit the performance of parallel systems, and it can become a bottleneck to scalable parallel implementations of computationally intensive applications. This has resulted in the development of efficient, high-speed network architectures and innovative algorithms for scheduling interprocessor communication to minimize message latency.

A specific class of communication patterns that has received considerable attention in this regard is the class of collective communication patterns [7], [8], [10]. Collective communication is defined as a communication pattern involving a group of processes; it is supported by the Message Passing Interface (MPI), which seeks to establish a portable, efficient, and flexible standard for message-passing programs [19]. Commonly used collective communication patterns are broadcast, scatter, gather, all-to-all broadcast, and all-to-all personalized exchange. In contrast, point-to-point or unicast communication involves a single transmitter and a single recipient. Collective communication is notorious for its demands on network bandwidth and its consequent impact on algorithm execution time.

Among these collective communication operations, all-to-all communication patterns (all-to-all broadcast and all-to-all personalized exchange) are generally the most demanding op...
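As a concrete illustration of the semantics only (not of the paper's torus/mesh routing or message-combining schedule), all-to-all personalized exchange amounts to a transpose of per-pair messages: process i starts with a distinct message for every process j, and afterwards process j holds the messages addressed to it from every i.

```python
# Semantics of all-to-all personalized exchange, modeled on plain lists.
# send[i][j] is the message process i has prepared for process j.
# After the exchange, recv[j][i] == send[i][j] for all i, j -- i.e.,
# the message matrix is transposed across the processes. This models
# only the communication pattern, not any particular network schedule.

def all_to_all_personalized(send):
    """Return recv where recv[j][i] == send[i][j] (a matrix transpose)."""
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]

p = 4
send = [[f"m{i}->{j}" for j in range(p)] for i in range(p)]
recv = all_to_all_personalized(send)
print(recv[2])  # ['m0->2', 'm1->2', 'm2->2', 'm3->2']
```

All-to-all broadcast differs in that every process sends the *same* message to all others, so the result is a row of identical copies rather than a transpose.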