All-to-all communication patterns occur in many important parallel algorithms. This paper presents new algorithms for all-to-all communication patterns (all-to-all broadcast and all-to-all personalized exchange) for wormhole-switched 2D/3D torus- and mesh-connected multiprocessors. The algorithms use message combining to minimize message start-ups at the expense of larger message sizes. The unique feature of these algorithms is that they are the first algorithms we know of that operate in a bottom-up fashion rather than a recursive, top-down manner. For a 2^d x 2^d torus or mesh, the algorithms for all-to-all personalized exchange have time complexity O(2^(3d)). An important property of the algorithms is the O(d) time due to message start-ups, compared with O(2^d) for current algorithms [15], [18]. This is particularly important for modern parallel architectures, where the start-up cost of message transmission still dominates except for very large block sizes. Finally, the 2D algorithms for all-to-all personalized exchange are extended to O(2^(4d)) algorithms in a 2^d x 2^d x 2^d 3D torus or mesh. These algorithms also retain the important property of O(d) time due to message start-ups.

Index Terms: Interprocessor communication, parallel algorithms, collective communication, all-to-all communication, all-to-all broadcast, all-to-all personalized exchange, complete exchange.
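The trade-off claimed above can be sketched with the standard linear communication-cost model: a fixed start-up latency t_s per message plus a per-unit transfer time t_w. The parameter values below are illustrative assumptions, not measurements from the paper, and for simplicity the same total data volume is charged to both schedules.

```python
# Linear cost model: time = (number of message start-ups) * t_s
#                         + (total data transmitted)      * t_w
# Illustrative parameters (assumed, not taken from the paper):
T_S = 100.0   # start-up latency per message (e.g., microseconds)
T_W = 0.01    # transfer time per data unit (e.g., microseconds)

def exchange_time(startups, volume, t_s=T_S, t_w=T_W):
    """Estimated time for an exchange under the linear cost model."""
    return startups * t_s + volume * t_w

d = 5
p = (2 ** d) ** 2          # p processors in a 2^d x 2^d network
volume = p * p             # each processor sends one unit to every other

# O(d) start-ups with message combining vs. O(2^d) start-ups without:
combined = exchange_time(startups=d, volume=volume)
classic = exchange_time(startups=2 ** d, volume=volume)
print(combined < classic)  # True: fewer start-ups win at this block size
```

Because t_s typically exceeds t_w by orders of magnitude on real machines, reducing start-ups from O(2^d) to O(d) dominates the total time until message blocks become very large.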
INTRODUCTION

Distributed memory multiprocessors are an attractive candidate architecture for scalable, massively parallel applications. Since memory is distributed among the processors, interprocessor communication is realized by passing messages through an interconnection network. However, interprocessor communication overhead is one of the major factors that limit the performance of parallel systems, and it can become a bottleneck to scalable parallel implementations of computationally intensive applications. This has resulted in the development of efficient, high-speed network architectures and innovative algorithms for scheduling interprocessor communication to minimize message latency.

A specific class of communication patterns that has received considerable attention in this regard is the class of collective communication patterns [7], [8], [10]. Collective communication is defined as a communication pattern involving a group of processes; it is supported by the Message Passing Interface (MPI), which seeks to establish a portable, efficient, and flexible standard for message-passing programs [19]. Commonly used collective communication patterns are broadcast, scatter, gather, all-to-all broadcast, and all-to-all personalized exchange. In contrast, point-to-point or unicast communication involves a single transmitter and a single recipient. Collective communication is notorious for its demands on network bandwidth and its consequent impact on algorithm execution time.

Among these collective communication operations, all-to-all communication patterns (all-to-all broadcast and all-to-all personalized exchange) are generally the most demanding op...
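As a concrete illustration of the semantics only (not of the paper's torus/mesh routing or message-combining schedule), all-to-all personalized exchange amounts to a transpose of per-pair messages: process i starts with a distinct message for every process j, and afterwards process j holds the messages addressed to it from every i.

```python
# Semantics of all-to-all personalized exchange, modeled on plain lists.
# send[i][j] is the message process i has prepared for process j.
# After the exchange, recv[j][i] == send[i][j] for all i, j -- i.e.,
# the message matrix is transposed across the processes. This models
# only the communication pattern, not any particular network schedule.

def all_to_all_personalized(send):
    """Return recv where recv[j][i] == send[i][j] (a matrix transpose)."""
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]

p = 4
send = [[f"m{i}->{j}" for j in range(p)] for i in range(p)]
recv = all_to_all_personalized(send)
print(recv[2])  # ['m0->2', 'm1->2', 'm2->2', 'm3->2']
```

All-to-all broadcast differs in that every process sends the *same* message to all others, so the result is a row of identical copies rather than a transpose.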