Efficient broadcast and multicast on multistage interconnection networks using multiport encoding

Sivaram, R.; Panda, D.K.; Stunkel, Craig B.

doi:10.1109/71.730529

Cited by 35 publications

(13 citation statements)

References 26 publications


“…The results in [40] show the proof-of-concept of multidestination message passing in MINs. However, the paper's focus is on one specific form of multidestination header encoding.…”

Section: Contributions Of This Paper (mentioning)

confidence: 73%

“…Similarly, it does not explore design alternatives that would simplify support for such multidestination worms in the more traditional input-buffered switches. Furthermore, the multiport-based encoding in [40] often requires a multicast to be implemented in multiple phases. Even though this approach requires fewer phases than the unicast-based software scheme, a challenging problem is to implement multicast in even fewer phases using better encoding schemes.…”

Section: Contributions Of This Paper (mentioning)

confidence: 99%
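
For context on the phase counts mentioned in the statement above, the unicast-based software baseline can be sketched as follows. This is a minimal sketch, not taken from the cited papers: it assumes a one-port model in which every node already holding the message forwards it to one new node per phase (a binomial-tree schedule), and the function name `unicast_multicast_phases` is hypothetical.

```python
def unicast_multicast_phases(num_destinations):
    """Phases needed by a binomial-tree, unicast-based multicast.

    Assumption (not from the source): one-port model, so the number of
    nodes holding the message can at most double in each phase.
    """
    holders = 1  # only the source holds the message initially
    phases = 0
    # Keep doubling until the source plus all destinations are covered.
    while holders < num_destinations + 1:
        holders *= 2
        phases += 1
    return phases


# Phase counts for a few multicast degrees under the stated assumption.
phase_counts = {d: unicast_multicast_phases(d) for d in (1, 3, 4, 15, 16)}
```

Under this assumption the phase count grows logarithmically with the number of destinations, which is the baseline that any hardware encoding scheme would aim to beat.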

“…To prevent deadlock under asynchronous replication, switches must be equipped with buffers large enough to store the largest packet in the system. Deadlock is prevented if the switches can guarantee that a packet accepted for transmission can be eventually completely buffered at the switch [40], a requirement that is weaker than virtual cut-through [16].…”

Section: Deadlock-free Replication (mentioning)

confidence: 99%

“…In recent work [38], [40], we have extended the multidestination message passing concept to multistage interconnection networks. This extension involves a new concept called multiport encoding and an asynchronous replication mechanism at the input buffers of a switch to implement deadlock-free multicast.…”

Section: Introduction (mentioning)

confidence: 99%


Implementing multidestination worms in switch-based parallel systems: architectural alternatives and their impact

Sivaram

Stunkel

Panda

2000

IEEE Trans. Parallel Distrib. Syst.

Self Cite


Abstract: Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is nontrivial. In this paper, we propose alternative switch architectures with differing buffer organizations to implement multidestination worms on switch-based parallel systems. First, we discuss issues related to such implementation (deadlock-freedom, replication mechanisms, header encoding, and routing). Next, we demonstrate how an existing central-buffer-based switch architecture supporting unicast message passing can be enhanced to accommodate multidestination message passing. Similarly, implementing multidestination worms on an input-buffer-based switch architecture is discussed, and two architectural alternatives are presented that reduce the wiring complexity in a practical switch implementation. The central-buffer-based and input-buffer-based implementations are evaluated against each other, as well as against the corresponding software-based schemes. Simulation experiments under a range of traffic (multiple multicast, bimodal, varying degree of multicast, and message length) and system size are used for evaluation. The study demonstrates the superiority of the central-buffer-based switch architecture. It also indicates that under bimodal traffic the central-buffer-based hardware multicast implementation affects background unicast traffic less adversely compared to a software-based multicast implementation. These results show that multidestination message passing can be applied easily and effectively to switch-based parallel systems to deliver good multicast and collective communication performance.


“…The multicast operation [1, 4, 5, 9, 13, 16-18, 21-23] is a very common used operation in cluster systems. It can improve the performance of interconnection communication of processors [14] and can be used to implement many efficient collective communication operations.…”

Section: Introduction (mentioning)

confidence: 99%

Hardware supported multicast in fat-tree-based InfiniBand networks

2007


The multicast operation is a very commonly used operation in parallel applications. It can be used to implement many collective communication operations as well. Therefore, its performance will affect parallel applications and collective communication operations. With the hardware supported multicast of the InfiniBand Architecture (IBA), in this paper, we propose a cyclic multicast scheme for fat-tree-based (m-port n-tree) InfiniBand networks. The basic concept of the proposed cyclic multicast scheme is to find the union sets of the output ports of switches in the paths between the source processing node and each destination processing node in a multicast group. Based on the union sets and the path selection scheme, the forwarding table for a given multicast group can be constructed. We implement the proposed multicast scheme along with the OpenSM multicast scheme and the unicast scheme on an m-port n-tree InfiniBand network simulator. Several one-to-many, many-to-many, many-to-all, and all-to-many multicast cases are simulated. The simulation results show that the proposed multicast scheme outperforms the unicast scheme for all simulated cases. For the one-to-many case, the performance of the cyclic multicast scheme is the same as that of the OpenSM multicast scheme. For the many-to-many and all-to-many cases, the cyclic multicast scheme outperforms the OpenSM multicast scheme. For the many-to-all case, the performance of the cyclic multicast scheme is a little better than that of the OpenSM multicast scheme.
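
The union-set idea described in this abstract can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the paper's cyclic scheme: the path selection step is assumed to have already chosen one path per destination, and the switch names, port numbers, and the function `build_forwarding_table` are all hypothetical.

```python
from collections import defaultdict


def build_forwarding_table(paths):
    """Build a per-switch multicast forwarding table.

    paths maps each destination node to the list of (switch, output_port)
    hops on its already-selected path from the source (hypothetical input).
    For every switch, the table entry is the union of the output ports
    used by any destination's path through that switch.
    """
    table = defaultdict(set)
    for hops in paths.values():
        for switch, port in hops:
            table[switch].add(port)
    # Sort the ports so the result is deterministic and easy to inspect.
    return {switch: sorted(ports) for switch, ports in table.items()}


# Hypothetical multicast group with three destinations.
paths = {
    "n1": [("s0", 2), ("s1", 0)],
    "n2": [("s0", 2), ("s1", 1)],
    "n3": [("s0", 3), ("s2", 0)],
}
table = build_forwarding_table(paths)
```

In this toy group, switch `s0` forwards a single incoming copy on ports 2 and 3, so paths that share a prefix carry the packet only once over the shared links.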


The Performance of Multicast Banyan Networks

Yang

2000

Journal of Parallel and Distributed Computing

