This paper describes the architecture of a third-generation switching element which may appear in future IBM RS/6000 SP interconnection networks. In this paper this ASIC will be referred as the Switch3 switch chip. Like its predecessors, Switch3 is an 8-port device implementing output-queuing using the high-utilization central-buffering technique. However, Switch3 offers significant enhancements over these existing SP switch chips by incorporating advances in both VLSI technology and in recent interconnection network research. Switch3 introduces a new form of adaptive routing with the potential to significantly improve network bandwidth. It also offers support for collective communication via a powerful hardware multicast replication capability. The technology advances allow link bandwidth to be improved to 500 MB/s per direction per link, and allow the central buffer size to be doubled compared to the current SP switch. Furthermore, the larger Switch3 input buffers are capable of supporting link lengths of up to 100 meters, enabling richly-connected, scalable topologies with a high aggregate bandwidth. Finally, Switch3 offers a number of other significant enhancements including limited support for high-priority traffic and detailed performance monitoring information.
AbstractÐMultidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is nontrivial. In this paper, we propose alternative switch architectures with differing buffer organizations to implement multidestination worms on switch-based parallel systems. First, we discuss issues related to such implementation (deadlock-freedom, replication mechanisms, header encoding, and routing). Next, we demonstrate how an existing central-buffer-based switch architecture supporting unicast message passing can be enhanced to accommodate multidestination message passing. Similarly, implementing multidestination worms on an input-buffer-based switch architecture is discussed, and two architectural alternatives are presented that reduce the wiring complexity in a practical switch implementation. The central-buffer-based and input-buffer-based implementations are evaluated against each other, as well as against the corresponding software-based schemes. Simulation experiments under a range of traffic (multiple multicast, bimodal, varying degree of multicast, and message length) and system size are used for evaluation. The study demonstrates the superiority of the central-buffer-based switch architecture. It also indicates that under bimodal traffic the centralbuffer-based hardware multicast implementation affects background unicast traffic less adversely compared to a software-based multicast implementation. These results show that multidestination message passing can be applied easily and effectively to switchbased parallel systems to deliver good multicast and collective communication performance.
Multidestination message passing has been proposed as an attmctive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is non-trivial. In this paper we propose alternative switch architectures with differing buffer organizations to implement multidestination worms on switch-based parallel systems. First, we discuss issues related to such implementation (deadlock-freedom, replication mechanisms, header encoding, and routing). Next, we demonstrate how an existing central-buffer-based switch architecture supporting unicast message passing can be enhanced to accommodate multidestination message passing. Similarly, implementing multidestination worms on an input-buffer-based switch architecture is discussed. Both of these implementations are evaluated against each other as we11 as against a software-based scheme using the central buffer organization. Simulation experiments under a range of traffic (multiple multicast, bimodal, varying degree of multicast, and message length) and system size are used for evaluation. The study demonstrates the superiority of the central-buffer-based switch architecture. It also indicates that under bimodal traffic the central-buffer-based hardware multicast implementation affects background unicast traffic less adversely compared to a software-based multicast implementation. Thus, multidestination message passing can easily be applied to switch-based parallel systems to deliver good collective communication performance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.