Several large applications have been parallelized on Nectar, a network-based multicomputer recently developed by Carnegie Mellon. These applications were previously either too large or too complex to be easily implemented on distributed-memory parallel systems. Parallelizing these applications was made possible by the cooperative use of many existing general-purpose computers over high-speed networks, and by an implementation methodology based on a clean separation between application-specific and system-specific code. We illustrate these points using our experience with parallelizing three real-world applications. The success of these applications points to a new direction in parallel processing.
Recent parallel-processor supercomputer designs use an active backplane of routers to form the interconnections between processing elements [1, 2]. Today, high-bandwidth interconnect systems capable of scaling to configurations with >500 processing nodes tend to use self-timed designs. This avoids the clock distribution problems seen in large phase-sensitive synchronous systems. The BiCMOS routing component described here employs 200 MHz clocked communication for large scalable parallel-processor supercomputer systems. This scheme eliminates the need for clock edges that are phase-aligned across the clock distribution network. Additionally, router inputs accept data at any phase relationship to the receiving router's internal clock. The router communicates with its nearest-neighbor routers in both the X and Y directions, as well as with a processing node. Router-to-router connections are clocked. The router-to-processing-node connection is self-timed and operates at any frequency up to the 200 MHz router backplane frequency. Each processing node contains a network interface component (NIC), a microprocessor, and memory. The processing node connection to the backplane is shown in Figure 1. To send a message from one processing node to another, the source processor includes a routing header (an X and Y displacement) in the first two 16-bit words of the message. The routing header specifies the path through the backplane to the final destination. The message is passed to a router that is part of the active backplane. Routers pass messages to nearest neighbors first along the X-direction (until the X-displacement has been reached), and then along the Y-direction (until the Y-displacement has been reached), similarly to other 2D deterministic routers [3]. The destination router passes the message (minus the header) to its processing node. Each router communicates with its neighbors via unidirectional point-to-point port connections for inputs and outputs (Figure 2).
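The X-then-Y forwarding rule described above can be illustrated with a short sketch. It is not the router's actual logic: the function names (`make_message`, `xy_route`, `deliver`), the grid coordinates, and the two's-complement encoding of the displacements in the two 16-bit header words are all assumptions made for illustration. The source states only that the header holds an X and a Y displacement and that the destination router strips it.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def to_signed16(word: int) -> int:
    """Interpret a 16-bit word as a signed value (assumed two's complement)."""
    return word - 0x10000 if word & 0x8000 else word


def make_message(dx: int, dy: int, payload: List[int]) -> List[int]:
    """Prepend the routing header: X and Y displacement in the first two words."""
    return [dx & 0xFFFF, dy & 0xFFFF] + list(payload)


def xy_route(src: Coord, dx: int, dy: int) -> List[Coord]:
    """Return every router visited, moving along X first and then along Y."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dx > 0 else -1
    for _ in range(abs(dx)):  # exhaust the X displacement first
        x += step_x
        path.append((x, y))
    step_y = 1 if dy > 0 else -1
    for _ in range(abs(dy)):  # then exhaust the Y displacement
        y += step_y
        path.append((x, y))
    return path


def deliver(src: Coord, message: List[int]) -> Tuple[Coord, List[int]]:
    """Route a message and return (destination, payload with the header removed)."""
    dx, dy = to_signed16(message[0]), to_signed16(message[1])
    destination = xy_route(src, dx, dy)[-1]
    return destination, message[2:]


if __name__ == "__main__":
    msg = make_message(2, -2, [0xAB, 0xCD])
    print(xy_route((1, 3), 2, -2))
    print(deliver((1, 3), msg))
```

Because the path depends only on the source and the displacement, every message between the same pair of nodes follows the same route, which is what makes the scheme deterministic.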
The ports are grouped in inbound/outbound pairs. Each unidirectional port consists of data, parity, control, transmit clock, and reference voltage. Input buffers are differential amplifiers with one input connected to the data pin and the other connected to the reference voltage generated by the sending router, reducing common-mode noise at the input. The receiving router latches incoming data using both edges of the data-transmit clock. The transmit clock transitions at the same time as the data. The receiver uses a delay-locked loop to center the received transmit clock on the received data. Centering the data clock maximizes jitter margins for data clock edges with respect to incoming data. Figure 3 shows a simplified diagram of the delay-locked loop. The input data transmit clock goes through two equal delay circuits. The centered output is taken from the middle of the two delay circuits. The output of the second delay block is fed back and compared to the input signal. If the combined delay is less than the input clock high time then the control to the delay circuit increases each block de...
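The feedback idea behind the delay-locked loop can be sketched numerically. This is a behavioral illustration only, not the circuit in Figure 3. The function name `lock_delay`, the fixed step size, the starting delay, and the 50% duty cycle (which gives a 2.5 ns high time at 200 MHz) are assumptions. The source states only the "increase when the combined delay is too short" branch; the decrease branch is assumed here by symmetry.

```python
def lock_delay(high_time_ns: float,
               delay_ns: float = 0.1,
               step_ns: float = 0.01,
               iterations: int = 1000) -> float:
    """Adjust the per-block delay until two equal blocks span the clock high time.

    Returns the delay of one block, which is also the offset of the
    midpoint tap (the "centered" clock) relative to the input clock.
    """
    for _ in range(iterations):
        combined = 2 * delay_ns  # two equal delay blocks in series
        if combined < high_time_ns:
            delay_ns += step_ns  # stated in the text: increase each block's delay
        elif combined > high_time_ns:
            delay_ns -= step_ns  # assumption: symmetric decrease branch
    return delay_ns


if __name__ == "__main__":
    # Assumed 200 MHz clock with 50% duty cycle: 5 ns period, 2.5 ns high time.
    tap = lock_delay(high_time_ns=2.5)
    print(f"midpoint tap offset is about {tap:.2f} ns")
```

Under these assumptions the loop settles with each block delaying the clock by about half the high time, so the midpoint tap places the clock edge near the middle of each data interval. That is the centering the text describes.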