A chip with two 64b PowerPC™ microprocessors, each with a 1MB dedicated L2 cache and a single shared high-speed processor-interconnect (PI) [1] bus is created. A second single-processor chip with a 1MB L2 cache is also created with a different performance/power optimization. The chips are built in 90nm dual strained-silicon SOI technology [2] using 10 layers of copper interconnect and low-k dielectric.

The PPC970MP dual-processor chip (MP) in Fig. 5.5.7 consists of 2 processor units (PUs) that are mirrored, and a common region. The I/O and PLL circuitry reside in the common area. The design of the PU core is an extension to a previous PowerPC™ design [3]. This core is the basis for both the MP and the PPC970GX single-processor chip (SP). The PU contains a 64kB L1 instruction cache, a 32kB L1 data cache supported by 2 load-store units, and a unified 1MB L2 cache. Each PU can dispatch up to 5 instructions per cycle and issue one instruction per cycle to each of its execution units, of which there are 2 integer, 2 floating point, 2 load/store, 2 single-instruction, multiple-data execution units and 2 additional units that execute control operations.

Both chips have a single PI [1] off-chip bus, consisting of 2 unidirectional source-synchronous single-ended links. The 36 data bits of each link are encoded into 44b for parity and to minimize simultaneous switching noise. The circuit in Fig. 5.5.1 generates an adjustable reference voltage (vref) by shorting the terminated differential inputs, clkin and clkin_b, together through series resistors and passgates. Low-pass filtering allows vref to ride common-mode noise, improving the eye. Vref windage adjusts vref up or down to compensate for asymmetrical voltages or input waveforms. Tunable termination windage bits select which 14 terminating resistors are enabled, maintaining constant input resistance.

The bit rate per channel scales with core frequency in a 1:2 ratio, as it does in the Power5™ design [1], to keep pace with data demand. At 3GHz core frequency, the PI provides 10.7GB/s data bandwidth as well as address and control overhead.

The MP is divided into clock and voltage domains, as shown in Fig. 5.5.2. Processor 0 (P0), the I/O control and the single PLL share the Vdd0 supply and clock mesh 0. A separate mesh, mesh 2, covers the L-shaped region where the PI receivers are placed. Mesh 2 is also powered by Vdd0. P1 is supplied by Vdd1 and clocked by mesh 1.

The arbiter, shown in Fig. 5.5.3, controls the shared off-chip bus of the MP. The chip sends a data stream of 2 to 34 beats, 36 logical bits wide. The first 2 beats are a header defining the packet function. The arbiter defines the path between the individual processor and the PI bus for the packet, e.g., if P0 wins arbitration, its packet flows through latch Lt1 only. The packet from the losing processor, P1, has its header bits stored in Lt1 and Lt0 latches of P1. The arbiter signals P1, halting its data transfer to the bus. In round-robin fashion, P1 wins the next arbitration. P1 shifts out the contents of its Lt0 a...
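
The link figures quoted above can be combined into a rough rate estimate. The sketch below is only back-of-the-envelope arithmetic under stated assumptions: it reads the "1:2 ratio" as a per-wire bit rate of half the core frequency, and it counts all 36 logical bits of every beat in both directions. The function name `pi_rates` and its dictionary keys are illustrative, not from the source. The raw logical figure it produces is higher than the 10.7GB/s data bandwidth stated in the text, which is in keeping with part of the traffic being address and control overhead; the text does not give the exact breakdown, so the sketch does not attempt to reproduce that number.

```python
# Rough PI link arithmetic using only figures quoted in the text.
LOGICAL_BITS = 36    # logical bits per beat on each link (from the text)
WIRE_BITS = 44       # encoded bits per beat on each link (from the text)
LINKS = 2            # two unidirectional links (from the text)
BUS_TO_CORE = 0.5    # ASSUMPTION: "1:2" means bit rate = core frequency / 2


def pi_rates(core_hz: float) -> dict:
    """Return raw per-wire and logical link rates for a given core frequency."""
    bit_rate = core_hz * BUS_TO_CORE             # bits per second per wire
    per_link_bytes = LOGICAL_BITS * bit_rate / 8  # logical bytes per second
    return {
        "bit_rate_gbps": bit_rate / 1e9,
        "logical_gb_per_s_per_link": per_link_bytes / 1e9,
        "logical_gb_per_s_total": LINKS * per_link_bytes / 1e9,
        # Fraction of the physical wires that carry logical bits.
        "encoding_efficiency": LOGICAL_BITS / WIRE_BITS,
    }


if __name__ == "__main__":
    for name, value in pi_rates(3.0e9).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the 36-to-44 encoding leaves about 82% of the wires carrying logical bits; the rest is spent on parity and switching-noise control as described above.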
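
The round-robin behavior described for the arbiter can be illustrated with a small behavioral model. This is a minimal sketch of generic round-robin arbitration between two requesters, not the circuit in Fig. 5.5.3: the class name `RoundRobinArbiter` and its interface are hypothetical, and the Lt0/Lt1 header latches and the halt signal are deliberately not modeled because the text describing them is cut off.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Behavioral model of round-robin arbitration (hypothetical interface)."""

    def __init__(self, n: int = 2) -> None:
        self.n = n
        # Pretend the last requester won most recently, so P0 has first priority.
        self.last = n - 1

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the winning requester, or None if none request."""
        # Search starting from the requester after the most recent winner.
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter()
    # Both processors request the bus on two consecutive arbitrations.
    first = arbiter.grant([True, True])
    second = arbiter.grant([True, True])
    print(f"first winner: P{first}, second winner: P{second}")
```

When both processors keep requesting, the winner alternates, matching the statement that the processor that lost one arbitration wins the next.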