Sun Microsystems, Palo Alto, CA

This 3rd-generation superscalar processor, implementing the 64b SPARC V9 architecture, improves performance over previous processors through improvements in the on-chip memory system and circuit designs that speed up critical paths beyond the process entitlement [1,2]. In the on-chip memory system, both bandwidth and latency are scaled. Keys to scaling memory latency are a sum-addressed memory data cache, which allows the average memory latency to scale by more than the clock ratio, and the use of a prefetch data cache [3]. Memory bandwidth is improved by using wave-pipelined SRAM designs for on-chip caches and a write cache for store traffic [4]. The chip operates at 800MHz and dissipates <60W from a 1.5V supply. It contains 23M transistors (12M in RAM cells) on a 244mm² die. Figure 25.2.1 contrasts this 7-metal-layer aluminum, 0.15µm CMOS design with the previous generations' designs. To deal with growing microprocessor complexity, more aggressive circuit techniques, interconnect delay optimization, crosstalk reduction, improved power and clock distribution schemes, and better thermal management are used.

For minimum power dissipation and simplified verification, the primary circuit style is static CMOS using synthesis and automatic place and route. Where synthesis is not enough and full-custom design is not appropriate, a hybrid approach is used: domino cells are manually placed, and CAD tools shield all wires, route clocks, and insert power and ground. A commercial router completes the routing of signals. For the most critical paths, custom dynamic logic design is used. Delayed reset logic is used in the SRAM structures to minimize power and simplify clock distribution. Large caches use a self-timed latency control circuit for one-cycle throughput and two-cycle latency.
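The abstract does not describe the sum-addressed cache circuit itself. As a rough illustration of the general idea behind sum-addressed decoding, the sketch below tests whether base + offset equals a given row index without performing a carry-propagate addition. The function name `sum_addressed_match` and the 8-bit default width are assumptions made for the example, not details taken from the paper.

```python
def sum_addressed_match(a, b, k, width=8):
    """Return True iff (a + b) mod 2**width == k, using only bitwise logic.

    Illustrative sketch of the sum-addressed identity; the names and the
    width are hypothetical and do not come from the paper.
    """
    mask = (1 << width) - 1
    a, b, k = a & mask, b & mask, k & mask
    # Carry-in each bit position would need for its sum bit to equal k's bit.
    carry_in_needed = a ^ b ^ k
    # Carry-out each bit position would produce if its sum bit equals k's bit.
    carry_out = (a & b) | ((a | b) & ~k)
    # Bit i's carry-out must equal bit i+1's required carry-in; bit 0 sees carry-in 0.
    return carry_in_needed == ((carry_out << 1) & mask)
```

Every bit position is checked independently against its neighbor, so in hardware each word line could evaluate its own match in parallel instead of waiting for an adder's carry chain.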
A predecode flip-flop circuit incorporates the predecode logic function, eliminating 2 logic levels and significantly speeding up the address-decoding critical path. Logic structures use traditional domino logic as well as delayed-clock domino logic with overlapping, multiphase, non-blocking clocking. Critical signals are never gated by clocks, creating a pseudo-transparent evaluation phase that maximizes speed. Consecutive logic stages are clocked by delayed phases with enough overlap to guarantee safe signal transitions. A family of edge-triggered flip-flops includes dynamic flip-flops that produce monotonic outputs for domino logic [5]. Members of this family also embed a full logic level while maintaining a low input-to-output delay, allowing a pipeline with only 8 logic stages per clock cycle. For ease of verification, dynamic design is chiefly confined to fully shielded, full-custom structures.

To facilitate single-cycle transfers, the working register file (WRF), which handles regular read/write operations, and the architectural register file (ARF), which stores 8 windows, are interleaved into one physical unit, a WARF (Figure 25.2.2). The WARF performs read, write, and transfer simultaneously. The 32...

Globally synchronous, multi-drop, bidirectional microprocessor system interfaces have the advantages of low latency and no synchronization penalty, and typically run at 100-120MHz. Maximum operating frequency is limited by the time required for a signal transition initiated at the driving end to settle and be reliably sampled at the receiving end. Typical source-terminated systems (such as HSTL [1]) require a round-trip transmission-line propagation delay to terminate the signal. With parallel-terminated systems such as GTL [2] with 2 nodes, a bus-turnaround low-to-low switch with no dead cycle needs a round-trip propagation delay to settle and be sampled reliably. The dynamic termination logic (DTL) system reduces the settling time to a one-way delay by having the driver at the receiving end terminate the signal, and raises the signaling frequency to 150-200MHz.

A major concern with on-chip termination is maintaining nearly constant termination resistance over process, supply voltage, and temperature (PVT) and output signal voltage variations. The impedance control and linearization schemes presented here address this concern. In addition, these circuits limit rail bounce and maintain nearly constant driver resistance during switching. The circuits are implemented on the next-generation SPARC microprocessor [3].

Figure 15.1.1 shows a general 2-node DTL signaling system and its signal waveforms. The driver at the receiving end acts as a termination resistor to the positive IO supply voltage (vddo). The output resistances of the pullup and pulldown units are matched to the characteristic impedance (Zo) of the transmission line. The system requires just a one-way propagation delay for the line to settle, even when a bus-turnaround low-to-low switch occurs without an intervening dead cycle. The signal voltage swing is vddo/2 to vddo; the driver is push-pull.
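The stated swing of vddo/2 to vddo follows from a simple resistor divider. As a minimal sketch, assuming an ideal ground-referenced pulldown of resistance Zo at the driving end and an ideal pullup terminator of resistance Zo to vddo at the receiving end, the DC levels and the reflection coefficient at the terminated end can be computed as below. The function name `dtl_levels` and the numeric values in the usage note (1.5V supply, 50-ohm line) are hypothetical, chosen only for illustration.

```python
def dtl_levels(vddo, z0, r_pulldown=None, r_term=None):
    """DC signal levels for a 2-node, pullup-terminated line (illustrative model).

    Assumes ideal resistors: r_pulldown to ground at the driver and
    r_term to vddo at the receiver. Both default to z0 (matched).
    """
    r_pulldown = z0 if r_pulldown is None else r_pulldown
    r_term = z0 if r_term is None else r_term
    # Driving high: both ends pull toward vddo, so the line settles at vddo.
    v_high = vddo
    # Driving low: the pulldown and the terminator form a divider from vddo to ground.
    v_low = vddo * r_pulldown / (r_pulldown + r_term)
    # Midpoint of the swing, where a receiver threshold would naturally sit.
    v_mid = (v_high + v_low) / 2
    # Reflection coefficient seen by a wave arriving at the terminated end.
    gamma = (r_term - z0) / (r_term + z0)
    return v_low, v_high, v_mid, gamma
```

With matched resistances, for example `dtl_levels(1.5, 50.0)`, the low level is half the supply, the midpoint is three quarters of the supply, and the reflection coefficient is zero, which is why the incident wave settles the line after a single one-way trip.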
The receiver is differential, comparing the input signal to a reference voltage (vref=0.75*vddo).

For 3-node systems, two methods may be used (Figure 15.1.2). If the length of the middle stub is <1 inch, scheme-1 gives better performance. Otherwise, scheme-2 must be used to prevent oxide overvoltage: scheme-1 has larger voltage overshoots than scheme-2 when a middle driver switches from drive-low to receive (tristate), large enough to exceed the process specification for gate-oxide overvoltage (2.1V DC, 2.4V AC). For systems with more than 3 nodes, scheme-1 is used, since scheme-2 would require low pulldown resistance (resulting in larger currents and device area).

The DTL output driver is linearized, impedance-controlled, and slew-rate controlled, and functions as both a driver and a pullup terminator. Each pullup and pulldown output unit consists of multiple elements of varying widths, one of which is permanently enabled, while the others are enabled or disabled according to an impedance-control code to give a desired net DC output impedance across PVT variations. This code follows a "thermometer code": only one bit changes per code update, and the order of bit c...
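The impedance-control scheme described above can be modeled as parallel resistive legs, one always on and the rest switched by a thermometer code. The sketch below is only an idealized illustration: the helper names `thermometer` and `net_resistance`, the leg resistances, and the treatment of each element as a fixed resistor are all assumptions, not details from the paper.

```python
def thermometer(level, nbits):
    """Thermometer code with the first `level` of `nbits` bits set."""
    return [1 if i < level else 0 for i in range(nbits)]


def net_resistance(code, r_fixed, r_legs):
    """Net resistance of one always-on leg in parallel with the enabled legs."""
    # Conductances of parallel legs add; disabled legs contribute nothing.
    conductance = 1.0 / r_fixed + sum(bit / r for bit, r in zip(code, r_legs))
    return 1.0 / conductance


# Hypothetical leg resistances in ohms, for illustration only.
R_FIXED = 100.0
R_LEGS = [400.0, 400.0, 200.0, 200.0]

codes = [thermometer(n, len(R_LEGS)) for n in range(len(R_LEGS) + 1)]
resistances = [net_resistance(c, R_FIXED, R_LEGS) for c in codes]
```

Stepping the level up or down by one flips exactly one bit, so each update changes the net resistance by a single small, monotonic step rather than passing through unrelated intermediate codes.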