The IBM POWER8 processor is a 649-mm², 4.2-billion-transistor, high-frequency microprocessor fabricated in the IBM 22-nm silicon on insulator (SOI) technology with embedded dynamic random access memory (eDRAM) and 15 layers of metal. With its twelve architecturally enhanced, eight-way multithreaded cores, 96-MB high-bandwidth shared third-level cache, and increased on- and off-chip bandwidth, the POWER8 processor delivers industry-leading performance. This paper describes the circuit techniques and design methodologies that were employed for implementing this chip and that allowed it to maintain the power dissipation at the level of its predecessor while delivering a threefold increase in per-socket performance. Among the innovative technologies employed by the processor are resonant clocking, on-chip per-core voltage regulation, and enhanced eDRAM arrays.
Chip overview
The IBM POWER8* processor [1, 2] is the eighth generation of IBM Power Architecture* implemented in IBM's 22-nm embedded dynamic random access memory (eDRAM) silicon on insulator (SOI) technology [3]. The 649-mm² POWER8 processor die includes twelve architecturally enhanced eight-way multithreaded cores with high-throughput private second-level caches, a 96-MB high-bandwidth eDRAM third-level cache, an on-chip symmetric multi-processor (SMP) fabric, a set of cryptography and memory compression accelerators, memory controllers with I/O links capable of connecting to a maximum of eight memory buffer chips [4], six high-bandwidth off-chip SMP links, and 32 third-generation PCI Express** (PCIe**) lanes. Figure 1 shows the die photograph of the POWER8 processor. The processor cores are grouped into four quadrants. Each core has a private 512-KB level-2 (L2) cache with a read bandwidth of 64 bytes per cycle. The shared 96-MB level-3 (L3) cache is physically placed into the core quadrants.
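The headline figures above imply a few chip-level totals that the text does not spell out. The short Python sketch below derives them from the quoted numbers only. The variable names are illustrative. The even per-core split of the L3 cache and the summed L2 read bandwidth (all twelve cores reading in the same cycle) are simplifying assumptions made here, not statements from the paper.

```python
# Figures quoted in the text above.
DIE_AREA_MM2 = 649
TRANSISTORS = 4.2e9
CORES = 12
THREADS_PER_CORE = 8
L2_PER_CORE_KB = 512
L2_READ_BYTES_PER_CYCLE = 64
L3_TOTAL_MB = 96

# Hardware threads per chip.
hardware_threads = CORES * THREADS_PER_CORE

# Total private L2 capacity across all cores, in MB.
l2_total_mb = CORES * L2_PER_CORE_KB / 1024

# L3 capacity per core if the shared cache is divided evenly (assumption).
l3_per_core_mb = L3_TOTAL_MB / CORES

# Peak L2 read bandwidth summed over all cores, in bytes per cycle (assumption:
# every core reads from its own L2 in the same cycle).
l2_read_bytes_per_cycle_chip = CORES * L2_READ_BYTES_PER_CYCLE

# Average transistor density over the whole die.
transistors_per_mm2 = TRANSISTORS / DIE_AREA_MM2

print(hardware_threads, l2_total_mb, l3_per_core_mb,
      l2_read_bytes_per_cycle_chip, round(transistors_per_mm2))
```

The density figure is an average over the entire die, so it says nothing about how densely any individual block (cores, eDRAM, I/O) is packed.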
The on-chip SMP buses connecting the processor cores, memory controllers, accelerators, and I/O units run through the horizontal stripe in the center of the chip, referred to as Fabric in Figure 1, and the vertical wiring channel in the middle. The on-node SMP buses, responsible for intra-node communication, are located along the top edge of the die, while the off-node SMP buses, responsible for inter-node communication, are located along the bottom edge together with the PCIe links. The memory links connecting the POWER8 processor to a maximum of eight memory buffer chips are located on the left and right sides of the die. The accelerator units are located between the two core quadrants in the upper half of the processor die. The POWER8 processor contains approximately 4.2 billion transistors. Compared to its predecessor, the POWER7* processor [2, 5, 6], the POWER8 chip achieves a 50% improvement in single-thread performance, a twofold increase in per-core performance, and a threefold increase in chip throughput when measured at the same frequency [7]. In terms of the maximum core and SMP bus frequencies, the POWER8 processor achieves an incremental ...
Extending data rates to meet the I/O needs of future computing and network systems is complicated by limited channel bandwidth. While a DFE [1] can be used to compensate channel distortion, its power dissipation reduces link energy efficiency, which is vitally important in complex systems. One way of reducing DFE power consumption is to use current-integrating summers [2][3][4][5]. Previously published current-integrating DFEs operating above 5 Gb/s [3, 5] were demonstrated on simple test chips lacking support circuitry for CDR and DFE adaptation functions. The architecture presented here includes additional data paths based on current-integrating summers to realize a fully integrated RX with CDR and continuous DFE adaptation. The design also features a digital calibration loop for setting the summer bias currents so that high performance is achieved over process variations and different data rates.
The top-level RX architecture is shown in Fig. 21.6.1. The input data path is similar to that described in [1], except that a peaking amplifier is added to provide linear equalization, which complements the operation of the DFE. The peaking amplifier, which uses a zero-peaked topology with switched capacitive degeneration, has 8 peaking settings, with a nominal range of 0 to 6 dB at the half-baud frequency. The DFE uses a half-rate architecture; the tap weights for the even and odd halves are adapted independently to improve tolerance against duty-cycle distortion [6]. Half-rate clocks (C2) are generated by CML phase interpolators. Converting these clocks to CMOS rail-to-rail signals saves power in their distribution to the DFE, phase detector, and 2:8 DEMUX. In addition to the clocks (ClkD and ClkE) used to sample the center and edges of the bits, a third clock (ClkA) is generated, which can be independently swept to monitor horizontal eye opening.
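To make the half-rate arrangement concrete, the following Python sketch models a decision-feedback equalizer behaviorally, with separate tap-weight sets for even- and odd-indexed bits. It is a minimal illustration, not the circuit described in the paper: the function names, the ±1 signalling, the parity-dependent channel model (a crude stand-in for duty-cycle distortion), and the numeric tap values are all assumptions chosen for the example, and adaptation is omitted by setting the taps equal to the channel's post-cursors.

```python
import random


def channel(bits, post_even, post_odd):
    """Hypothetical channel: each bit picks up post-cursor ISI from earlier
    bits, with different ISI coefficients on even and odd bit positions."""
    out = []
    for n, b in enumerate(bits):
        post = post_even if n % 2 == 0 else post_odd
        y = float(b)
        for k, h in enumerate(post, start=1):
            if n - k >= 0:
                y += h * bits[n - k]
        out.append(y)
    return out


def half_rate_dfe(samples, taps_even, taps_odd):
    """Slice each sample after subtracting the ISI predicted from earlier
    decisions, using the tap set that matches the bit's parity."""
    history = [0] * len(taps_even)  # most recent decision first; 0 = no prior bit
    decisions = []
    for n, x in enumerate(samples):
        taps = taps_even if n % 2 == 0 else taps_odd
        isi = sum(h * d for h, d in zip(taps, history))
        d = 1 if x - isi >= 0 else -1
        decisions.append(d)
        history = [d] + history[:-1]
    return decisions


# Illustrative post-cursor values for five taps (assumed, not from the paper).
EVEN = [0.5, 0.25, -0.1, 0.1, 0.05]
ODD = [0.6, 0.3, -0.15, 0.1, 0.05]

random.seed(0)
bits = [random.choice((-1, 1)) for _ in range(200)]
rx = channel(bits, EVEN, ODD)
print(half_rate_dfe(rx, EVEN, ODD) == bits)
```

Passing all-zero tap lists turns `half_rate_dfe` into a plain slicer, which is a convenient way to compare decisions with and without feedback on the same received samples.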
An integrator calibration circuit provides operating point information to the integrator calibration logic, which sets the integrator bias currents. The RX is powered from external analog (VDDA, nominally 1.2 V) and digital (VDD, nominally 1.0 V) supplies. A third supply VREG (nominally 1.0 V) is generated from VDDA by a linear regulator with less than 40 mVpp ripple; this supply is used to power noise-critical circuits, such as the CMOS C2 buffers within the DFE.
The block diagram of a DFE half is shown in Fig. 21.6.2. The first DFE tap (H1) is realized by two-path speculation. This costs more area and power than a direct architecture [5], but ensures that all DFE feedback signals are fully established at the beginning of integration, improving summation accuracy. The two most critical DFE timing paths are the H2 feedback loop and the MUX select path. To meet timing constraints, these paths are realized in CML. A DCVS latch is used to convert CML to CMOS levels, so that the later tap (H3-H5) circuitry can be implemented in static CMOS to save power and area. The CML circuits are powered from VDDA; the DCVS and static CMOS circuits are powered from VREG. The path used for eye monitoring is...
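The two-path speculation used for the first tap (H1) can be illustrated with a small behavioral sketch in Python. It compares a direct first-tap decision, which needs the previous bit before it can slice, with a speculative one that slices for both possible previous bits in parallel and selects the result afterwards. The function names, the ±1 signalling, and the arbitrary start-up value are assumptions made for this example; the sketch shows only functional equivalence and does not model timing, integration, or the higher taps.

```python
def direct_decision(x, h1, prev_bit):
    """Direct feedback: subtract h1 * previous bit, then slice."""
    return 1 if x - h1 * prev_bit >= 0 else -1


def speculative_decision(x, h1, prev_bit):
    """Two-path speculation: slice for both possible previous bits,
    then let the actual previous bit act as the MUX select."""
    if_prev_plus = 1 if x - h1 >= 0 else -1    # path assuming previous bit = +1
    if_prev_minus = 1 if x + h1 >= 0 else -1   # path assuming previous bit = -1
    return if_prev_plus if prev_bit == 1 else if_prev_minus


def run(samples, h1, decide, first_prev=1):
    """Apply a one-tap decision function along a sample sequence.
    first_prev is an arbitrary start-up value for the bit before the first."""
    prev, out = first_prev, []
    for x in samples:
        prev = decide(x, h1, prev)
        out.append(prev)
    return out


samples = [0.9, -0.2, 0.3, -1.1, 0.1, -0.4, 0.7]
print(run(samples, 0.5, direct_decision) == run(samples, 0.5, speculative_decision))
```

Both slicer outputs in the speculative path depend only on the incoming sample, so the previous decision is needed only at the final selection step. The price is the duplicated slicer, which is consistent with the area and power cost noted in the text.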