Intel, Hillsboro, OR
The recent emphasis on power efficiency in serial I/O [1-4] reflects the growing need for lower-power chip-to-chip interfaces for computing systems. Board-level transceivers using a variety of low-power circuit techniques have demonstrated power efficiencies as low as 2.2mW/(Gb/s) across four data lanes [1]. Because power efficiency generally degrades as the per-lane data rate increases [2], low-power interfaces with high aggregate bandwidths must combine many parallel data lanes within the silicon, package and board area constraints. Parallel links can also reduce average power by disabling some or all lanes during periods of sub-peak bandwidth demand, but the efficiency and latency of this scheme are limited by wake-up time [4]. This paper describes a 470Gb/s binary NRZ parallel interface in 45nm CMOS that consumes 1.4mW/(Gb/s). The circuitry and interconnect were co-designed to minimize power and area. Power is reduced by sharing clocking within "bundles" of data lanes, minimizing the span of clock signals, and pairing a low-swing TX driver with a sensitive RX sampler. Silicon area is minimized by using on-chip transmission lines (TLs) to redistribute clock and data signals, while a dense, top-side package connector enables intra-bundle delay matching. The interface also has fast (<5ns) RX standby wake-up and an integrated wake-up time measurement circuit to enable aggressive power management. The interface and channel topology are intended for CPU-to-CPU and CPU-to-memory communication.
Figure 8.1.1 shows the interconnect topology and link schematic. The link is asymmetric full-duplex with 19 lanes in one direction and 28 lanes in the opposite direction. The data lanes are organized into groups of 9 or 10, which are referred to as bundles. A single forwarded clock transmitter and injection-locked VCO (IL-VCO) are shared for each die.
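The headline figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not taken from the paper: it assumes the 470Gb/s aggregate is split evenly across the 19 + 28 data lanes, and it infers how many bundles of 9 and 10 lanes each direction would need. The function name `bundle_split` and the derived quantities (per-lane rate, total power, bundle counts) are illustrative assumptions rather than values stated in the text.

```python
# Figures quoted in the text.
LANES_PER_DIRECTION = (19, 28)      # asymmetric full-duplex lane counts
AGGREGATE_GBPS = 470.0              # aggregate data rate, Gb/s
EFFICIENCY_MW_PER_GBPS = 1.4        # power efficiency, mW/(Gb/s)


def bundle_split(lanes, small=9, large=10):
    """Return (n_small, n_large) with small*n_small + large*n_large == lanes.

    Hypothetical helper: the text only says bundles hold 9 or 10 lanes,
    so the split found here is an inference, not a quoted figure.
    Returns None if no exact split exists.
    """
    for n_large in range(lanes // large + 1):
        rest = lanes - n_large * large
        if rest % small == 0:
            return rest // small, n_large
    return None


total_lanes = sum(LANES_PER_DIRECTION)

# Assumption: every data lane runs at the same rate.
per_lane_gbps = AGGREGATE_GBPS / total_lanes

# Power implied by the quoted efficiency at full aggregate bandwidth.
total_power_mw = AGGREGATE_GBPS * EFFICIENCY_MW_PER_GBPS
per_lane_power_mw = total_power_mw / total_lanes

splits = [bundle_split(n) for n in LANES_PER_DIRECTION]

if __name__ == "__main__":
    print(f"total lanes:        {total_lanes}")
    print(f"per-lane data rate: {per_lane_gbps:.1f} Gb/s")
    print(f"total power:        {total_power_mw:.0f} mW")
    print(f"per-lane power:     {per_lane_power_mw:.1f} mW")
    for lanes, split in zip(LANES_PER_DIRECTION, splits):
        print(f"{lanes} lanes -> {split[0]} x 9-lane + {split[1]} x 10-lane bundles")
```

Under these assumptions the interface works out to 47 lanes at 10Gb/s each, roughly 658mW in total, and five bundles overall. The per-lane power figure folds in the shared clocking overhead, so it should be read as an average rather than the consumption of an isolated lane.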
The interconnect topology consists of two packaged dies connected to a bridge board through top-side package connectors. This topology is compatible with either a high-density interconnect (HDI) or Flex cable, but the HDI implementation is the focus of this work. The data signals for each bundle are routed on a single layer of the bridge, and all lanes within a bundle are length-matched to <100µm. This allows clock recovery to be done on a per-bundle basis. The package-to-bridge connector is a 500µm-pitch LGA, which provides an approximately 4X area-density advantage over socket-to-PCB routing and facilitates length matching in the package breakout. The channel is continued on-die with length-matched TLs that route the data and clock signals to centrally located TX and RX bundle circuitry (Fig. 8.1.7). Each bundle occupies the area of only eight C4 bumps. The total area for active interface circuitry is 3.2mm².
Figure 8.1.2 shows the schematic for the TX portion of the interface. A supply-regulated IL-VCO generates the interface clock, emulating a system wherein multiple interfaces share a single PLL and filter the clock locally. Alternately, a per-interface PLL based...