A novel all-digital CDR for source-synchronous links, and its implementation in 90 nm CMOS, are presented. A phase alignment technique with ping-pong action between two clock phases is used. The system is implemented in static CMOS logic, occupies 0.234 mm² and dissipates 16.6 mW at 6 Gb/s, demonstrating BER < 10⁻¹³ with PRBS-7 input. The compactness and all-static-CMOS nature of the system make it suitable for use in high-speed I/Os requiring per-pin synchronization. (Keywords: CDR, static CMOS, all-digital)

Introduction

Most modern high-speed interconnect relies on both high data rates per pin and parallelism. Wide links tend to be source-synchronous; however, the delay between the clock and data paths can vary over time, making re-synchronization of the clock and data at the receiver necessary. As data rates increase, the mismatch between the data paths themselves has become large enough to require per-pin phase alignment [1]. Thus, a small, low-power CDR system is an important component of such interconnects.

This paper presents a novel all-digital CDR system for source-synchronous links (Fig. 1). By taking a digital approach, this design avoids the increasing size, power and complexity overheads faced by analog techniques in highly scaled CMOS processes. Except for the front-end sense-amplifiers (StrongARM latches), the system is implemented entirely using static CMOS logic gates, and the synchronization algorithm is synthesized from HDL into standard cells. Therefore, the design is highly portable and customizable, and its performance scales with the digital circuitry fed by the link. Finally, it collects data that can provide diagnostics for the link without extra hardware, useful for on-chip self-test and calibration.

Principle of Operation

The typical CDR uses a 2x oversampled data-clock/edge-clock technique and a PLL or a DLL.
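For contrast with the scheme introduced next, the early/late decision at the heart of a conventional 2x-oversampled receiver can be sketched as follows. This is a minimal illustration that assumes a standard bang-bang (Alexander-style) decision rule, which the text does not name explicitly; the function name `bang_bang_votes`, the sample layout and the sign convention are hypothetical choices made for this example only.

```python
from typing import List, Sequence


def bang_bang_votes(data: Sequence[int], edge: Sequence[int]) -> List[int]:
    """Early/late votes for a 2x-oversampled receiver (illustrative only).

    Assumed layout: edge[i] is the edge-clock sample taken between the
    data-clock samples data[i] and data[i + 1].
    Returns +1 (clock early), -1 (clock late) or 0 (no transition).
    """
    votes = []
    for i in range(len(data) - 1):
        d_prev, d_next, e = data[i], data[i + 1], edge[i]
        if d_prev == d_next:
            # No data transition: the edge sample carries no phase information.
            votes.append(0)
        elif e == d_prev:
            # Edge sample still shows the old bit: it was taken before the
            # transition, so the sampling clock is early.
            votes.append(+1)
        else:
            # Edge sample already shows the new bit: the clock is late.
            votes.append(-1)
    return votes
```

In a conventional loop these votes would be filtered and used to steer a PLL or DLL so that the edge-clock stays locked to the data transitions.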
In this system, the edge-clock is repurposed into a 'search-clock', not fixed at 90° relative to the data-clock, but free to move within 2 unit intervals (UI). This 2 UI delay is generated by an 'open' delay line that is slowly and digitally calibrated. The samples produced by the search-clock are compared with those produced by the data-clock, generating match/mismatch (M/MM) data. As the search-clock sweeps through 2 UI, the M/MM information is collected into a 'signature', which can be thought of as a binary reduction of an eye diagram (Fig. 2). By filtering the raw signature, the system can identify the middle of the eye, where the search-clock will be positioned at the end of the sweep to recover the data. At this point the roles of the search- and data-clocks are switched, and a new sweep cycle starts with the old data-clock now acting as the search-clock. This ping-pong action overcomes the key limitation of traditional delay-line-based systems: allowing the data phase to swap from one UI to an adjacent one between updates enables the realization of an infinite delay range. The 2 UI delay can be calibrated by ensuring that the distance between the end of one 'eye opening' a...