Intel, Fort Collins, CO

The 700mm² 65nm Itanium® processor codenamed Tukwila [1] integrates four cores and a system interface with six QuickPath® interconnect channels and four memory interconnect channels. The large die, shown in Fig. 3.4.6, and the high level of integration, coupled with process variability, present clock-system design challenges in the areas of power consumption and variability compensation that we discuss in this paper. Figure 3.4.1 shows the clock system, which is a cascaded-PLL architecture with an initial filter PLL that receives a 133MHz reference clock. This maiden PLL filters reference-clock jitter and outputs a 133MHz clock to 13 downstream PLLs. Each downstream PLL has a duty-cycle corrector that monitors and corrects the end-of-route duty cycle.

The post-PLL clock distributions for the core, system interface, and memory/processor interconnect digital-logic domains follow a common architecture. A balanced H-tree clock route provides a full-swing clock to the second-level clock buffers (SLCBs), which are regional buffers with adjustable delay and static duty-cycle adjustment. Regional active deskew (RAD) detects possible phase misalignment between SLCBs and adjusts the SLCB delay to correct this phase error. The clock vernier device (CVD) is a local buffer with scan-controlled delay for fine-grained performance and debug tuning. Finally, the gater creates a variety of clock waveforms for use by latches and can be disabled to save power [1,2].

The CPU power microcontroller and memory/processor interconnect transmit-clock domains differ in their distribution. In the QuickPath and memory-interconnect links, a shallow clock distribution follows the interconnect PLL, delivering a clock to the transmit logic with minimal jitter. A mux prior to the memory interconnect PLLs allows selection of an external clock reference. The CPU power microcontroller uses the system interface PLL output, which is divided down to a lower frequency.
The microcontroller uses a single SLCB, so no RAD system is necessary in its clock domain.

Each PLL achieves a wide frequency range over the process space by implementing a self-biased architecture [3] followed by a programmable output divider. Self-biased PLL jitter is minimized when the VCO power and signal swing are maximized, corresponding to a maximum VCO control voltage. This means the VCO should operate near its maximum frequency limit (F_MAX) [4]. Because the desired PLL output frequency is often much lower than the VCO F_MAX, the divider is used to optimize the VCO frequency. In manufacturing, each PLL is calibrated by forcing the VCO control voltage from an on-die DAC and reading an on-die frequency counter. Fuses then select the optimal divisor (1, 2 or 4) for each clock ratio, N, assuring that the VCO will be running as close to its F_MAX as possible. Figure 3.4.2a shows a calibrated VCO and associated divide settings across ratios. Figure 3.4.2b shows measured accumulated jitter curves from a part. The plateau jitter of the PLL improves from 20 to 6ps when inc...
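
The per-ratio divisor selection described above can be illustrated with a minimal sketch. It is not the authors' calibration flow: it assumes that the PLL output frequency equals N times the 133MHz reference, that the VCO runs at the output frequency multiplied by the divisor, and that calibration yields a single measured F_MAX per PLL. The function name `select_divisor` and the F_MAX value used in the usage note are hypothetical.

```python
from typing import Optional

REF_MHZ = 133.0        # reference clock frequency given in the text
DIVISORS = (1, 2, 4)   # output-divider options given in the text


def select_divisor(ratio: int, vco_fmax_mhz: float,
                   ref_mhz: float = REF_MHZ) -> Optional[int]:
    """Pick the largest divisor that keeps the VCO at or below F_MAX.

    Assumption (not stated in the source): f_out = ratio * ref_mhz and
    f_vco = f_out * divisor. Returns None when even divide-by-1 would
    push the VCO past its measured limit.
    """
    f_out = ratio * ref_mhz
    # Keep only the divisors whose implied VCO frequency is reachable.
    feasible = [d for d in DIVISORS if f_out * d <= vco_fmax_mhz]
    # The largest feasible divisor places the VCO closest to F_MAX.
    return max(feasible) if feasible else None


def divisor_table(ratios, vco_fmax_mhz: float) -> dict:
    """Map each clock ratio N to its selected divisor (hypothetical fuse table)."""
    return {n: select_divisor(n, vco_fmax_mhz) for n in ratios}
```

With a hypothetical calibrated F_MAX of 4000MHz, `divisor_table([7, 12, 20], 4000.0)` selects divide-by-4 for low ratios and falls back to divide-by-2 and divide-by-1 as N grows, so the VCO stays in the upper part of its range across ratios.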