Three generations of Alpha microprocessors have been designed using a proven custom design methodology. The performance of these microprocessors was optimized by focusing on high-frequency design. The Alpha instruction set architecture facilitates high clock speed, and the chip organization for each generation was carefully chosen to meet critical-path timing. Digital has developed six generations of CMOS technology optimized for high-frequency design. Complex circuit styles were used extensively to meet aggressive cycle-time goals. CAD tools were developed internally to support these designs. This paper discusses some of the technologies that have enabled Alpha microprocessors to achieve high performance.
Index Terms: Alpha, CMOS digital integrated circuits, computer architecture, flip-flops, integrated circuit design, logic design, microprocessors.
A six-issue, four-fetch, out-of-order-execution, 600-MHz Alpha microprocessor achieves an estimated 40 SpecInt95, 60 SpecFP95, and 1800 MB/s on McCalpin Stream. The 16.7 × 18.8 mm² die contains 15.2M transistors and dissipates an estimated 72 W. It is fabricated in a 2.0-V, 6-metal, 0.35-µm CMOS process with CMP planarization (Table 1) [1]. The chip is packaged in a 587-pin ceramic IPGA, with 198 pins for VDD/VSS, that includes a CuW heat slug for low thermal resistance between the die and the detachable heat sink. An on-chip PLL performs frequency multiplication of a differential PECL reference and synchronizes I/O by phase-aligning a CPU clock to the reference. Figure 1 is a detailed floorplan of the chip. Figure 2 depicts a block/pipeline diagram of the major sections and functions.

The instruction fetcher (Figure 3) reads four instructions per cycle plus a next-address pointer from a 64-kB, 2-way pseudo-set-associative, virtual instruction cache. The next-address pointer predicts the address of the subsequent four instructions and indexes the cache in the next cycle. In parallel, a branch predictor resolves the prediction. It contains three tables: a PC-indexed prediction table, a path-indexed prediction table, and a path-indexed table that dynamically chooses one of the former two predictions based on the success of previous predictions. Fetched instructions are dispatched to the integer/memory (INT/MEM) and floating-point (FP) pipelines, issued and executed out of order, and retired in order. During dispatch, two twelve-port register mappers rename register specifiers to eliminate false dependencies, dynamically mapping the architectural registers onto a pool of physical registers (80 integer and 72 FP). The resulting map state is retained in an array until the instruction retires. Pre-retire map state is used to generate a list of the remaining free physical registers.
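The three-table branch predictor described above follows a tournament (hybrid) arrangement: two component predictors plus a chooser that learns which one to trust. The Python sketch below illustrates only that idea. The table sizes, the 2-bit saturating counters, the use of recent taken/not-taken outcomes as a stand-in for path history, and the names `TournamentPredictor`, `predict`, and `update` are all illustrative assumptions, not the chip's actual organization.

```python
def _bump(counter: int, up: bool) -> int:
    """Saturating 2-bit counter: move toward 3 if `up`, else toward 0."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class TournamentPredictor:
    """Hypothetical hybrid predictor: PC-indexed table, path-indexed table,
    and a path-indexed chooser. Sizes and counter widths are assumptions."""

    def __init__(self, pc_bits: int = 10, hist_bits: int = 12):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << hist_bits) - 1
        # Assumption: "path" approximated by the last hist_bits branch outcomes.
        self.history = 0
        # All counters start weakly not-taken (1); >= 2 means "predict taken".
        self.pc_table = [1] * (1 << pc_bits)
        self.path_table = [1] * (1 << hist_bits)
        # Chooser counter >= 2 means "trust the path-indexed table".
        self.chooser = [1] * (1 << hist_bits)

    def _index(self, pc: int):
        # Alpha instructions are 4 bytes, so the two low PC bits are dropped.
        return (pc >> 2) & self.pc_mask, self.history & self.hist_mask

    def predict(self, pc: int) -> bool:
        pi, hi = self._index(pc)
        if self.chooser[hi] >= 2:
            return self.path_table[hi] >= 2
        return self.pc_table[pi] >= 2

    def update(self, pc: int, taken: bool) -> None:
        pi, hi = self._index(pc)
        pc_ok = (self.pc_table[pi] >= 2) == taken
        path_ok = (self.path_table[hi] >= 2) == taken
        # Train the chooser only when exactly one component was correct.
        if pc_ok != path_ok:
            self.chooser[hi] = _bump(self.chooser[hi], path_ok)
        self.pc_table[pi] = _bump(self.pc_table[pi], taken)
        self.path_table[hi] = _bump(self.path_table[hi], taken)
        # Shift the resolved outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask
```

On a strictly alternating branch, the PC-indexed counter oscillates and mispredicts every time, while the path-indexed table learns the pattern; because the chooser is trained only when the components disagree, it migrates toward the path-indexed prediction after a few iterations.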
Buffered map state is restored when the CPU is redirected following a branch mispredict or exception.

Mapped instructions enter a 20-entry INT/MEM or a 15-entry FP issue queue. The INT/MEM queue arbiter identifies the four oldest data-ready instructions, which issue to the integer execution unit (EBOX) and are removed from the INT/MEM queue. Similarly, the FP queue issues the two oldest data-ready instructions to the FP execution unit (FBOX) and removes them from the FP queue.

The EBOX (Figure 4) is divided into two clusters, CL0 and CL1; each cluster contains two independent execution pipelines surrounding an 80-entry register file. Coherency between the two register-file copies is maintained by broadcasting results across intercluster buses. Each of the four pipelines executes and bypasses arithmetic and logical operations in one cycle; results bypassed between clusters take an additional cycle. The upper pipelines handle branches and shifts; CL0 contains a pipelined multimedia engine (3-cycle latency) and CL1 contains a pipelined multiplier (7-cycle latency). The lower pipelines handle displacement address calculations for memory operations. The FBOX contains two independent execution pipelines surrounding a 72-en...
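The renaming and recovery steps described above (map state buffered per instruction until retirement, dead physical registers recycled, buffered map state restored after a mispredict or exception) can be sketched as follows. This is a hedged Python illustration, not the twelve-port hardware mapper: `RenameMapper` and its methods are hypothetical names, one full map snapshot is kept per instruction for clarity, every instruction is assumed to write a destination, and the free list is kept explicitly rather than derived from pre-retire map state.

```python
from collections import deque


class RenameMapper:
    """Hypothetical register renamer with per-instruction map snapshots.

    Defaults use 32 architectural and 80 physical integer registers;
    Alpha's hardwired-zero R31 is not special-cased in this sketch.
    """

    def __init__(self, n_arch: int = 32, n_phys: int = 80):
        self.map = list(range(n_arch))           # arch reg -> phys reg
        self.free = deque(range(n_arch, n_phys))  # unallocated phys regs
        # In-flight entries, oldest first: (seq, new_phys, old_phys, map_before)
        self.inflight = []
        self.seq = 0

    def rename(self, srcs, dest):
        """Rename one instruction; return (seq, phys_srcs, phys_dest)."""
        phys_srcs = tuple(self.map[s] for s in srcs)
        if not self.free:
            raise RuntimeError("no free physical registers: dispatch must stall")
        snapshot = list(self.map)  # buffered map state preceding this instruction
        new = self.free.popleft()
        old = self.map[dest]
        self.map[dest] = new
        seq = self.seq
        self.seq += 1
        self.inflight.append((seq, new, old, snapshot))
        return seq, phys_srcs, new

    def retire(self):
        """Retire the oldest instruction; its dest's previous mapping is now dead."""
        _, _, old, _ = self.inflight.pop(0)
        self.free.append(old)

    def restore(self, seq):
        """Squash instruction `seq` and everything younger, restoring map state."""
        squashed = [e for e in self.inflight if e[0] >= seq]
        self.inflight = [e for e in self.inflight if e[0] < seq]
        if squashed:
            self.map = squashed[0][3]  # map state before the oldest squashed instr
            for _, new, _, _ in reversed(squashed):
                self.free.appendleft(new)  # return speculative allocations
```

Keeping the pre-rename map with each instruction makes recovery a single copy, at the cost of storage; that trade-off is the reason a hardware design would bound how many snapshots it buffers.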
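The issue-queue arbiter policy described above (issue the oldest data-ready instructions, up to the queue's issue width) is sketched below in Python. The `QueueEntry` and `IssueQueue` names, the use of program-order sequence numbers as age, and the set of ready physical registers standing in for wakeup logic are assumptions made for illustration; a hardware arbiter makes this selection in a single cycle rather than by sorting.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class QueueEntry:
    seq: int                # program-order sequence number; smaller is older
    srcs: Tuple[int, ...]   # physical source registers
    dest: Optional[int]     # physical destination register, if any


class IssueQueue:
    """Issues up to `width` of the oldest data-ready entries per cycle.

    Defaults mirror the 20-entry, four-wide INT/MEM queue; the FP queue
    would be IssueQueue(capacity=15, width=2). Readiness is modeled as a
    set of available physical registers (a simplification of wakeup logic).
    """

    def __init__(self, capacity: int = 20, width: int = 4):
        self.capacity = capacity
        self.width = width
        self.entries: List[QueueEntry] = []
        self.ready: Set[int] = set()

    def insert(self, entry: QueueEntry) -> None:
        if len(self.entries) >= self.capacity:
            raise RuntimeError("issue queue full: dispatch must stall")
        self.entries.append(entry)

    def select(self) -> List[QueueEntry]:
        """Pick the oldest data-ready entries and remove them from the queue."""
        ready = [e for e in self.entries if all(s in self.ready for s in e.srcs)]
        issued = sorted(ready, key=lambda e: e.seq)[: self.width]
        issued_seqs = {e.seq for e in issued}
        self.entries = [e for e in self.entries if e.seq not in issued_seqs]
        return issued

    def writeback(self, reg: int) -> None:
        """Mark a physical register's value as available to waiting entries."""
        self.ready.add(reg)
```

Note that a younger ready instruction can issue ahead of an older one that is still waiting for an operand; age only breaks ties among entries that are already data-ready.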
A 400-MIPS/200-MFLOPS (peak) custom 64-bit VLSI CPU chip is described. The chip is fabricated in a 0.75-µm CMOS technology with three levels of metallization and is optimized for 3.3-V operation. The die measures 16.8 mm × 13.9 mm and contains 1.68M transistors. The chip includes separate 8-kilobyte instruction and data caches and a fully pipelined floating-point unit (FPU) that handles both IEEE and VAX standard floating-point data types. It is designed to execute two instructions per cycle among scoreboarded integer, floating-point, address, and branch execution units. Power dissipation is 30 W at 200-MHz operation.
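Scoreboarded dual issue, as mentioned in the abstract above, can be illustrated with a small in-order model. In this hypothetical Python sketch, `busy` plays the role of the scoreboard (registers with a write still outstanding), and the pairing rule (at most one instruction per execution-unit class per cycle, no read or write of a busy register, and no dependence of the second instruction on the first) is a simplified assumption rather than the chip's actual issue rules.

```python
from typing import List, Optional, Set, Tuple

# (unit, dest register or None, source registers); unit names follow the abstract.
Instr = Tuple[str, Optional[int], Tuple[int, ...]]


def dual_issue(window: List[Instr], busy: Set[int]) -> List[Instr]:
    """Issue up to two instructions, in order, from the head of `window`.

    `busy` is the scoreboard. Issued destinations are marked busy in place;
    the caller is assumed to clear them when the result is written back.
    """
    issued: List[Instr] = []
    units_used: Set[str] = set()
    new_writes: Set[int] = set()
    for unit, dest, srcs in window[:2]:
        # RAW hazard: a source is still being produced (older or paired instr).
        raw = any(s in busy or s in new_writes for s in srcs)
        # WAW hazard: the destination already has a write outstanding.
        waw = dest is not None and (dest in busy or dest in new_writes)
        if raw or waw or unit in units_used:
            break  # in-order issue: a stalled instruction blocks those behind it
        issued.append((unit, dest, srcs))
        units_used.add(unit)
        if dest is not None:
            new_writes.add(dest)
    busy |= new_writes  # update the caller's scoreboard in place
    return issued
```

The `break` keeps issue strictly in order: if the first instruction stalls, the second does not bypass it, even when it is independent.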
Test chips built in a 32-nm bulk CMOS technology, containing both hardened and non-hardened sequential elements, have been exposed to neutrons, protons, alpha particles, and heavy ions. The radiation robustness of two circuit-level soft-error mitigation techniques has been tested: 1) SEUT (Single Event Upset Tolerant), an interlocked, redundant-state technique, and 2) a novel hardening technique referred to as RCC (Reinforcing Charge Collection). This work summarizes the measured soft-error-rate benefits and the design tradeoffs of the implemented hardening techniques.