Technology scaling increases circuits' susceptibility to manufacturing imperfections and dramatically decreases processor yields. Traditional defect-tolerance approaches add explicit redundant circuitry to improve yield and are therefore very expensive for datapath modules in processors. We propose a multi-layered methodology for developing new and efficient defect-tolerance approaches for processors. Specifically, we develop a microarchitecture-layer approach for arithmetic logic units (ALUs), a circuit-layer approach for multipliers, and an ISA-layer approach for floating-point units (FPUs). We demonstrate that our three approaches improve the performance-per-fabricated-die-area of a modern processor core by 3.5%, 2.4%, and at least 9%, respectively, and hence collectively provide significant gains.
INTRODUCTION
Technology scaling has increased transistor density and reduced fabrication cost per transistor. To exploit these gains efficiently, processor designers pack more functionality and more performance-enhancing features into each new processor. However, as technology advances deep into the nano-scale, the improvements in cost, power, and delay provided by each technology generation have started to slow down, or even reverse. One reason is the increase in circuits' susceptibility to manufacturing imperfections. We use the term defect to refer to two types of imperfections, namely, process variations and random defects that affect circuit operation. Defect rates are now increasing with each scaling generation and hence reducing yield, especially at the highest levels of performance. A reduction in yield diminishes, and may even negate, the reduction in die fabrication cost provided by scaling.
Traditionally, the use of explicit redundancy has been explored in memory and logic circuits to improve yield. Spare rows and columns are commonly used in memories [1] [2]. However, the cost of such explicit circuit redundancy is prohibitively high for logic circuits, such as most datapath modules. Recent advanced defect-tolerance (DT) approaches for processors use implicit redundancy, i.e., they utilize existing features in processors and add a minimal level of reconfiguration to achieve DT with low overheads. For instance, several works have exploited microarchitectural properties of caches to tolerate defects [3] [4] [5] [6]. The speculative nature of branch predictors has also been studied for the purpose of DT in [7]. However, there is no systematic methodology to guide the development of such advanced approaches, and there is no study on using implicit redundancy for datapath modules.
In this work, we propose a methodology to develop implicit-redundancy DT approaches for processors. 
The methodology systematically develops approaches to maximize fabrication efficiency, which is measured as performance-per-fabricated-die-area (or performance-per-area). In particular, we propose new and efficient approaches for datapath modules used in processors, namely, the ALU, fixed-point multiplier, and FPU. In this work, we use a single-core processor to demonstrate that our three approaches collectively provid...