Current microprocessors are designed to execute instructions in parallel and out of order. In general, superscalar processors fetch instructions in order. After the branch prediction logic determines whether a branch is taken (or not) and its target address, the processor decodes the instructions and renames the register operands, removing name dependences introduced by the compiler. Because processors generally have more physical than logical registers, multiple instructions with the same logical destination can be in flight simultaneously. The renamed instructions then go into the issue queue, where they wait until their operands are ready and their required resources are available. At the same time, instructions go into the reorder buffer, where they remain until they commit their results. When an instruction executes, the wakeup logic notifies dependent instructions that the corresponding operand is available. Finally, instructions commit their results in program order.
This article focuses on the design of the logic that stores the instructions waiting for execution, as well as the logic associated with identifying whether operands are ready and selecting the instructions that start execution every cycle. All these components are part of the issue logic. Issue logic is one of the most complex parts of superscalar processors, one of the largest consumers of energy, and one of the main sites of power density. Its design is therefore critical for performance.
Researchers have used a variety of schemes to implement the issue queue. In particular, several recent proposals have attempted to reduce the issue logic's complexity and power. To the best of our knowledge, this article is the first attempt to perform a comprehensive and thorough survey of the issue logic design space.
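As a concrete illustration of the renaming step described above, the following is a minimal sketch, not a description of any particular processor. The class name `Renamer`, its methods, and the register counts are hypothetical and chosen only for this example. The sketch assumes a simple map table plus a free list, and it omits returning physical registers to the free list at commit.

```python
from collections import deque


class Renamer:
    """Toy register renamer (illustrative assumption, not a real design)."""

    def __init__(self, num_logical, num_physical):
        # Renaming only helps if there are more physical than logical registers.
        assert num_physical > num_logical
        # Initially, logical register i is held in physical register i.
        self.map_table = list(range(num_logical))
        # The remaining physical registers are free for new destinations.
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        """Rename one instruction and return (phys_dest, phys_srcs)."""
        # Sources read the current mapping, so true dependences are preserved.
        phys_srcs = [self.map_table[s] for s in srcs]
        # The destination gets a fresh physical register, which removes
        # name dependences on earlier writers of the same logical register.
        # An empty free list raises IndexError here; real hardware would stall.
        phys_dest = self.free_list.popleft()
        self.map_table[dest] = phys_dest
        return phys_dest, phys_srcs


# Two instructions that write the same logical register r1 receive
# different physical destinations, so both can be in flight at once.
renamer = Renamer(num_logical=4, num_physical=8)
first = renamer.rename(dest=1, srcs=[2, 3])   # r1 = r2 op r3
second = renamer.rename(dest=1, srcs=[1, 2])  # r1 = r1 op r2
```

In this sketch the second instruction reads the physical register that the first one writes, so the true dependence survives renaming while the reuse of the logical name `r1` no longer serializes the two writes.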
Basic CAM-based approaches
One of the most common ways to implement the issue logic is based on content-addressable memory (CAM) and RAM array structures. These structures can store several instructions, but generally fewer than the total number of in-flight instructions. Each entry contains an instruction that has not been issued or has been issued speculatively but not yet validated and thus might need to be reexecuted.
In general, entries use RAM cells to store operations, destination operands, and flags indicating whether source operands are ready, while CAM cells store source operand identifiers, referred to here as tags. Overall, the issue logic's main source of complexity and power dissipation is the many tag comparisons it must perform every cycle. Researchers have proposed several approaches to improve the issue logic's power efficiency. We classify these approaches into two groups:
• static approaches, which use fixed structures, and
• dynamic approaches, which dynamically adapt some structures according to the properties of the executed code.
Orthogonally, researchers have proposed several more efficient circuit designs, but they don't reduce the inherent complexity.
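The CAM/RAM organization described above can be sketched in software as follows. This is a simplified model under stated assumptions, not an actual circuit or any specific processor's design: the names `Entry`, `IssueQueue`, `wakeup`, and `select` are hypothetical, selection is assumed to be oldest-first, and speculative issue with later validation is omitted. The `comparisons` counter is added only to show why tag matching dominates the cost.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare entries by identity, not by field values
class Entry:
    op: str          # RAM part: the operation
    dest_tag: int    # RAM part: destination operand tag
    src_tags: list   # CAM part: source operand tags
    ready: list      # RAM part: one ready flag per source operand


class IssueQueue:
    """Toy CAM-style issue queue (illustrative assumption)."""

    def __init__(self, size, issue_width):
        self.size = size
        self.issue_width = issue_width
        self.entries = []      # kept in program order, oldest first
        self.comparisons = 0   # number of tag comparisons performed

    def insert(self, op, dest_tag, src_tags, ready_tags):
        """Add a renamed instruction; returns False if the queue is full."""
        if len(self.entries) >= self.size:
            return False
        ready = [tag in ready_tags for tag in src_tags]
        self.entries.append(Entry(op, dest_tag, list(src_tags), ready))
        return True

    def wakeup(self, broadcast_tag):
        """Broadcast a result tag; every source tag in the queue is compared."""
        for entry in self.entries:
            for i, tag in enumerate(entry.src_tags):
                self.comparisons += 1
                if tag == broadcast_tag:
                    entry.ready[i] = True

    def select(self):
        """Pick up to issue_width entries whose operands are all ready."""
        issued = [e for e in self.entries if all(e.ready)][: self.issue_width]
        for entry in issued:
            self.entries.remove(entry)
        return issued


# A three-instruction dependence chain: add -> mul -> sub.
queue = IssueQueue(size=4, issue_width=2)
queue.insert("add", 10, [1, 2], ready_tags={1, 2, 3})
queue.insert("mul", 11, [10, 3], ready_tags={1, 2, 3})
queue.insert("sub", 12, [11, 10], ready_tags={1, 2, 3})

issue_order = []
while queue.entries:
    issued = queue.select()
    issue_order.append([e.op for e in issued])
    for e in issued:
        queue.wakeup(e.dest_tag)  # results wake up dependent instructions
```

Each broadcast is compared against every source tag still in the queue, whether or not it matches, so the comparison count grows with both queue occupancy and the number of results produced per cycle.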
Dynamic approachesOne approach to reducing the power dissipation is b...