Renaming logic is a high-cost module in a superscalar processor and consumes significant energy. To mitigate this, the renamed trace cache (RTC), which caches renamed operands, was proposed. However, conventional RTCs have several problems, such as low capacity efficiency, large hardware overhead, and insufficient caching of renamed operands. We propose a semi-global renamed trace cache (SGRTC) that caches only those renamed operands whose distances from producers outside traces are short, which solves the problems of conventional RTCs. Evaluation results show that, compared to a conventional processor, SGRTC reduces the energy consumed by renaming by 64% with a 0.2% performance overhead.
I. INTRODUCTION

In out-of-order superscalar processors, logical registers are renamed to remove false dependencies between instructions. Register numbers are renamed by accessing a table called the register map table (RMT). The circuit area of the RMT has recently increased owing to the widespread use of SMT and other reasons described in Section II. For example, the Alpha 21464 has rename logic that is larger than its 64KB L1 data cache [14]. Consequently, the complexity, energy consumption, and heat generation of the RMT have become serious issues. For instance, the RMT in the Intel P6 architecture consumes 4% of the energy consumed by the processor, which is comparable to that of its reservation station [9], and its power density is the fourth highest on the chip [11].

To overcome these problems, Vajapeyam et al. proposed a renamed trace cache (RTC) [22] that extends a trace cache (TC) [15]. A conventional TC caches instructions in dynamic execution order across branch instructions. An RTC is a TC that caches instructions with renamed operands. When instructions with renamed operands are obtained from an RTC, register renaming can be omitted, so the number of ports of the RMT can be reduced. An RMT is generally implemented as a RAM, and the area of a RAM grows in proportion to the square of the number of ports. Consequently, reducing the number of ports greatly reduces the area of the RMT. However, the RTC method is limited in which renamed operands it can cache: it can cache a renamed operand only when the operand refers to a producer, i.e., the instruction on which the operand depends, in the same trace. As a result, its advantage is limited, and the RTC method can cache renamed operands for only approximately 30% of operands (Section VI).

On the other hand, Ichibayashi et al. independently proposed another RTC method that can cache renamed operands that refer to producers outside a trace [4]. For simplicity, we refer to the RTC of Vajapeyam et al. as local renamed trace cache (LRTC) and that of Ichibayashi et al. as global renamed