On the Latency and Energy of Checkpointed Superscalar Register Alias Tables

Safi, Elham; Moshovos, Andreas; Veneris, Andreas

doi:10.1109/tvlsi.2008.2012128

Cited by 6 publications

(11 citation statements)

References 30 publications

(37 reference statements)

“…We did not evaluate the energy consumption of the comparators for dependency checking on register renaming [11], [13], but each comparator consists of a few transistors, and thus, we considered their energy consumption to be comparatively small. Moreover, our proposed method omits this dependency checking on an RTC hit, which results in a conservative energy comparison for our technique.…”

Section: Evaluation Environment (mentioning)

confidence: 99%

“…An RMT generally requires one write and three read ports per single 2-source operand instruction [11], [13]: 1) one write port for updating new destination mapping, 2) two read port for reading source operand mapping, and 3) one read port for reading old destination mapping. As a result, each instruction with 2-source operand requires four ports.…”

Section: Evaluated Models (mentioning)

confidence: 99%
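
The port accounting in the statement above can be sketched in a few lines. This is only an illustration of the arithmetic quoted from the citing paper; the function names and the `rename_width` parameter are hypothetical, and multiplying by the rename width assumes a straightforward implementation with no port sharing.

```python
# Per-instruction RMT ports for a 2-source-operand instruction,
# following the breakdown in the quoted statement.
WRITE_PORTS_NEW_DEST = 1   # write the new destination mapping
READ_PORTS_SOURCES = 2     # read the two source operand mappings
READ_PORTS_OLD_DEST = 1    # read the old destination mapping


def rmt_ports_per_instruction() -> int:
    """Total RMT ports one instruction needs (1 write + 3 read)."""
    return WRITE_PORTS_NEW_DEST + READ_PORTS_SOURCES + READ_PORTS_OLD_DEST


def rmt_total_ports(rename_width: int) -> int:
    """Ports for `rename_width` instructions renamed per cycle.

    Assumption: ports scale linearly with width (no sharing tricks).
    """
    if rename_width < 1:
        raise ValueError("rename_width must be at least 1")
    return rename_width * rmt_ports_per_instruction()


print(rmt_ports_per_instruction())  # 4
print(rmt_total_ports(8))           # 32
```

With a width of 8, the sketch reproduces the 32-port figure that a later citation statement on this page gives for a decode width of 8.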

“…An RMT occupies a considerably large circuit area. For general ISAs with a three-operand format, an RMT requires four ports per instruction [11], [13]. For example, in … [Footnote †: There is a method with a CAM-based RMT [5], and our proposal also can be implemented on a CAM-based method.]…”

Section: Introduction (mentioning)

confidence: 99%

Improvement of Renamed Trace Cache through the Reduction of Dependent Path Length for High Energy Efficiency

Shioya

Ando

2016

IEICE Trans. Inf. & Syst.

SUMMARY: Out-of-order superscalar processors rename register numbers to remove false dependencies between instructions. A renaming logic for register renaming is a high-cost module in a superscalar processor, and it consumes considerable energy. A renamed trace cache (RTC) was proposed for reducing the energy consumption of a renaming logic. An RTC caches and reuses renamed operands, and thus, register renaming can be omitted on RTC hits. However, conventional RTCs suffer from several performance, energy consumption, and hardware overhead problems. We propose a semi-global renamed trace cache (SGRTC) that caches only renamed operands that are short distance from producers outside traces, and solves the problems of conventional RTCs. Evaluation results show that SGRTC achieves 64% lower energy consumption for renaming with a 0.2% performance overhead as compared to a conventional processor.

Key words: superscalar processor, register renaming, trace cache, energy efficiency
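
To make the renaming step described in the summary concrete, here is a minimal sketch of how a RAM-style register map table can remove false dependencies. It is not the design evaluated in either paper: the class and field names are hypothetical, instructions are renamed one at a time (so the intra-group dependency check is not modelled), and physical registers are never returned to the free list.

```python
from collections import deque
from typing import NamedTuple


class RenamedInstr(NamedTuple):
    dest: int       # newly allocated physical destination register
    src1: int       # physical register holding source operand 1
    src2: int       # physical register holding source operand 2
    old_dest: int   # previous mapping of the destination (freed at commit)


class RegisterMapTable:
    """Toy RMT: maps logical register numbers to physical ones."""

    def __init__(self, num_logical: int, num_physical: int) -> None:
        # Assumption: logical register i initially maps to physical i.
        self.mapping = list(range(num_logical))
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, dest: int, src1: int, src2: int) -> RenamedInstr:
        # Three reads: two source mappings and the old destination mapping.
        p_src1 = self.mapping[src1]
        p_src2 = self.mapping[src2]
        old_dest = self.mapping[dest]
        # One write: install a fresh physical register for the destination.
        new_dest = self.free_list.popleft()
        self.mapping[dest] = new_dest
        return RenamedInstr(new_dest, p_src1, p_src2, old_dest)


rmt = RegisterMapTable(num_logical=4, num_physical=8)
i1 = rmt.rename(dest=1, src1=2, src2=3)  # r1 = r2 op r3
i2 = rmt.rename(dest=1, src1=0, src2=0)  # r1 = r0 op r0 (reuses r1)
i3 = rmt.rename(dest=2, src1=1, src2=1)  # r2 = r1 op r1 (reads newest r1)
print(i1, i2, i3, sep="\n")
```

In this run the two writes to logical register 1 receive different physical registers, so the write-after-write dependency disappears, while the third instruction still reads the register produced by the second, so the true dependency is preserved.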

“…For general ISAs with a three-operand format, an RMT requires 4 ports per instruction [17], [24]. For example, in a processor whose decode width is 8, such as IBM Power 8 [7], [2], a straightforward implementation of an RMT requires 32 ports.…”

Section: A Problems Of Register Map (mentioning)

confidence: 99%

“…We evaluated the energy consumption for register renaming by evaluating the arrays and their peripheral circuits related to our proposal, which are the RMT, TC, RTC and instruction cache. We do not evaluate the energy consumption of the comparators for dependency checking on register renaming [17], [24], but each comparator consists of few transistors, and thus, we think that their energy consumption is comparatively small. Moreover, our proposal omits this dependency checking on RTC hit, and consequently, this results in a conservative energy comparison for our technique.…”

Section: A Evaluation Environment (mentioning)

confidence: 99%

Energy efficiency improvement of renamed trace cache through the reduction of dependent path length

Shioya

Ando

2014

2014 IEEE 32nd International Conference on Computer Design (ICCD)

A renaming logic is a high-cost module in a superscalar processor, and it consumes significant energy. For mitigating this, renamed trace cache (RTC), which caches renamed operands, was proposed. However, conventional RTCs have several problems such as low capacity-efficiency, large hardware overhead and insufficient caching of renamed operands. We propose a semi-global renamed trace cache (SGRTC) that caches only renamed operands whose distances from producers outside traces are short, and it solves the problems of conventional RTCs. Evaluation results show that SGRTC achieves 64% lower energy consumption for renaming with a 0.2% performance overhead compared to a conventional processor.

I. INTRODUCTION

In out-of-order superscalar processors, logical registers are renamed for removing false dependencies between instructions. Register numbers are renamed by accessing a table called register map table (RMT). The circuit area of an RMT has recently increased owing to the widespread use of SMT and other reasons described in Section II. For example, Alpha 21464 has a rename logic that is larger than its 64KB L1-data cache [14]. Consequently, it makes its complexity, energy consumption, and heat generation, a considerably serious issue. As a result, the RMT in the Intel P6 architecture consumes 4% of the energy consumed by the processor, which is comparable to that of its reservation station [9], and its power-density is the fourth highest on-chip [11].

To overcome these problems, Vajapeyam et al. proposed a renamed trace cache (RTC) [22] that extends a trace cache (TC) [15]. A conventional TC caches instructions ordered by a dynamic execution sequence beyond branch instructions. An RTC is a TC that caches instructions with renamed operands. When instructions with renamed operands are obtained from an RTC, register renaming can be omitted, thus it is possible to reduce the number of ports of an RMT. An RMT generally comprises a RAM, and the area of a RAM grows proportionally with the square of the number of ports. Consequently, the reduction in the number of ports makes the area of the RMT very small. However, the RTC method has a limitation on caching renamed operands, i.e., the RTC method can cache renamed operands only when operands refer producers, which are dependent instructions of the operands, in the same trace. As a result, its advantage is limited, and the RTC method can cache renamed operands only for approximately 30% operands (Section VI).

On the other hand, Ichibayashi et al. independently proposed another RTC method that can cache renamed operands that refer producers outside a trace [4]. For simplicity, we refer to the RTC of Vajapeyam et al. as local renamed trace cache (LRTC) and that of Ichibayashi et al. as global renamed

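
The introduction excerpt above states that RAM area grows with the square of the number of ports. As a rough worked example of what that rule implies, the sketch below computes an area ratio between two port counts. The function name is hypothetical, and the purely quadratic model ignores every other contribution to area, so the numbers are illustrative rather than results from the paper.

```python
def relative_rmt_area(ports: int, baseline_ports: int) -> float:
    """Area of a `ports`-port RAM relative to a `baseline_ports`-port one.

    Assumption: area is proportional to the square of the port count,
    as stated in the excerpt; all other factors are ignored.
    """
    if ports < 1 or baseline_ports < 1:
        raise ValueError("port counts must be positive")
    return (ports / baseline_ports) ** 2


# Halving the port count from 32 to 16 under the quadratic rule:
print(relative_rmt_area(16, 32))  # 0.25
```

Under this model, halving the number of ports cuts the array area to a quarter, which is why removing rename accesses on RTC hits is attractive.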

A physical-level study of the compacted matrix instruction scheduler for dynamically-scheduled superscalar processors

Safi

Moshovos

Veneris

2009

2009 International Symposium on Systems, Architectures, Modeling, and Simulation

Self Cite

This work studies physical-level characteristics of the recently proposed compacted matrix instruction scheduler for dynamically-scheduled, superscalar processors. Previous work focused on the matrix scheduler's architecture and argued in support of its speed and scalability advantages. However, no physical-level implementation or models were reported for it. Using full-custom layouts in a commercial 90 nm fabrication technology, this work investigates the latency and energy variations of the compacted matrix and its accompanying logic as a function of the issue width, the window size, and the number of global recovery checkpoints. This work also proposes an energy optimization that throttles unnecessary pre-charges and evaluations. This optimization reduces energy by 10% and 18% depending on the scheduler size.
