Proceedings of the 2007 International Symposium on Low Power Electronics and Design 2007
DOI: 10.1145/1283780.1283863

On the latency, energy and area of checkpointed, superscalar register alias tables

Abstract: We present two full-custom implementations of the Register Alias Table (RAT) for a 4-way superscalar dynamically-scheduled processor in a commercial 130nm CMOS technology. The implementations differ in the way they organize the embedded global checkpoints (GCs) which support speculative execution. In the first implementation, representative of early designs, the GCs are organized as shift registers. In the second implementation, representative of more recent proposals, the GCs are organized as random access bu…

Cited by 10 publications (4 citation statements)
References 13 publications
“…We assume that a checkpoint is available at the youngest branch from which the core traverses the ROB updating the RAT at a pace of 4 instructions per cycle until reaching the youngest instruction in the ROB. This assumes a RAM-based RAT [30]; a CAM-based RAT [31] would incur even lower latency overhead since the RAT could be checkpointed on every load instruction. Our simulation experiments report that rebuilding the RAT takes 1.8 cycles on average (and at most 2.9 cycles, for graph500).…”
Section: Methods (mentioning)
confidence: 99%
“…However, several effects favor a moderate number of checkpoints. Obviously, a large number of checkpoints increases the latency, area, and power consumption of the register map-table [29]. But checkpoints also lock down registers and checkpointing too frequently restricts the aggressiveness with which CPR can reclaim inter-checkpoint registers.…”
Section: Background: CPR (mentioning)
confidence: 99%
“…Our processor does not implement an architectural rename table. A group of RAT checkpoints and a ROB walking logic are used to perform mispeculation recovery, including branch mispredictions [164].…”
Section: Reorder Buffer (mentioning)
confidence: 99%
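
The recovery scheme described in the citation statements above (restore a RAT checkpoint, then walk the ROB to re-apply younger mappings) can be illustrated with a minimal sketch. This is not code from the cited papers: the names `RobEntry` and `rebuild_rat` are hypothetical, the RAT is modeled as a plain list indexed by architectural register, and the cycle count assumes only the walk rate of 4 instructions per cycle quoted above.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Optional, Tuple


@dataclass
class RobEntry:
    """Hypothetical ROB entry: a destination mapping, or None if no register is written."""
    arch_reg: Optional[int]
    phys_reg: Optional[int]


def rebuild_rat(
    checkpoint: List[int], rob_entries: List[RobEntry], width: int = 4
) -> Tuple[List[int], int]:
    """Restore the RAT from a checkpoint, then replay ROB entries in program order.

    `rob_entries` are assumed to be the instructions younger than the checkpoint
    whose mappings must be re-applied. Returns the rebuilt RAT and the number of
    cycles the walk takes at `width` instructions per cycle.
    """
    rat = list(checkpoint)  # copy, so the stored checkpoint stays intact
    for entry in rob_entries:
        if entry.arch_reg is not None:
            # A later write to the same architectural register overrides earlier ones.
            rat[entry.arch_reg] = entry.phys_reg
    cycles = ceil(len(rob_entries) / width)
    return rat, cycles


# Usage: 4 architectural registers, 5 instructions to replay.
checkpoint = [0, 1, 2, 3]
rob = [
    RobEntry(1, 10),
    RobEntry(None, None),  # e.g. a store or branch with no destination register
    RobEntry(1, 11),
    RobEntry(3, 12),
    RobEntry(0, 13),
]
rat, cycles = rebuild_rat(checkpoint, rob)
print(rat, cycles)  # [13, 11, 2, 12] 2
```

The sketch only models the mapping update and a simple cycle estimate; it says nothing about the latency, energy, or area trade-offs between shift-register and random-access checkpoint organizations that the paper itself evaluates.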