Proceedings 29th Annual International Symposium on Computer Architecture
DOI: 10.1109/isca.2002.1003558

The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

Cited by 121 publications (75 citation statements)
References 8 publications
“…A 16-bit addition, the main component of the RC's clock period, has been found to have a latency of 18.29 FO4 delays. This is in line with the cycle times of recent Intel processors, which have ranged from 12-20 FO4 delays [11]. However, future programmable processors are expected to include much less logic than current designs in each pipeline stage, leading to greater increases in clock rates than would be caused by technology scaling alone.…”
Section: Results
Citation type: supporting
confidence: 65%
“…Matching the predictions in the ITRS requires that the clock period of a programmable cluster decrease to 3.16 FO4 delays in 22nm processes. Results from [11] indicate that reducing clock periods to less than 6 to 8 FO4 delays per cycle hurts overall performance, leading us to believe that programmable processor clock rates will not scale at the rates predicted by the ITRS.…”
Section: Gate
Citation type: mentioning
confidence: 99%
“…The cache bank dimensions enable the calculation of wire lengths between successive routers. Based on delays for B-wires (Table 1) and a latch overhead of 2 FO4 [17], we compute the delay for a link (and round up to the next cycle for a 5 GHz clock). The (uncontended) latency per router is assumed to be three cycles.…”
Section: Extensions To Cacti
Citation type: mentioning
confidence: 99%
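
The link-delay calculation described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the function name `link_latency_cycles` and the example wire and FO4 delays are hypothetical, while the 2 FO4 latch overhead and the 5 GHz clock come from the quoted text.

```python
import math


def link_latency_cycles(wire_delay_ps, fo4_ps, clock_ghz=5.0, latch_overhead_fo4=2.0):
    """Return a link's latency in whole clock cycles, rounded up."""
    cycle_ps = 1000.0 / clock_ghz  # 5 GHz gives a 200 ps cycle
    # Total delay is the wire delay plus the latch overhead expressed in FO4 delays.
    total_ps = wire_delay_ps + latch_overhead_fo4 * fo4_ps
    return math.ceil(total_ps / cycle_ps)


# Hypothetical inputs (not from the source): a 350 ps wire and a 17 ps FO4 delay.
# 350 + 2 * 17 = 384 ps, which rounds up to 2 cycles at 200 ps per cycle.
print(link_latency_cycles(wire_delay_ps=350.0, fo4_ps=17.0))
```

Rounding up reflects that a link cannot deliver its data partway through a cycle, so any fraction of a cycle costs a full one.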
“…Our circuit parameters (Table 1) are chosen to represent a wide spectrum of CMOS technologies from recent-past (180nm) to near-future (70nm) technologies. The clock speeds are scaled proportionally to the gate delays and match the aggressive 8 fanout-of-four (FO4) delay for each technology [11]. Therefore, the cycle time stays the same relative to the gate delay and a single pipeline stage employs the same number of logic levels across technologies.…”
Section: Methods
Citation type: mentioning
confidence: 99%
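
The scaling described in the statement above, where the cycle time is held at 8 FO4 delays in every technology, can be illustrated with a short sketch. It assumes the commonly quoted rule of thumb that one FO4 delay is about 360 ps per micron of drawn gate length; that constant and the function names are assumptions for illustration and are not taken from the citing paper.

```python
# Assumed rule of thumb: FO4 delay of about 360 ps per micron of drawn gate
# length, i.e. 0.36 ps per nanometre. Not taken from the citing paper.
FO4_PS_PER_NM = 0.36


def fo4_delay_ps(node_nm):
    """Estimate one FO4 inverter delay (in ps) for a technology node in nm."""
    return FO4_PS_PER_NM * node_nm


def clock_ghz(node_nm, fo4_per_stage=8.0):
    """Clock frequency (GHz) when each cycle spans fo4_per_stage FO4 delays."""
    cycle_ps = fo4_per_stage * fo4_delay_ps(node_nm)
    return 1000.0 / cycle_ps


# The two endpoints named in the statement: 180nm and 70nm.
for node_nm in (180, 70):
    print(f"{node_nm}nm: FO4 = {fo4_delay_ps(node_nm):.1f} ps, "
          f"clock = {clock_ghz(node_nm):.2f} GHz")
```

Because the cycle time is a fixed multiple of the FO4 delay, the estimated clock rate rises as the node shrinks while the number of logic levels per stage stays constant.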