10th International Symposium on High Performance Computer Architecture (HPCA'04)
DOI: 10.1109/hpca.2004.10003

A Low-Complexity, High-Performance Fetch Unit for Simultaneous Multithreading Processors

Abstract: Simultaneous Multithreading (SMT) is an architectural

citations
Cited by 9 publications
(6 citation statements)
references
References 34 publications
“…The performance of BAT scheme is evaluated using Simplescalar3.0 tool set [4] on 10 SPEC2000 CPU benchmarks. Firstly, since the instruction fetch delay is the performance bottleneck of modern superscalar CPU [2], we study the data flow between Level-1 instruction cache (L1-icache) and instruction buffer unit. Similarly to [2], a Harvard architecture is adopted.…”
Section: Discussion (mentioning)
confidence: 99%
“…This situation is more acute for fetch mechanisms that fetch instructions from up to two threads each cycle. If we use high performance fetch mechanisms like [8], that provides as high performance as previous ones, this situation is reduced.…”
Section: Related Work (mentioning)
confidence: 99%
“…A modification of the stream fetch engine for SMT processors was evaluated in HPCA'04 [16], showing relevant impact on the choice of the fetch policy to be used when fetching from a single thread on each cycle. A stream predictor capable of issuing multiple predictions per cycle was shown in ISHP'05 [17]. And a mechanism to store decoded instructions (like the Pentium4 trace cache) in the standard instruction cache was presented in PACT'06 [18] (archive version published in IEEE ToC'09 [19]).…”
Section: Related and Follow-up Work (mentioning)
confidence: 99%