IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings.
DOI: 10.1109/iccd.2004.1347928

Best of both latency and throughput

Cited by 91 publications (69 citation statements)
References 27 publications
“…On the other hand, parallel phases can be executed on numerous processors in parallel. Therefore, the lowest execution time for the parallel phases is achieved by executing them on many simple processors that consume less energy per instruction (EPI) [7]. We claim that a choice of symmetric cores is suboptimal due to the contradicting requirements of the serial and parallel phases within the same application.…”
mentioning
confidence: 99%
“…Kumar et al [9] have shown how a heterogeneous multiprocessor could achieve similar performance to a homogeneous multiprocessor for less power and area. Grochowski et al [2], [7] have proposed and demonstrated an asymmetric multiprocessor by employing voltage and frequency scaling on a symmetric multiprocessor. Menasce et al [10] have shown the analytic benefit of heterogeneous systems using queuing models.…”
mentioning
confidence: 99%
“…This transition between LQS to GQS is related to tradeoffs between the signaling network's latency, or speed of activation, and its throughput, or the total spatial range over which all the components of the system communicate [32]. Communities in the LQS regime have a reduced time to activation, but are restricted to short-range communication.…”
Section: Fig 4: (A)
mentioning
confidence: 99%
“…Past works have also used atomicity in other ways to (i) simplify the core microarchitecture, (ii) enable better scalability and/or performance, or (iii) enable optimizations or other code transformations. Heterogeneous cores: Several works have examined the use of either multiple heterogeneous cores [4,56,22,20,7,12], one core with multiple heterogeneous backends [35], or a core with variable parameters [5,22] in order to adapt to the running application at a coarse granularity for better energy efficiency. We quantitatively compared to a coarse-grained heterogeneous approach [35] in §5 and showed that although coarse-grained designs can achieve good energy-efficiency, HBA does better by exploiting much finer-grained heterogeneity.…”
Section: Related Work
mentioning
confidence: 99%
“…To exploit this diversity, past works proposed core-level heterogeneity. These heterogeneous designs either combine multiple separate cores (e.g., [29,22,3,20,53,7,12,26,4,56]), or else combine an in-order pipeline and out-of-order pipeline with a shared frontend in a single core [35]. Past works demonstrate energy-efficiency improvements with usually small impact to performance.…”
Section: Introduction
mentioning
confidence: 99%