Designing a 3 GHz, 130 nm, Intel® Pentium® 4 processor

Deleganes, D.J.; Douglas, Jonathan; Kommandur, B.; Patyra, M.J.

doi:10.1109/vlsic.2002.1015065

Cited by 23 publications

(8 citation statements)

References 2 publications


“…In this work, we focus on two evaluation points: worst-case leakage (all fast low-Vt devices at 0.9V), resulting in 15% of total power, and more realistic leakage with a 50%/50% mix of low-Vt and high-Vt devices, resulting in 10% of total power. A real design [14] might use both types by optimizing critical path logic with fast transistors while reducing power in noncritical logic with less leaky transistors. Our main analysis assumes 10% leakage but we also summarize key results for 15% leakage in §5.1.…”

Section: Methods (mentioning)

confidence: 99%

The heterogeneous block architecture

Fallin, Wilkerson, Mutlu. 2014 IEEE 32nd International Conference on Computer Design (ICCD), 2014.


This paper makes two new observations that lead to a new heterogeneous core design. First, we observe that most serial code exhibits fine-grained heterogeneity: at the scale of tens or hundreds of instructions, regions of code fit different microarchitectures better (at the same point or at different points in time). Second, we observe that by grouping contiguous regions of instructions into blocks that are executed atomically, a core can exploit this heterogeneity: atomicity allows each block to be executed independently on its own execution backend that fits its characteristics best. Based on these observations, we propose a fine-grained heterogeneous design that combines heterogeneous execution backends into one core. Our core design, the heterogeneous block architecture (HBA), breaks the program into blocks of code, determines the best backend for each block, and specializes the block for that backend. As an initial, concrete design, we combine out-of-order, VLIW, and in-order backends, using simple heuristics to choose backends. We compare HBA to multiple baseline core designs (including monolithic out-of-order, clustered out-of-order, in-order and a state-of-the-art heterogeneous core design) and show that HBA can provide significantly better energy efficiency than all designs at similar performance. Averaged across 184 traces from a wide variety of workloads, HBA reduces core power by 36.4% and energy per instruction by 31.9% compared to a 4-wide out-of-order core. We conclude that HBA provides a flexible substrate for exploiting fine-grained heterogeneity, enabling new energy-performance tradeoff points in core design.

“…The comparison results are listed in Table 2. The Intel Pentium-4 [5] represents a standard general-purpose microprocessor. StrongArm SA-1100 [10] can be considered as general-purpose processor for mobile devices.…”

Section: Methods (mentioning)

confidence: 99%

“…The main reason for this is high cycle count, which requires high clock frequency to achieve required throughput. The Intel Pentium-4 [5] represents a standard general-purpose microprocessor. StrongArm SA-1100 [10] can be considered as general-purpose processor for mobile devices as it employs custom circuits, clock gating, and reduced supply voltage.…”

Section: Related Work (mentioning)

confidence: 99%

Low-Power Application-Specific Processor for FFT Computations

Pitkänen, Takala. J Sign Process Syst, 2010.


“…However, in-place computations cannot be used and the processor has eight memory ports while the FFTTA uses only two. The Intel Pentium-4 [20] is a standard general-purpose microprocessor. Rest of the processors are dedicated for the FFT.…”

Section: Performance Analysis (mentioning)

confidence: 99%

Low-Power, High-Performance TTA Processor for 1024-Point Fast Fourier Transform

Pitkänen, Makinen, Heikkinen, et al. Lecture Notes in Computer Science, 2006.


Abstract. Transport Triggered Architecture (TTA) offers a cost-effective tradeoff between the size and performance of ASICs and the programmability of general-purpose processors. This paper presents a study where a high performance, low power TTA processor was customized for a 1024-point complex-valued fast Fourier transform (FFT). The proposed processor consumes only 1.55 µJ of energy for a 1024-point FFT. Compared to other reported FFT implementations with reasonable performance, the proposed design shows a significant improvement in energy-efficiency.
