The design of a high-performance fetch architecture can be challenging due to poor interconnect scaling and energy concerns. Way prediction has been presented as one means of scaling the fetch engine to shorter cycle times while providing energy-efficient instruction cache accesses. However, way prediction requires additional complexity to handle mispredictions.

In this paper, we examine a high-bandwidth fetch architecture augmented with an instruction cache way predictor. We compare the performance and energy efficiency of this architecture to both a serial access cache and a parallel access cache. Our results show that a serial fetch architecture achieves approximately the same energy reduction and performance as way prediction architectures, without the added structures and recovery complexity needed for way prediction.

The performance of any architecture is limited by the amount of instruction fetch bandwidth that can be supplied to the execution core. Instruction cache performance is a vital part of achieving high fetch bandwidth. An energy-efficient fetch design that still achieves high performance is also important because overall chip energy consumption may limit not only what can be integrated onto a chip, but also how fast the chip can be clocked [7]. Brooks et al. [1] report that instruction fetch and the branch target buffer are responsible for 22.2% and 4.7%, respectively, of the power consumed by the Intel Pentium Pro. Brooks also reports that caches comprise 16.1% of the power consumed by the Alpha 21264. Montanaro et al. [6] found that the instruction cache consumes 27% of the power in their StrongARM 110 processor.

Set-associative cache designs can improve performance over a direct-mapped cache by reducing thrashing among cache blocks that map to the same cache index (i.e., among all ways within a cache set). This extra associativity comes at the price of increased energy. During a parallel cache access, both the tag and data components of all cache ways (blocks) in a given cache set (index) must be driven. If the tag component of one of the ways matches the desired address, the corresponding data component of that way is selected for output. But regardless of which way matches, all ways in the set are driven on the bitlines of the cache to the logic that selects a single cache block to output. A serial access cache avoids this cost by reading the tags first and then driving only the matching data way (the first sketch after this section contrasts the two access modes).

Way prediction [4,13,9] has been proposed as a means to provide low-latency, energy-efficient cache access. Way prediction has been used in a number of real-world architectures, including the Alpha 21264 [10], which makes use of the Next Line and Set (NLS) predictor [3], a branch predictor with integrated way prediction. However, way prediction requires additional hardware to perform the actual way prediction, verify the correctness of a prediction, and recover in the event of a misprediction (the second sketch below illustrates this verify-and-recover flow).

In this paper, we compare the performance of using way prediction [4,13,9,10,3] to using a serial access cache...

[Figure: serial and parallel organizations of a 2-way set-associative cache, each showing the decoder, tag array, data arrays for ways 0 and 1, and the column mux and sense amps feeding the data output.]
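The excerpt describes the two access modes in prose only. As a rough illustration, here is a minimal Python sketch of a 2-way set-associative lookup; the `split` address helper, the `find_way` function, and the "data ways driven" energy proxy are assumptions of this sketch, not the authors' model:

```python
# Toy model (my own construction, not the paper's simulator): counts how many
# data ways toggle their bitlines per lookup, as a crude energy proxy.

NUM_WAYS = 2    # 2-way set-associative, matching the paper's figure
NUM_SETS = 256

class Line:
    def __init__(self):
        self.valid = False
        self.tag = None

cache = [[Line() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]

def split(addr):
    """Split an address into (tag, index); block offsets are ignored here."""
    return addr // NUM_SETS, addr % NUM_SETS

def find_way(tag, index):
    """Return the way whose tag matches, or None on a miss."""
    for w in range(NUM_WAYS):
        if cache[index][w].valid and cache[index][w].tag == tag:
            return w
    return None

def parallel_access(addr):
    """Parallel access: all tag and data ways are driven in the same cycle;
    the tag match only selects which data way reaches the output mux."""
    tag, index = split(addr)
    return find_way(tag, index), NUM_WAYS      # every data way's bitlines toggle

def serial_access(addr):
    """Serial access: tags are read first, then only the matching data way is
    driven, trading an extra stage of latency for energy."""
    tag, index = split(addr)
    way = find_way(tag, index)
    return way, (1 if way is not None else 0)

# Demo: install a line, then compare the energy proxy for both modes.
tag, index = split(0x1234)
cache[index][1].valid, cache[index][1].tag = True, tag
print(parallel_access(0x1234))   # (1, 2): hit in way 1, both data ways driven
print(serial_access(0x1234))     # (1, 1): same hit, one data way driven
```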
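The verify-and-recover cost of way prediction can be sketched the same way. Building on the toy cache above, the following hypothetical `predicted_access` drives only the predicted way and falls back to a recovery probe on a mispredict; the interface and the one-cycle recovery penalty are illustrative assumptions, not the hardware described in [4,13,9,10,3]:

```python
def predicted_access(addr, predicted_way):
    """Way prediction: drive only the predicted data way alongside its tag.
    A failed tag check triggers a recovery probe of the remaining ways,
    modeled here as one extra cycle of penalty. (Illustrative only.)"""
    tag, index = split(addr)
    line = cache[index][predicted_way]
    if line.valid and line.tag == tag:
        return predicted_way, 1, 0        # correct: one way driven, no penalty
    driven = 1                            # the mispredicted way already toggled
    for w in range(NUM_WAYS):
        if w == predicted_way:
            continue
        driven += 1
        other = cache[index][w]
        if other.valid and other.tag == tag:
            return w, driven, 1           # recovered hit after the extra probe
    return None, driven, 1                # true miss discovered during recovery

print(predicted_access(0x1234, 1))   # (1, 1, 0): correct prediction
print(predicted_access(0x1234, 0))   # (1, 2, 1): mispredict, then recovery
```

On a correct prediction this matches the serial cache's energy at the parallel cache's latency; the paper's comparison weighs that benefit against the predictor and recovery structures the mechanism requires.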
A new architecture is proposed to realize 3-D graphics rendering for embedded multimedia systems. Because simulation shows that only 20% to 83% of the triangles in the original 3-D object models are visible, our architecture is designed to eliminate the redundant operations on invisible triangles without loss of image quality. It is based on our index rendering and enhanced deferred lighting approaches, and its key feature is a dual-pipeline rendering architecture. Simulation and analysis results show that this architecture can save up to 63.4% of CPU operations compared with traditional architectures.