The design of a high-performance fetch architecture can be challenging due to poor interconnect scaling and energy concerns. Way prediction has been presented as one means of scaling the fetch engine to shorter cycle times while providing energy-efficient instruction cache accesses. However, way prediction requires additional complexity to handle mispredictions.

In this paper, we examine a high-bandwidth fetch architecture augmented with an instruction cache way predictor. We compare the performance and energy efficiency of this architecture to both a serial access cache and a parallel access cache. Our results show that a serial fetch architecture achieves approximately the same energy reduction and performance as way prediction architectures, without the added structures and recovery complexity needed for way prediction.

The performance of any architecture is limited by the amount of instruction fetch bandwidth that can be supplied to the execution core. Instruction cache performance is a vital part of achieving high fetch bandwidth. An energy-efficient fetch design that still achieves high performance is also important because overall chip energy consumption may limit not only what can be integrated onto a chip, but also how fast the chip can be clocked [7]. Brooks et al. [1] report that instruction fetch and the branch target buffer are responsible for 22.2% and 4.7%, respectively, of the power consumed by the Intel Pentium Pro. Brooks also reports that caches comprise 16.1% of the power consumed by the Alpha 21264. Montanaro et al. [6] found that the instruction cache consumes 27% of the power in their StrongARM 110 processor.

Set-associative cache designs can improve performance over a direct-mapped cache by reducing thrashing among cache blocks that map to the same cache index (i.e., among all ways within a cache set). This extra associativity comes at the price of increased energy. During a parallel cache access, both the tag and data components of all cache ways (blocks) in a given cache set (index) must be driven. If the tag component of one of the ways matches the desired address, the corresponding data component of that way is selected for output. But regardless of which way matches, all ways in the set are driven on the bitlines of the cache to the logic that selects a single cache block to output. A serial access cache avoids this cost by reading the tags first and then driving only the matching data way (the first sketch after this section contrasts the two access modes).

Way prediction [4,13,9] has been proposed as a means to provide low-latency, energy-efficient cache access. Way prediction has been used in a number of real-world architectures, including the Alpha 21264 [10], which makes use of the Next Line and Set (NLS) predictor [3], a branch predictor with integrated way prediction. However, way prediction requires additional hardware to perform the actual way prediction, verify the correctness of a prediction, and recover in the event of a misprediction (the second sketch below illustrates this verify-and-recover flow).

In this paper, we compare the performance of using way prediction [4,13,9,10,3] to using a serial access cache...

[Figure: serial and parallel organizations of a 2-way set-associative cache, each showing the decoder, tag array, data arrays for ways 0 and 1, and the column mux and sense amps feeding the data output.]
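The excerpt describes the two access modes in prose only. As a rough illustration, here is a minimal Python sketch of a 2-way set-associative lookup; the `split` address helper, the `find_way` function, and the "data ways driven" energy proxy are assumptions of this sketch, not the authors' model:

```python
# Toy model (my own construction, not the paper's simulator): counts how many
# data ways toggle their bitlines per lookup, as a crude energy proxy.

NUM_WAYS = 2    # 2-way set-associative, matching the paper's figure
NUM_SETS = 256

class Line:
    def __init__(self):
        self.valid = False
        self.tag = None

cache = [[Line() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]

def split(addr):
    """Split an address into (tag, index); block offsets are ignored here."""
    return addr // NUM_SETS, addr % NUM_SETS

def find_way(tag, index):
    """Return the way whose tag matches, or None on a miss."""
    for w in range(NUM_WAYS):
        if cache[index][w].valid and cache[index][w].tag == tag:
            return w
    return None

def parallel_access(addr):
    """Parallel access: all tag and data ways are driven in the same cycle;
    the tag match only selects which data way reaches the output mux."""
    tag, index = split(addr)
    return find_way(tag, index), NUM_WAYS      # every data way's bitlines toggle

def serial_access(addr):
    """Serial access: tags are read first, then only the matching data way is
    driven, trading an extra stage of latency for energy."""
    tag, index = split(addr)
    way = find_way(tag, index)
    return way, (1 if way is not None else 0)

# Demo: install a line, then compare the energy proxy for both modes.
tag, index = split(0x1234)
cache[index][1].valid, cache[index][1].tag = True, tag
print(parallel_access(0x1234))   # (1, 2): hit in way 1, both data ways driven
print(serial_access(0x1234))     # (1, 1): same hit, one data way driven
```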
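The verify-and-recover cost of way prediction can be sketched the same way. Building on the toy cache above, the following hypothetical `predicted_access` drives only the predicted way and falls back to a recovery probe on a mispredict; the interface and the one-cycle recovery penalty are illustrative assumptions, not the hardware described in [4,13,9,10,3]:

```python
def predicted_access(addr, predicted_way):
    """Way prediction: drive only the predicted data way alongside its tag.
    A failed tag check triggers a recovery probe of the remaining ways,
    modeled here as one extra cycle of penalty. (Illustrative only.)"""
    tag, index = split(addr)
    line = cache[index][predicted_way]
    if line.valid and line.tag == tag:
        return predicted_way, 1, 0        # correct: one way driven, no penalty
    driven = 1                            # the mispredicted way already toggled
    for w in range(NUM_WAYS):
        if w == predicted_way:
            continue
        driven += 1
        other = cache[index][w]
        if other.valid and other.tag == tag:
            return w, driven, 1           # recovered hit after the extra probe
    return None, driven, 1                # true miss discovered during recovery

print(predicted_access(0x1234, 1))   # (1, 1, 0): correct prediction
print(predicted_access(0x1234, 0))   # (1, 2, 1): mispredict, then recovery
```

On a correct prediction this matches the serial cache's energy at the parallel cache's latency; the paper's comparison weighs that benefit against the predictor and recovery structures the mechanism requires.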
A new architecture is proposed to realize 3-D graphics rendering for embedded multimedia systems. Because simulation shows that only 20% to 83% of the triangles in the original 3-D object models are visible, our architecture is designed to eliminate the redundant operations on invisible triangles without loss of image quality. It is based on our index rendering and enhanced deferred lighting approaches, and its key feature is a dual-pipeline rendering architecture. Simulation and analysis results show that this architecture can save up to 63.4% of CPU operations compared with traditional architectures.