As technology scales down at an exponential rate, leakage power is fast becoming the dominant component of the total power budget. A large share of the total leakage power is dissipated in the cache hierarchy. To reduce cache leakage, individual cache lines can be kept in drowsy mode, a low-voltage, low-leakage state. Every cache access may then result in dynamic power consumption and performance penalties. A trade-off must be reached between the amount of leakage power saved on the one hand and the impact on dynamic power and performance on the other. To affect this trade-off, we introduce "slumberous caches", in which the power level of cache lines is controlled with the cache replacement policy. In a slumberous cache, cache lines are maintained at different power-save modes, which we call "tranquility levels" and which depend on their order of replacement priority. We evaluate the trade-offs in the context of PLRU, a common cache replacement algorithm. We explore various assignments of the tranquility levels to lines and compare overall power and performance impacts. As technology scales down, the dynamic power required to energize slumberous cache lines drops drastically while the leakage power savings remain roughly steady. The performance penalty (in cycles) remains constant with technology scaling for each scheme we evaluate.
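The link between replacement priority and tranquility level can be illustrated with a minimal sketch. It assumes a 4-way set managed by tree-PLRU and derives an eviction rank per way from the tree bits; the class name, the rank encoding, and the rank-to-level table `TRANQUILITY_BY_RANK` are all hypothetical choices made for illustration, since the abstract does not specify which assignments the authors evaluate.

```python
class TreePLRUSet:
    """One 4-way cache set with tree-PLRU state (illustrative only)."""

    def __init__(self):
        # bits[0] = root, bits[1] = node over ways 0-1, bits[2] = node over ways 2-3.
        # Assumed convention: 0 means the next victim is on the left, 1 on the right.
        self.bits = [0, 0, 0]

    def touch(self, way):
        """Record an access: point every node on the path away from `way`."""
        upper, lower = way >> 1, way & 1
        self.bits[0] = 1 - upper
        self.bits[1 + upper] = 1 - lower

    def victim(self):
        """Follow the tree bits to the way that would be replaced next."""
        upper = self.bits[0]
        lower = self.bits[1 + upper]
        return 2 * upper + lower

    def eviction_rank(self, way):
        """Rank 0 = most recently protected, rank 3 = next victim."""
        upper, lower = way >> 1, way & 1
        root_points_here = int(self.bits[0] == upper)
        leaf_points_here = int(self.bits[1 + upper] == lower)
        return 2 * root_points_here + leaf_points_here


# Hypothetical assignment: deeper tranquility (larger number) for lines
# closer to eviction. Other assignments are equally possible.
TRANQUILITY_BY_RANK = (0, 1, 2, 3)


def tranquility_levels(cache_set):
    """Return the assumed tranquility level of each way in the set."""
    return [TRANQUILITY_BY_RANK[cache_set.eviction_rank(w)] for w in range(4)]


s = TreePLRUSet()
s.touch(0)
s.touch(2)
levels = tranquility_levels(s)
```

In this sketch the way that tree-PLRU would evict next always receives the deepest level, while the most recently touched way stays fully awake; changing `TRANQUILITY_BY_RANK` is one way to explore a different balance between leakage savings and wake-up penalties.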
In this paper we present deterministic clock gating schemes for various microarchitectural blocks of a modern out-of-order superscalar processor. We propose to make use of 1) idle stages of the pipelined functional units (FUs) and 2) wrong-path instruction execution during branch misprediction, in order to clock gate various stages of FUs. The baseline Pipelined Functional unit Clock Gating (PFCG), presented for evaluation purposes only, disables the clock on idle stages and thus results in 13.93% chip-wide energy savings. Wrong-path instruction Clock Gating (WPCG) detects wrong-path instructions in the event of a branch misprediction and prevents them from being issued to the FUs; it subsequently disables the clock of these FUs and also reduces the stress on the register file and cache. Simulations demonstrate that more than 92% of all wrong-path instructions can be detected and stopped from being executed. The WPCG architecture results in 16.26% chip-wide energy savings, which is 2.33 percentage points more than that of the baseline PFCG scheme.