2015
DOI: 10.1109/tc.2014.2378291

An Energy-Efficient Last-Level Cache Architecture for Process Variation-Tolerant 3D Microprocessors

Abstract: As process technologies evolve, tackling process variation problems is becoming more challenging in 3D (i.e., die-stacked) microprocessors. Process variation adversely affects the performance, power, and reliability of 3D microprocessors, which in turn results in yield losses. In particular, last-level caches (LLCs: L2 or L3 caches) are known to be the component most vulnerable to process variation in 3D microprocessors. In this paper, we propose a novel cache architecture that exploits narrow-width values for yie…

Citations: cited by 8 publications (5 citation statements)
References: 40 publications
“…WBD disables the faulty cache lines (blocks). Please note that this technique is widely used, simple, practical, and also introduced in [8] (similar to Intel Pellston Technology), [9], and [10] (introduced as a naïve way reduction scheme). Though this scheme could always lead to 100% cache yield by employing the adaptive cache bypassing (i.e., bypasses cache if there is no non-faulty blocks in a cache set), the effective cache capacity would be significantly reduced, eventually causing performance losses.…”
Section: Performance (mentioning)
confidence: 99%
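
The block-disabling scheme with adaptive bypassing described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the class names (`CacheSet`, `WBDCache`), the LRU replacement policy, and the per-set fault map are all assumptions made for illustration. It shows how disabled blocks shrink the usable ways of a set and how a set with no non-faulty block is bypassed.

```python
from dataclasses import dataclass, field


@dataclass
class CacheSet:
    faulty: list                               # faulty[w] is True if way w is disabled
    tags: list = field(default_factory=list)   # resident tags, oldest first (LRU order)

    def usable_ways(self):
        # Only non-faulty ways can hold data.
        return sum(1 for f in self.faulty if not f)


class WBDCache:
    """Hypothetical set-associative cache with faulty blocks disabled."""

    def __init__(self, fault_map):
        # fault_map[s][w] marks way w of set s as faulty (assumed input format).
        self.sets = [CacheSet(faulty=list(row)) for row in fault_map]
        self.hits = self.misses = self.bypasses = 0

    def access(self, block_addr):
        index = block_addr % len(self.sets)
        tag = block_addr // len(self.sets)
        s = self.sets[index]
        capacity = s.usable_ways()
        if capacity == 0:
            # Adaptive bypassing: no non-faulty block in this set.
            self.bypasses += 1
            return "bypass"
        if tag in s.tags:
            # Hit: move the tag to the most-recently-used position.
            s.tags.remove(tag)
            s.tags.append(tag)
            self.hits += 1
            return "hit"
        if len(s.tags) == capacity:
            s.tags.pop(0)                      # evict the least recently used tag
        s.tags.append(tag)
        self.misses += 1
        return "miss"

    def effective_capacity(self):
        # Fraction of blocks that remain usable after disabling.
        total = sum(len(s.faulty) for s in self.sets)
        usable = sum(s.usable_ways() for s in self.sets)
        return usable / total


# Two sets, two ways each: set 0 has one faulty way, set 1 is fully faulty.
cache = WBDCache([[False, True], [True, True]])
results = [cache.access(a) for a in (0, 0, 2, 0, 1)]
```

In this toy configuration every access still completes (the fully faulty set is simply bypassed), but only a quarter of the blocks remain usable, which is the capacity loss the statement refers to.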
“…For energy comparison, we further classify the WBD technique into the cases where the Gated-Vdd [23] is applied (WBD w/ Gated-Vdd) and not applied (WBD w/o Gated-Vdd). In the case of WBD w/ GVdd, the disabled cache block is powered off to reduce leakage power consumption (as in [9] and [10]) while the WBD w/o Gated-Vdd (WBD w/o GVdd) does not apply the Gated-Vdd. Compared to the ideal case (i.e., baseline), VL_base and VL_mig show energy overheads of only 7.5% and 7.3%, respectively.…”
Section: Performance (mentioning)
confidence: 99%
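
The distinction between WBD with and without Gated-Vdd can be sketched as a simple leakage model. This is only an illustration under stated assumptions: the function name `leakage_power`, the uniform per-block leakage, and the `residual` fraction (leakage remaining in a gated block) are hypothetical and are not taken from the cited work, and the numbers in the usage lines are placeholders, not measured results.

```python
def leakage_power(total_blocks, disabled_blocks, per_block_leak_mw,
                  gated_vdd, residual=0.0):
    """Total leakage power (mW) of a cache with some blocks disabled.

    Assumes every block leaks the same amount (per_block_leak_mw).
    With Gated-Vdd, a disabled block is powered off and leaks only
    `residual` times its normal leakage; without it, a disabled block
    keeps leaking at the full rate even though it stores no data.
    """
    if not 0 <= disabled_blocks <= total_blocks:
        raise ValueError("disabled_blocks must be between 0 and total_blocks")
    active = total_blocks - disabled_blocks
    disabled_factor = residual if gated_vdd else 1.0
    return per_block_leak_mw * (active + disabled_blocks * disabled_factor)


# Placeholder values: 100 blocks, 25 disabled, 1 mW leakage per block.
without_gvdd = leakage_power(100, 25, 1.0, gated_vdd=False)
with_gvdd = leakage_power(100, 25, 1.0, gated_vdd=True)
```

Under this model the saving from Gated-Vdd scales with the number of disabled blocks, which is why the two WBD variants are reported separately in the energy comparison.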
“…Some researchers try to reduce energy consumption by applying the dynamic voltage and frequency scaling technique to manage the shared cache network [20], or proposing novel tree-based directory to bridge plenty of shared cache portions in 3D network [21]. The research in [22] proposed a novel narrow-width-value based stacked 3D cache architecture for both energy saving and yield improvement. The research in [23] tried to use thermal information in the shared cache to adaptively balance runtime status.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although many recent researches discover that stacked architectures are greatly adapted in area saving, network interconnection and layout optimization [8,21], however, those architectures are limited in their ability to match locality distributions among applications, and to manage highly shared data efficiently as each application contains different behaviors on runtime system latency, performance and energy debit [29]. Moreover, situations are more critical in shared last level cache, because the shared cache should serve too many threads for many data sharing, resulting in serious efficiency and coherence problems [15,17,26]. For better management of the shared cache, partitioned cache methods [1,7,22] are proposed, which can allocate cache parts into several groups corresponding to each thread.…”
Section: Introduction (mentioning)
confidence: 99%