Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems 2009
DOI: 10.1145/1555349.1555369

Evaluating the impact of job scheduling and power management on processor lifetime for chip multiprocessors

Abstract: Temperature-induced reliability issues are among the major challenges for multicore architectures. Thermal hot spots and thermal cycles combine to degrade reliability. This research presents new reliability-aware job scheduling and power management approaches for chip multiprocessors. Accurate evaluation of these policies requires a novel simulation framework that can capture architecture-level effects over tens of seconds or longer, while also capturing thermal interactions among cores resulting from dynamic …

Cited by 89 publications (64 citation statements)
References 29 publications
“…We use several most common temperature induced intrinsic hard failure mechanisms, including electromigration, time dependent dielectric breakdown, thermal cycling and disk failure [48], [18], [45], to analyze the reliability degradation caused by a thermal attack. We use λ to represent the failure rate of each failure mechanism.…”
Section: E Damage Analysis (mentioning)
confidence: 99%
“…They reported that temperature is strongly correlated with disk failures. Coskun et al [18] studied the reliability of different job scheduling and power management methods. They proposed a fine grained technique to simulate the thermal behaviors and input the thermal trace to the reliability model.…”
Section: Related Work (mentioning)
confidence: 99%
“…There are a few recent approaches aimed at resource management for either power consumption and/or lifetime optimization in symmetric [1,5,11,4,14,6] or asymmetric [10,21,18] architectures. The authors in [4] propose an off-line technique to improve the lifetime reliability of MPSoCs, while [11] describes a combination of design-time and runtime techniques to optimize it.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%
“…It is commonly agreed upon that online adaptation is essential when dealing with aging, temperature, or energy related optimization, due to the lack of significant information that could drive a design-time solution space exploration. Thus, when moving to on-line optimization, [1] is one of the first approaches considering both lifetime reliability and energy consumption in MPSoCs; however, the two dimensions are optimized separately. Energy and reliability optimization is considered in [5] as well: the proposed hybrid approach does not consider computational energy optimization and DVFS-enabled architectures.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%
“…However, the DVFS technique reduces the temperature by sacrificing the computing system performance. Based on the DVFS technique, a lot of researches have been published [31,58,60]. They either tried to analyze the trade off between the performance and power and thermal dissipation, or focused on optimizing the thermal issue by reducing processor peak temperature.…”
Section: Dynamic Voltage and Frequency Scaling (mentioning)
confidence: 99%