A Power-Aware Approach for Online Test Scheduling in Many-Core Architectures

Haghbayan, Mohammad-Hashem; Rahmani, Amir-Mohammad; Miele, Antonio; Fattah, Mohammad; Plosila, Juha; Liljeberg, Pasi; Tenhunen, Hannu

doi:10.1109/tc.2015.2481411

Cited by 13 publications (12 citation statements)

References 40 publications

“…Skitsas et al (2018)) or selectively based on a priority rule (e.g. Skitsas et al (2016); Haghbayan et al (2016b)).…”

Section: Fault Diagnosis (mentioning, confidence: 99%)

“…To overcome this limitation, in the second class of approaches cores are selectively tested thus exploiting an ad-hoc scheduling policy to limit the impact on the system performance. Haghbayan et al (2016b) proposed an SBST routine scheduling algorithm for many-core architectures aimed at testing cores on the basis of the suffered stress, in terms of an aging metric, during the execution of the nominal workload. The scheduling policy is aimed at avoiding any degradation in system performance while not violating the power budget of the many-core system.…”

Section: Fault Diagnosis (mentioning, confidence: 99%)
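The priority-driven, power-constrained test selection described in the excerpt above can be illustrated with a small greedy sketch. This is not the algorithm of Haghbayan et al.; it is a minimal sketch assuming that only idle cores may be tested, that each core carries a precomputed aging score, and that each SBST routine has a known power cost. The names `Core`, `aging`, `test_power` and `select_cores_for_test` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    idle: bool         # True if no workload task is currently mapped to the core
    aging: float       # hypothetical stress/aging score; higher means more worn
    test_power: float  # hypothetical power (W) drawn while the SBST routine runs


def select_cores_for_test(cores, current_power, power_budget):
    """Greedily pick idle cores to test, most-aged first, within the power budget."""
    headroom = power_budget - current_power
    selected = []
    # Only idle cores are candidates, so the nominal workload is left untouched.
    candidates = sorted((c for c in cores if c.idle), key=lambda c: c.aging, reverse=True)
    for core in candidates:
        # Skip (rather than stop at) a core whose test would exceed the remaining headroom.
        if core.test_power <= headroom:
            selected.append(core.core_id)
            headroom -= core.test_power
    return selected


cores = [
    Core(0, idle=True, aging=0.9, test_power=1.5),
    Core(1, idle=False, aging=0.95, test_power=1.5),  # busy, so never selected
    Core(2, idle=True, aging=0.4, test_power=1.0),
    Core(3, idle=True, aging=0.7, test_power=2.0),
]
print(select_cores_for_test(cores, current_power=45.0, power_budget=48.0))  # [0, 2]
```

With 3.0 W of headroom, core 0 (most aged) is chosen first; core 3 no longer fits the remaining 1.5 W and is skipped, while core 2 still fits. Stopping at the first core that does not fit would instead enforce strict priority order at the cost of leaving budget unused.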

On-Chip Dynamic Resource Management

Miele, Kanduri, Moazzemi et al. 2019

Foundations and Trends in Electronic Design Automation

Self-citation

The need for dynamic resource management has shadowed the exponential growth of on-chip transistor capacity, and the challenge is accentuated by the heterogeneity of resources and the bewildering variety of constraints and requirements of applications, platforms and users. The field started with a few research papers in the early 1990s but has today grown to over a hundred publications per year, leading to an accumulated body of literature presumably far above 1000 papers. We focus on the dynamic (run-time) management of on-chip resources and mostly ignore design-time techniques and off-chip resources of larger electronic systems. Moreover, we do not attempt a complete review of all published work on the topic. Rather, this survey provides a structured review and discussion of the state of the art and is divided along the primary objectives of resource management techniques: performance, power, reliability and quality of service, each

“…both transient and permanent faults, while other strategies (e.g. [88], [89]) schedule at runtime software-based self testing routines to identify permanent damages.…”

Section: Reliability (mentioning, confidence: 99%)

Trends in On-chip Dynamic Resource Management

Moazzemi, Kanduri, Juhász et al. 2018

2018 21st Euromicro Conference on Digital System Design (DSD)

Self-citation

The complexity of emerging multi-/many-core architectures and the diversity of modern workloads demand coordinated dynamic resource management methods. We introduce a classification for these methods capturing the utilized resources and metrics. In this work, we use this classification to survey the key efforts in dynamic resource management. We first cover heuristic and optimization methods used to manage resources such as power, energy, temperature, Quality-of-Service (QoS) and reliability of the system. We then identify some of the machine-learning-based methods used to tune architectural parameters in computer systems. In many cases, resource managers need to enforce design constraints at runtime with a certain level of guarantee. Hence, we also study the trend of deploying formal control-theoretic approaches to achieve efficient and robust dynamic resource management.

“…This method implies that the entire system will be offline during the duration of the test process, thereby interrupting the execution of other applications. Another approach is to initiate testing on individual cores that have been observed to be idle for some time [25] [26] [27]. Thus, the testing process is minimally intrusive, but the time required to complete the testing of all cores is substantially longer (since each core is individually tested at different points in time).…”

Section: Introduction (mentioning, confidence: 99%)
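The trade-off noted in this excerpt (low intrusiveness versus a long time to cover every core) can be made concrete with a toy simulation. The sketch assumes that each core's idle periods are known as (start, end) intervals, that a test is launched once a core has been idle for a fixed threshold, and that the test must fit entirely inside the same idle window. The functions `first_test_completion` and `system_coverage_time` and their parameters are hypothetical and do not come from the cited works [25] [26] [27].

```python
from typing import Optional


def first_test_completion(idle_intervals, idle_threshold, test_duration) -> Optional[float]:
    """Return when a core's first SBST run finishes, or None if no idle window fits it.

    A test is launched only after the core has been idle for `idle_threshold`
    and must finish inside the same idle window (no preemption is assumed).
    """
    for start, end in sorted(idle_intervals):
        launch = start + idle_threshold
        if launch + test_duration <= end:
            return launch + test_duration
    return None


def system_coverage_time(per_core_idle, idle_threshold, test_duration):
    """Time at which every core has been tested once (None if some core never is)."""
    finishes = [first_test_completion(iv, idle_threshold, test_duration) for iv in per_core_idle]
    if any(f is None for f in finishes):
        return None
    return max(finishes)


per_core_idle = [
    [(0, 3), (10, 20)],  # first window too short, tested in the second one
    [(2, 9)],
    [(5, 6), (30, 40)],  # only a late window is long enough
]
print(system_coverage_time(per_core_idle, idle_threshold=1, test_duration=4))  # 35
```

Each individual test is short, but full coverage is dictated by the core that is last to offer a sufficiently long idle window, which is why this style of testing stretches the overall system test latency.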

Exploring System Availability During Software-Based Self-Testing of Multi-core CPUs

2018

As technology scales, the increased vulnerability of modern systems due to unreliable components becomes a major problem in the era of multi-/many-core architectures. Recently, several on-line testing techniques have been proposed, aiming towards error detection of wear-out/aging-related defects that can appear during the lifetime of a system. In this work, firstly we investigate the relation between system test latency and test-time overhead in multi-/many-core systems with shared Last-Level Cache (LLC) for periodic Software-Based Self-Testing (SBST), under different test scheduling policies. Secondly, we propose a new methodology aiming to reduce the extra overhead related to testing that is incurred as the system scales up (i.e., the number of on-chip cores increases). The investigated scheduling policies primarily vary the number of cores concurrently under test in the overall system test session. Our extensive, workload-driven dynamic exploration reveals that there is an inverse relationship between the two test measures; as the number of cores concurrently under test increases, system test latency decreases, but at the cost of significantly increased test time, which sacrifices system availability for the actual workloads. Under given system test latency constraints, which dictate the recovery time in the event
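The inverse relationship reported in this abstract can be reproduced qualitatively with a toy model. The sketch assumes that cores are tested in equal batches of `concurrent` cores and that shared-LLC contention stretches each SBST run linearly with the number of co-running tests; the linear contention factor and the names `session_metrics`, `concurrent` and `contention` are hypothetical and are not the paper's methodology or data.

```python
import math


def session_metrics(num_cores, concurrent, base_test_time, contention):
    """Toy model of one system test session with `concurrent` cores tested at a time.

    Hypothetical assumption: each SBST run slows down linearly with the number
    of co-running tests because of shared-LLC contention.
    Returns (system_test_latency, total_test_time).
    """
    per_test = base_test_time * (1 + contention * (concurrent - 1))
    batches = math.ceil(num_cores / concurrent)
    latency = batches * per_test            # wall-clock time until every core has been tested
    total_test_time = num_cores * per_test  # core-time taken away from the workload
    return latency, total_test_time


for k in (1, 4, 16):
    latency, total = session_metrics(num_cores=16, concurrent=k, base_test_time=1.0, contention=0.1)
    print(f"concurrent={k:2d}  latency={latency:.1f}  total_test_time={total:.1f}")
```

With `contention=0.1`, latency falls from 16 to about 5.2 and then 2.5 base-test units as `concurrent` goes from 1 to 4 to 16, while total test time rises from 16 to about 20.8 and then 40. With a contention factor of 1 or more, latency stops decreasing, so this model only captures the trend over a limited range.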
