Adam S. Hartman scite author profile

Adam S. Hartman

4 Publications

46 Citation Statements Received

98 Citation Statements Given

Affiliations

Carnegie Mellon University

Publications

Lifetime improvement through runtime wear-based task mapping

Hartman

Thomas

2012

As transistors continue to become smaller, they become exponentially more susceptible to permanent wearout faults. Without mitigation, these faults will render systems useless within unacceptably short time periods. Our work presents the design for a runtime task mapping subsystem which mitigates these faults using a wear-based heuristic. We compare our wear-based heuristic to power- and temperature-based heuristics used within the same system framework. Using a wide range of synthetic and real-world benchmarks, we show that our wear-based heuristic improves total system lifetime by an average of 7.1% over temperature-based heuristics. Additionally, we show that our wear-based heuristic can be used to drastically improve the time to the first component failure (TTFF) of a system. TTFF is a metric of interest to designers who wish to avoid the design and verification difficulties of systems that are expected to recover after a component failure. Our wear-based heuristic improves TTFF by an average of 14.6% over temperature-based heuristics across all of our benchmarks. Our observations lead us to conclude that runtime, wear-based task mapping must be incorporated into systems for which lifetime is a primary design goal.

A case for lifetime-aware task mapping in embedded chip multiprocessors

Hartman

Thomas

Meyer

2010

Temperature-aware design is emerging as a popular approach to addressing a variety of challenges, including system lifetime. In the case of task mapping, temperature-aware approaches indeed improve lifetime due to lifetime's strong dependence on temperature. However, temperature-aware design neglects several important factors that also influence lifetime: (a) physical parameters such as supply voltage and current density, as well as (b) application and architecture characteristics that affect what failures are survivable. Only lifetime-aware task mapping can expose the relationship between physical parameters, component failure, and system lifetime, and therefore find lifetime-optimal mappings. To address this need, we have developed a new lifetime-aware task mapping technique based on ant colony optimization (ACO). Our technique produces task mappings resulting in lifetimes within 17.9% of the observed optimal results on average, outperforming a lifetime-agnostic task mapping approach by an average of 32.3%. We also observed that the lifetimes resulting from task mappings within 1% of the best maximum system temperature vary by an average of 20.1% while the lifetimes resulting from task mappings within 1% of the best average system temperature vary by an average of 32.6%. Our observations lead us to conclude that one cannot depend on temperature-aware task mapping when system lifetime is a design constraint, but one may depend on lifetime-aware task mapping when one or both of lifetime and temperature are design constraints.

Cost-effective lifetime and yield optimization for NoC-based MPSoCs

Meyer

Hartman

Thomas

2014

ACM Trans. Des. Autom. Electron. Syst.

As semiconductor manufacturing processes scale to smaller and smaller feature sizes, manufacturing faults and permanent component failures are challenging how systems are traditionally designed. Historically, a combination of careful process tuning and design rule specification has been sufficient to cost-effectively ensure that deterministic design practices eventually result in acceptable system yield and lifetime. However, as transistors and wires shrink, they are simultaneously becoming more prone to complete or parametric failure at manufacturing time as well as degradation and total breakdown in the field, resulting in systems that are increasingly expensive to produce and less likely to function correctly for as long as intended. To address these growing challenges in system resilience, all systems, not only those intended for high-availability or mission-critical applications, must be designed with yield and lifetime in mind. This research is focused on the design-time system-level architectural optimization of cost, lifetime and yield in embedded network-on-chip-based multiprocessor systems-on-chip (NoC-based MPSoCs). At the system level, the precise nature and timing of a fault is irrelevant because the fault results in the (possibly temporary) loss of an entire processor, memory, or interconnect module regardless. One advantage of managing failure at the computer system level is therefore that once the location of a failure has been identified, the cause can be abstracted away. In this case, failures of different types may be treated the same and addressed using the same techniques.
Based on this observation, we employ system-level slack (excess capacity in processor and memory nodes available to accommodate additional tasks in the event that other processors or memories are lost) as a general technique for mitigating MPSoC failure in the presence of either component manufacturing defects or permanent component failures. Given an application and fixed NoC-based communication architecture, our goal is to cost-effectively perform slack allocation, distributing execution and storage slack such that, with high probability, when manufacturing defects or permanent component failures occur, sufficient resources remain for the system to continue to operate. The design space for slack allocation is large and complex. It consists of every possible slack allocation (up to n^m for a system with n components and m possible alternatives in the component library). Furthermore, evaluating the lifetime of any single design is computationally expensive, requiring performance, power, and temperature evaluation for every possible combination of component failures. In one example we considered, an MPEG-4 decoder with 21 processors, 5 memories and 10 switches, there are 1.6 billion possible slack allocations alone (given a fixed communication architecture) and each system lifetime evaluation took from 46.4 to 144.5 seconds. To address the complexity of slack allocation, we have developed Critical Quantity Slack Alloca...

Lifetime-Aware Task Mapping for Embedded Chip Multiprocessors

Hartman

2015
