As the semiconductor industry keeps evolving at the pace predicted by Moore's Law, computer system architects are facing increasing challenges from the three major design constraints: performance, power, and reliability. Thermal constraints from a reasonable cooling cost do not scale well as technology evolves. The dwindling scaling on threshold voltage leads to a slower pace of supply voltage scaling. These two effects lead to an increasing power density in current and future technology generations. Reliability has emerged as a primary design constraint due to the smaller feature size and generally lower supply voltage for electronic devices. Transient errors caused by high-energy particle strike and voltage noises are expected to increase significantly in frequency.Performance improvement becomes more challenging for future architectures with limitations set by the power and the resilience constraints.Integration of accelerators to create heterogeneous processors is becoming more common for both power and performance reasons. However, this adds one more dimension to the design space that is already complex due to technology variants, system organizations, application's variability, and so on. Therefore, high-level models are essential for system designers to explore the design space and make decisions in a timely manner. Additionally, the three design constraints compete with each other. For example, resilience-aware techniques, such as DMR and TMR, are expensive in terms of power and performance, low-power designs usually come with a price of lower speed.Consequently, it requires system designers to make trade-offs by considering all the three design constraints at the same time.To address these challenges, we (1) propose an analytical modeling framework called Lumos that is capable of modeling power and performance for heterogeneous architectures with hardware iv v accelerators. Then we (2) use Lumos to explore the design space composed of CPU cores and accelerators, revealing important scaling trend for future heterogeneous architectures. We further (3) propose a rapid modeling framework to characterize resilience across a range of applications in DSP, and image processing domains; And finally we (4) propose an integrated framework to optimize energy-efficiency by trading off design constraints of power, performance and resilience.
Acknowledgments