Reliability and process variability are serious issues for FPGAs in the future. Fortunately FPGAs have the ability to reconfigure in the field and at runtime, thus providing opportunities to overcome some of these issues. This paper provides the first comprehensive survey of fault detection methods and fault tolerance schemes specifically for FPGAs, with the goal of laying a strong foundation for future research in this field. All methods and schemes are qualitatively compared and some particularly promising approaches highlighted.
Abstract-In a modern FPGA system-on-chip design, it is often insufficient to simply assess the total power consumption of the entire circuit by design-time estimation or runtime power rail measurement. Instead, to make better runtime decisions, it is desirable to understand the power consumed by each individual module in the system. In this work, we combine boardlevel power measurements with register-level activity counting to build an online model that produces a breakdown of power consumption within the design. Online model refinement avoids the need for a time-consuming characterisation stage and also allows the model to track long-term changes to operating conditions. Our flow is named KAPow, a (loose) acronym for 'K'ounting Activity for Power estimation, which we show to be accurate, with per-module power estimates as close to ±5mW of true measurements, and to have low overheads. We also demonstrate an application example in which a permodule power breakdown can be used to determine an efficient mapping of tasks to modules and reduce system-wide power consumption by over 8%.
The operation of FPGA systems, like most VLSI technology, is traditionally governed by static timing analysis, whereby safety margins for operating and manufacturing uncertainty are factored in at design-time. If we operate FPGA designs beyond these conservative margins we can obtain substantial energy and performance improvements. However, doing this carelessly would cause unacceptable impacts to reliability, lifespan and yield -issues which are growing more severe with continuing process scaling.Fortunately, the flexibility of FPGA architecture allows us to monitor and control reliability problems with a variety of runtime instrumentation and adaptation techniques. In this paper we develop a system for detecting timing faults in arbitrary FPGA circuits based on Razor-like shadow register insertion. Through a combination of calibration, timing constraint and adaptation of the CAD flow, we deliver low-overhead, trustworthy fault detection for FPGA-based circuits.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.