Shane T. Fleming scite author profile

Modern computer systems are formed from many interacting systems and heterogeneous components, that face increasing constraints on performance, power consumption, and temperature. Such systems have complex run-time dynamics which cannot easily be predicted or modelled at design-time, creating a need for online dynamic systems management. The Heartbeats API is a popular open source project which provides a standardised way for applications to monitor and publish their progress in multi-core CPU systems, but it does not allow hardware components to be monitored or to observe the progress of other components of the system. This paper presents work which extends the capacities of the Heartbeats API across the whole system while maintaining backwards compatibility with the legacy software API. To demonstrate the framework's capabilities an Autonomous Underwater Vehicle (AUV) case study is explored, where a power-aware HW/SW image processing application is implemented on a reconfigurable SoC and an approximate energy saving of 30% is observed for an example input video. Current progress is also discussed on some applications which build upon the framework, including an CubeSat experiment for an Adaptive Heterogeneous FDIR system that will launch in 2016 by the European Space Agency.

show abstract

Injecting FPGA Configuration Faults in Parallel

Fleming

Thomas

2018

View full text Add to dashboard Cite

When using SRAM-based FPGA devices in safetycritical applications testing against bitflips in the device configuration memory is essential. Often such tests are achieved by corrupting configuration memory bits of a running device, but this has many scalability, reliability, and flexibility challenges. In this paper, we present a framework and a concrete implementation of a parallel fault injection cluster that addresses these challenges. Scalability is addressed by using multiple identical FPGA devices, each testing a different region in parallel. Reliability is addressed by using reconfigurable system-on-chip devices, that are isolated from each other. Flexibility is addressed by using a pendingcommit structure, that continually checkpoints the overall experiment and allows elastic scaling. We test and showcase our approach by exhaustively flipping every bit in the configuration memory of the CHStone benchmark suite and a VivadoHLS generated k-means clustering image processing application. Our results show that: linear scaling is possible as the number of devices increases; the majority of error inducing bitflips in the k-means application do not significantly impact the output; and that the Xilinx Essential bits tool may miss some bits that can induce errors.

show abstract

Hardware Synthesis of Weakly Consistent C Concurrency

Ramanathan

Fleming

Wickerson

et al. 2017

View full text Add to dashboard Cite

Lock-free algorithms, in which threads synchronise not via coarse-grained mutual exclusion but via fine-grained atomic operations ('atomics'), have been shown empirically to be the fastest class of multi-threaded algorithms in the realm of conventional processors. This paper explores how these algorithms can be compiled from C to reconfigurable hardware via high-level synthesis (HLS).We focus on the scheduling problem, in which software instructions are assigned to hardware clock cycles. We first show that typical HLS scheduling constraints are insufficient to implement atomics, because they permit some instruction reorderings that, though sound in a single-threaded context, demonstrably cause erroneous results when synthesising multi-threaded programs. We then show that correct behaviour can be restored by imposing additional intra-thread constraints among the memory operations. We implement our approach in the open-source LegUp HLS framework, and provide both sequentially consistent (SC) and weakly consistent ('weak') atomics. Weak atomics necessitate fewer constraints than SC atomics, but suffice for many concurrent algorithms. We confirm, via automatic model-checking, that we correctly implement the semantics defined by the 2011 revision of the C standard. A case study on a circular buffer suggests that circuits synthesised from programs that use atomics can be 2.5x faster than those that use locks, and that weak atomics can yield a further 1.5x speedup. Keywordsatomic operations, C/C++, FPGAs, high-level synthesis, lock-free algorithms, memory consistency models, scheduling.

show abstract

Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code

Cheng

Fleming

Chen

et al. 2022

IEEE Trans. Comput.

View full text Add to dashboard Cite

High-level synthesis (HLS) is an increasingly popular method for generating hardware from a description written in a software language like C/C++. Traditionally, HLS tools have operated on sequential code, however in recent years there has been a drive to synthesise multi-threaded code. In this context, a major challenge facing HLS tools is how to automatically partition memory among parallel threads to fully exploit the bandwidth available on an FPGA device and minimise memory contention. Existing partitioning approaches require inefficient arbitration circuitry to serialise accesses to each bank because they make conservative assumptions about which threads might access which memory banks. In this article, we design a static analysis that can prove certain memory banks are only accessed by certain threads, and use this analysis to simplify or even remove the arbiters while preserving correctness. We show how this analysis can be implemented using the Microsoft Boogie verifier on top of satisfiability modulo theories (SMT) solver, and propose a tool named EASY using automatic formal verification. Our work supports arbitrary input code with any irregular memory access patterns and indirect array addressing forms. We implement our approach in LLVM and integrate it into the LegUp HLS tool. For a set of typical application benchmarks our results have shown that EASY can achieve 0.13× (avg. 0.43×) of area and 1.64× (avg. 1.28×) of performance compared to the baseline, with little additional compilation time relative to the long time in hardware synthesis.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Shane T. Fleming

Is high level synthesis ready for business? A computational finance case study

Heterogeneous Heartbeats: A framework for dynamic management of Autonomous SoCs

Injecting FPGA Configuration Faults in Parallel

Hardware Synthesis of Weakly Consistent C Concurrency

Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code

Contact Info

Product

Resources

About