Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs

Verma, Anshuman; Zhou, Huiyang; Booth, Skip; King, Robbie; Coole, James; Keep, Andy; Marshall, John; Feng, Wu-chun

doi:10.1145/3061639.3062230

Cited by 12 publications

(9 citation statements)

References 4 publications

“…Additionally, this virtual architecture cannot be associated with or even applied to commercially available OpenCL-for-FPGA design tools. A framework for debugging and monitoring OpenCL-based FPGA designs was proposed in [43]. This framework is limited to capturing events and their sequences based on timestamps.…”

Section: Related Work on In-System FPGA Instrumentation (mentioning)

confidence: 99%

In-FPGA Instrumentation Framework for OpenCL-Based Designs

2020

The productivity achieved when developing applications on high-performance reconfigurable heterogeneous computing (HPRHC) systems is increased by using the Open Computing Language (OpenCL). However, the hardware produced by OpenCL compilers in field-programmable gate arrays (FPGAs) can result in severe performance bottlenecks that are challenging to solve. The problem is compounded by the fact that the generated netlist details are disorganized, making them mostly unreadable and only partially visible to designers. This paper proposes an in-FPGA instrumentation method and a new framework for extracting the FPGA-cycle-accurate timing performances of OpenCL-based designs. The results clearly show that the chosen execution model for OpenCL-based designs strongly affects the timing performance when it is not properly implemented. Our framework is implemented on an HPRHC platform that contains a CPU and two Arria10 FPGAs, and it is evaluated with a wide variety of benchmarks with different complexities. After testing on the reported benchmarks, the average logic overhead for one inserted instrument is 0.2 % of the total amount of adaptive look-up tables (ALUTs) and 0.1 % of the total registers in an FPGA. This resource utilization is between 1.5 and six times lower than those reported in the best previously published works. The scalability of the framework is also evaluated by inserting up to 50 instruments. The experimental results show that the average logic utilization per instrument is 0.19 % of the ALUTs and 0.17 % of the registers in the FPGA when 50 instruments are inserted.

“…As for methodologies for debugging hardware generated from multi-threaded programs, one of the few contributions (besides the work of Goeders et al mentioned above [19]) is a work of Verma et al [47] targeting OpenCL for FPGAs. The authors describe open-source debug components, modeled both in the OpenCL language and in Verilog Hardware Description Language (HDL), that can be used for manual inspection of OpenCL kernels running on FPGA.…”

Section: Related Work (mentioning)

confidence: 99%

“…Debug support for circuits generated with HLS has received attention, but for designs synthesized from multi-threaded parallel programs the contributions are scarce. Current approaches focus on the low-level details of the infrastructure necessary for on-chip debugging [19,47]. Users need to explicitly instruct the tools about where to place tracepoints and manually inspect the traces to spot malfunctions.…”

Section: Introduction (mentioning)

confidence: 99%

Automated Bug Detection for High-level Synthesis of Multi-threaded Irregular Applications

Fezzardi, Ferrandi

2020

ACM Trans. Parallel Comput.

Field Programmable Gate Arrays (FPGAs) are becoming an appealing technology in datacenters and High Performance Computing. High-Level Synthesis (HLS) of multi-threaded parallel programs is increasingly used to extract parallelism. Despite great leaps forward in HLS and related debugging methodologies, there is a lack of contributions in automated bug identification for HLS of multi-threaded programs. This work defines a methodology to automatically detect and isolate bugs in parallel circuits generated with HLS. The technique relies on hardware/software Discrepancy Analysis and exploits a pattern-matching algorithm based on Finite State Automata to compare multiple hardware and software threads. Overhead, advantages, and limitations are evaluated on designs generated with an open-source HLS compiler supporting OpenMP.

“…Work in [5,14,20,25] describe frameworks that allow users to specify debugging points in high-level language and synthesize hardware probes into the FPGA for analysis. They can be categorized into work that has more focus on verifying functional correctness [14,20] and work that has more focus on extracting performance-related parameters [5,25]. Work in [14] describes how to record and replay the execution of optimized HLS-generated circuits.…”

Section: Related Work (mentioning)

confidence: 99%

Rapid Cycle-Accurate Simulator for High-Level Synthesis

Chi, Choi, Cong, et al.

2019

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

A large semantic gap between the high-level synthesis (HLS) design and the low-level (on-board or RTL) simulation environment often creates a barrier for those who are not FPGA experts. Moreover, such low-level simulation takes a long time to complete. Software-based HLS simulators can help bridge this gap and accelerate the simulation process; however, we found that the current FPGA HLS commercial software simulators sometimes produce incorrect results. In order to solve this correctness issue while maintaining the high speed of a software-based simulator, this paper proposes a new HLS simulation flow named FLASH. The main idea behind the proposed flow is to extract the scheduling information from the HLS tool and automatically construct an equivalent cycle-accurate simulation model while preserving C semantics. Experimental results show that FLASH runs three orders of magnitude faster than the RTL simulation.
