FLASH: Fast, Parallel, and Accurate Simulator for HLS

Choi, Y.; Chi, Yuze; Wang, Jie; Cong, Jason

doi:10.1109/tcad.2020.2970597

Cited by 19 publications

(11 citation statements)

References 26 publications

“…Performance counters are inserted to the accelerator to collect the relevant metrics. We implement SPLAG using an open-source extension to HLS C++, TAPA [10], to leverage the convenient peeking interfaces, fast software simulation [6,12], asynchronous memory interfaces, simplified host-kernel interfaces, and coarse-grained floorplanning [26,27]. Our implementation targets the Alveo U280 board with 32 high-bandwidth memory (HBM) channels.…”

Section: Discussion (mentioning)

confidence: 99%

Accelerating SSSP for Power-Law Graphs

Chi

Guo

Cong

2022

Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Self Cite

The single-source shortest path (SSSP) problem is one of the most important and well-studied graph problems widely used in many application domains, such as road navigation, neural image reconstruction, and social network analysis. Although we have known various SSSP algorithms for decades, implementing one for large-scale power-law graphs efficiently is still highly challenging today, because ① a work-efficient SSSP algorithm requires priority-order traversal of graph data, ② the priority queue needs to be scalable both in throughput and capacity, and ③ priority-order traversal requires extensive random memory accesses on graph data. In this paper, we present SPLAG to accelerate SSSP for power-law graphs on FPGAs. SPLAG uses a coarse-grained priority queue (CGPQ) to enable high-throughput priority-order graph traversal with a large frontier. To mitigate the high-volume random accesses, SPLAG employs a customized vertex cache (CVC) to reduce off-chip memory access and improve the throughput to read and update vertex data. Experimental results on various synthetic and real-world datasets show up to a 4.9× speedup over state-of-the-art SSSP accelerators, a 2.6× speedup over 32-thread CPU running at 4.4 GHz, and a 0.9× speedup over an A100 GPU that has 4.1× power budget and 3.4× HBM bandwidth. Such a high performance would place SPLAG in the 14th position of the Graph 500 benchmark for data intensive applications (the highest using a single FPGA) with only a 45 W power budget. SPLAG is written in high-level synthesis C++ and is fully parameterized, which means it can be easily ported to various different FPGAs with different configurations. SPLAG is open-source at https://github.com/UCLA-VAST/splag. CCS CONCEPTS • Theory of computation → Shortest paths; • Computer systems organization → Reconfigurable computing; High-level language architectures.

“…The 200 MHz target frequency of other FPGA designs remain unchanged. [48,89] and by offloading simulation to an FPGA [64,65,97] or a GPU [91]. Our debugging tools are designed for both on-FPGA and simulation-based debugging.…”

Section: Efficiency Of Debugging Tools (mentioning)

confidence: 99%

Debugging in the brave new world of reconfigurable hardware

Zuo

Loughlin

et al. 2022

Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Software and hardware development cycles have traditionally been quite distinct. Software allows post-deployment patches, which leads to a rapid development cycle. In contrast, hardware bugs that are found after fabrication are extremely costly to fix (and sometimes even unfixable), so the traditional hardware development cycle involves massive investment in extensive simulation and formal verification. Reconfigurable hardware, such as a Field Programmable Gate Array (FPGA), promises to propel hardware development towards an agile software-like development approach, since it enables a hardware developer to patch bugs that are detected during on-chip testing or in production. Unfortunately, FPGA programmers lack bug localization tools amenable to this rapid development cycle, since past tools mainly find bugs via simulation and verification. To develop hardware bug localization tools for a rapid development cycle, a thorough understanding of the symptoms, root causes, and fixes of hardware bugs is needed. In this paper, we first study bugs in existing FPGA designs and produce a testbed of reliably-reproducible bugs. We classify the bugs according to their intrinsic properties, symptoms, and root causes. We demonstrate that many hardware bugs are comparable to software bug counterparts, and would benefit from similar techniques for bug diagnosis and repair. Based upon our findings, we build a novel collection of hybrid static/dynamic program analysis and monitoring tools for debugging FPGA designs, showing that our tools enable a software-like development cycle by effectively reducing developers' manual efforts for bug localization. CCS CONCEPTS • Hardware → Reconfigurable logic and FPGAs; • Software and its engineering → Software testing and debugging.

“…However, they may perform poorly due to the inefficiency of inter-thread communication and context switch handled by the operating system. The FLASH simulator [8,12] proposed an alternative to the above, which relies on the HLS scheduling information to mimic the RTL FSM. While this simulation approach itself is faster than multi-thread simulators, generating simulation executable becomes slower due to the need of the HLS scheduler output for cycle-accuracy, which is not needed for correctness verification.…”

Section: Software Simulation (mentioning)

confidence: 99%

Extending High-Level Synthesis for Task-Parallel Programs

Chi

Guo

Lau

et al. 2021

2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Self Cite

C/C++/OpenCL-based high-level synthesis (HLS) becomes more and more popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of result (QoR) and short development cycle compared with the traditional register-transfer level (RTL) design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive high-level programming approach in many other application domains, where coarse-grained tasks run in parallel and communicate with each other at a fine-grained level. While current HLS tools support task-parallel programs, the productivity is greatly limited in the code development, correctness verification, and QoR tuning cycles, due to the poor programmability, restricted software simulation, and slow code generation, respectively. Such limited productivity often defeats the purpose of HLS and hinder programmers from adopting HLS for task-parallel FPGA accelerators. In this paper, we extend the HLS C++ language and present a fully automated framework with programmer-friendly interfaces, universal software simulation, and fast code generation to overcome these limitations. Experimental results based on a wide range of real-world task-parallel programs show that, on average, the lines of kernel and host code are reduced by 22% and 51%, respectively, which considerably improves the programmability. The correctness verification and the iterative QoR tuning cycles are both greatly accelerated by 3.2× and 6.8×, respectively.
