Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
DOI: 10.1145/2847263.2847343

A Case for Work-stealing on FPGAs with OpenCL Atomics

Abstract: We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman-Tsigas implementation for GPUs, we synchronize work-items not with locks, mutexes or critical sections, but instead with the atomic operations provided by Altera's OpenCL SDK. We evaluate work-stealing for FPGAs by synthesizing a K-means clustering algorithm on an Altera P385 D5 board, both with work-stealing and with a statically-partitioned load. When block RAM utilization is maximized in b…
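The abstract says work-items are synchronized with atomics rather than locks, following Cederman and Tsigas. As a rough host-side illustration of that idea, and not the authors' OpenCL kernel, the C11 sketch below implements a bounded Arora-Blumofe-Plaxton (ABP) style work-stealing deque, which we assume is representative of the lock-free deque behind the Cederman-Tsigas GPU scheme. All names (ws_deque, ws_push_bottom, ws_pop_bottom, ws_steal_top), the fixed capacity, and the packing of an ABA tag and the top index into one 64-bit word are our own assumptions. The owner works at the bottom and needs a compare-and-swap only when racing for the last task; thieves take the oldest task from the top with a single compare-and-swap, the role an OpenCL atomic such as atomic_cmpxchg would play on a device (where, with only 32-bit atomics, tag and index would have to share 32 bits).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEQUE_CAPACITY 1024 /* assumed fixed size; the ABP array never wraps */
#define DEQUE_EMPTY (-1)    /* "no task"; task ids are assumed non-negative */

typedef struct {
    _Atomic int tasks[DEQUE_CAPACITY]; /* atomic so thieves may read slots concurrently */
    _Atomic uint32_t bot;              /* next free slot; written only by the owner */
    _Atomic uint64_t age;              /* high 32 bits: ABA tag, low 32 bits: top */
} ws_deque;

static uint64_t pack_age(uint32_t tag, uint32_t top) {
    return ((uint64_t)tag << 32) | top;
}
static uint32_t age_top(uint64_t age) { return (uint32_t)age; }
static uint32_t age_tag(uint64_t age) { return (uint32_t)(age >> 32); }

void ws_init(ws_deque *d) {
    for (int i = 0; i < DEQUE_CAPACITY; i++)
        atomic_init(&d->tasks[i], 0);
    atomic_init(&d->bot, 0);
    atomic_init(&d->age, pack_age(0, 0));
}

/* Owner only: append a task at the bottom. Returns false if the array is full. */
bool ws_push_bottom(ws_deque *d, int task) {
    uint32_t b = atomic_load(&d->bot);
    if (b >= DEQUE_CAPACITY)
        return false;
    atomic_store(&d->tasks[b], task);
    atomic_store(&d->bot, b + 1); /* publish the new task to thieves */
    return true;
}

/* Thief: try once to take the oldest task. DEQUE_EMPTY means the deque was
 * empty or another thread won the race for the top slot. */
int ws_steal_top(ws_deque *d) {
    uint64_t old_age = atomic_load(&d->age);
    uint32_t b = atomic_load(&d->bot);
    if (b <= age_top(old_age))
        return DEQUE_EMPTY;
    int task = atomic_load(&d->tasks[age_top(old_age)]);
    uint64_t new_age = pack_age(age_tag(old_age), age_top(old_age) + 1);
    if (atomic_compare_exchange_strong(&d->age, &old_age, new_age))
        return task;
    return DEQUE_EMPTY;
}

/* Owner only: take the newest task from the bottom. */
int ws_pop_bottom(ws_deque *d) {
    uint32_t b = atomic_load(&d->bot);
    if (b == 0)
        return DEQUE_EMPTY;
    b--;
    atomic_store(&d->bot, b); /* must be visible before age is read below */
    int task = atomic_load(&d->tasks[b]);
    uint64_t old_age = atomic_load(&d->age);
    if (b > age_top(old_age))
        return task; /* more than one task was left: no thief can reach slot b */
    /* Slot b held the last task (or a thief already took it): reset the deque
     * and bump the tag so stale thieves' compare-and-swap attempts fail. */
    atomic_store(&d->bot, 0);
    uint64_t new_age = pack_age(age_tag(old_age) + 1, 0);
    if (b == age_top(old_age) &&
        atomic_compare_exchange_strong(&d->age, &old_age, new_age))
        return task; /* owner won the race for the last task */
    atomic_store(&d->age, new_age);
    return DEQUE_EMPTY;
}
```

In a work-stealing loop, each worker would drain its own deque with ws_pop_bottom and, once that returns DEQUE_EMPTY, call ws_steal_top on another worker's deque; since DEQUE_EMPTY from a steal can mean either an empty victim or a lost race, the thief simply tries another victim. The sketch uses the default sequentially consistent C11 atomics throughout rather than reasoning about weaker orderings.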

References: 17 publications
Cited by 31 publications (9 citation statements)

“…As a result, underutilized PEs stealing the workload from the overloaded PEs and writing the results back to their buffers after the calculation will not pay off [14]. In addition, heavy operations (e.g., atomic operations) will stall the processing pipeline, resulting in new system bottlenecks [11]. Challenge 2: How to minimize manual efforts for skew handling?…”
Section: Challenges and Solutions (mentioning)
confidence: 99%
“…Since PEs process distinct ranges of data, skewed datasets may leave some PEs overloaded and others underutilized, which degrades performance. The challenge of skew handling for data-intensive applications is that lightweight computation (e.g., an integer calculation that finishes within one cycle) cannot tolerate heavy workload-rebalancing operations such as atomic-based work-stealing [11]. Besides, skew handling needs to adapt robustly to very different data distributions and generally requires sizable hardware expertise; therefore, the other challenge is to minimize the manual development effort required of developers.…”
Section: Introduction (mentioning)
confidence: 99%
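The skew argument in the passage above can be made concrete. Under static range partitioning, each PE processes a fixed contiguous slice of the input, so completion time is set by the most heavily loaded PE. The C sketch below uses a hypothetical per-item cost array and a function name of our own, static_partition_imbalance, to compute each PE's load and the ratio of maximum to mean load; a ratio of 1.0 means a perfectly balanced split.

```c
#include <stddef.h>

/* Statically splits n items across `pes` processing elements (pes >= 1),
 * giving PE p the contiguous range [p*n/pes, (p+1)*n/pes). Stores each PE's
 * summed cost in load[0..pes-1] and returns max(load) / mean(load):
 * 1.0 means perfectly balanced; larger values say how many times longer the
 * slowest PE runs compared with a perfectly even split. */
double static_partition_imbalance(const unsigned *cost, size_t n,
                                  size_t pes, unsigned long *load) {
    unsigned long total = 0, max = 0;
    for (size_t p = 0; p < pes; p++) {
        size_t begin = p * n / pes, end = (p + 1) * n / pes;
        load[p] = 0;
        for (size_t i = begin; i < end; i++)
            load[p] += cost[i];
        total += load[p];
        if (load[p] > max)
            max = load[p];
    }
    if (total == 0)
        return 1.0; /* no work at all: treat as balanced */
    return (double)max * (double)pes / (double)total;
}
```

With costs {8, 8, 8, 8, 1, 1, 1, 1} on four PEs, the loads are 16, 16, 2 and 2 and the ratio is 64/36, about 1.78, so two PEs sit idle for most of the run. Work-stealing targets exactly this idle time, but, as the cited passage argues, when each item costs only about a cycle, the atomic operations needed to move work can cost more than the imbalance they remove.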
“…Neither of these works supports the explicit multi-threading constructs defined by the Pthreads standard, so a direct comparison with the present work is difficult. Altera's SDK for OpenCL [3] supports lock-free programming via atomics [26], though the commercial nature of the tool makes it difficult to ascertain exactly how these operations are implemented. LEAP facilitates parallel memory access through its provision of memory hierarchies that potentially can be shared among Pthreads in a lock-free manner [32].…”
Section: High-level Synthesis (mentioning)
confidence: 99%
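Lock-free programming with atomics usually rests on a compare-and-swap retry loop: read a shared value, compute an update, and install it only if no other thread has changed the value in the meantime, retrying otherwise. As an illustration of our own choosing (it does not come from the cited works or from Altera's documentation), the C11 sketch below lets workers claim chunks of task indices from a shared counter without ever advancing it past the task count, which a plain atomic fetch-and-add cannot guarantee because it may overshoot. The function claim_chunk and its parameters are hypothetical.

```c
#include <stdatomic.h>

/* Claims up to `chunk` consecutive task indices from the shared counter
 * *next, never advancing it past `total`. Returns the first claimed index
 * and writes the number of claimed tasks to *claimed (0 when no work is
 * left, in which case the return value is `total`). */
unsigned claim_chunk(_Atomic unsigned *next, unsigned total,
                     unsigned chunk, unsigned *claimed) {
    unsigned start = atomic_load(next);
    unsigned end;
    do {
        if (start >= total) { /* all tasks already handed out */
            *claimed = 0;
            return total;
        }
        /* Clamp the chunk so the counter never passes `total`. */
        end = (total - start < chunk) ? total : start + chunk;
        /* On failure, start is reloaded with the counter's current value. */
    } while (!atomic_compare_exchange_weak(next, &start, end));
    *claimed = end - start;
    return start;
}
```

With 10 tasks and a chunk size of 4, successive calls return starting indices 0, 4 and 8 with 4, 4 and 2 tasks claimed, and then report that no work remains. The weak compare-and-swap may fail spuriously, which is harmless here because a failed exchange reloads the current counter value into start and the loop recomputes the chunk.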
“…However, recent devices, including Intel's Xeon+FPGA system [Intel 2019; Oliver et al 2011], the IBM CAPI [Stuecheli et al 2015] and the Xilinx Alveo [Xilinx 2018], offer a fine-grained shared-memory interface between the CPU and FPGA. This enables synchronisation idioms where data is exchanged in arbitrary (potentially small) amounts, such as work stealing, which has been shown to enable significant speedups in difficult-to-accelerate applications [e.g., Farooqui et al 2016; Ramanathan et al 2016; Tzeng et al 2010].…”
Section: Introduction (mentioning)
confidence: 99%