2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017)
DOI: 10.1109/mascots.2017.15

Performance Evaluation of Priority Queues for Fine-Grained Parallel Tasks on GPUs

Abstract: Graphics processing units (GPUs) are increasingly applied to accelerate tasks such as graph problems and discrete-event simulation that are characterized by irregularity, i.e., a strong dependence of the control flow and memory accesses on the input. The core data structures in many of these irregular tasks are priority queues, which guide the progress of the computations and can easily become the bottleneck of an application. To our knowledge, currently no systematic comparison of priority queue implementat…

Cited by 4 publications (4 citation statements)
References 43 publications
“…Baudis et al [13] evaluate the performance of PQs on a GPU implemented as a single parallel heap or as a set of ring buffers, implicit binary heaps, and splay trees [146] in the context of DES and path finding on grids. Their results indicate that for up to about 500 elements per PQ, ring buffers achieve the highest performance.…”
Section: Representation Of Irregular Data Structures By Arrays and Grids (mentioning)
confidence: 99%
“…While not inherently parallel, in the context of the EM or cache-oblivious models, the cache-oblivious bucket heap [4] and buffer heap [12] structures achieve sub-constant time operations when the block size, B, is sufficiently large. Since there are no parallel, cache-efficient priority queue structures, few works have considered using priority queues on GPUs [13,14]. While, in 2012, He et al [14] presented a priority queue that could achieve a 30x speedup over sequential execution, Baudis et al [13] more recently demonstrated that, for small queues of up to 500 items, simple circular buffers out-perform tree-based queues for a range of applications.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Since there are no parallel, cache-efficient priority queue structures, few works have considered using priority queues on GPUs [13,14]. While, in 2012, He et al [14] presented a priority queue that could achieve a 30x speedup over sequential execution, Baudis et al [13] more recently demonstrated that, for small queues of up to 500 items, simple circular buffers out-perform tree-based queues for a range of applications.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…The implementation based on ring buffers and the synchronisation based on atomic operations closely resembles GPU-based discrete-event simulations, which have been shown to achieve high speedup over a CPU-based execution [35], [36]. Our approach to conflict resolution postpones the conflict resolution to after the Act stage and iterates until all conflicts have been resolved based on the relative position of agents.…”
Section: Full Offloading (mentioning)
confidence: 99%
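
Several of the statements above refer to priority queues built on ring buffers (circular buffers) for small queues. As a rough illustration of that general idea, the following is a minimal sequential sketch in Python. It is not the implementation evaluated in the paper and not GPU code: the class name `RingBufferPQ`, its method names, and the choice of a sorted insert by shifting are assumptions made for this example only.

```python
class RingBufferPQ:
    """Hypothetical fixed-capacity priority queue kept sorted in a circular array.

    Illustrative only: the smallest key always sits at `head`, so removing the
    minimum is O(1), while inserting shifts larger keys by one slot (O(n)).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.head = 0   # physical index of the smallest key
        self.size = 0   # number of stored keys

    def push(self, key):
        if self.size == self.capacity:
            raise OverflowError("queue full")
        # Start at the free logical slot behind the tail and move every
        # larger key one slot towards the tail until the right gap opens.
        i = self.size
        while i > 0:
            prev = self.buf[(self.head + i - 1) % self.capacity]
            if prev <= key:
                break
            self.buf[(self.head + i) % self.capacity] = prev
            i -= 1
        self.buf[(self.head + i) % self.capacity] = key
        self.size += 1

    def pop_min(self):
        if self.size == 0:
            raise IndexError("queue empty")
        key = self.buf[self.head]
        # Advancing head (with wrap-around) frees the slot without moving data.
        self.head = (self.head + 1) % self.capacity
        self.size -= 1
        return key


if __name__ == "__main__":
    pq = RingBufferPQ(4)
    for k in (5, 1, 3):
        pq.push(k)
    print(pq.pop_min())   # smallest key so far
    pq.push(2)
    pq.push(4)            # this insert wraps around the end of the array
    print([pq.pop_min() for _ in range(4)])
```

The linear-time insert is cheap only while the queue stays short, which fits the statements' remark that ring buffers perform best for small queues of up to about 500 elements; for larger queues a tree- or heap-based structure would be the usual alternative.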