19th International Conference on High Performance Computing (HiPC 2012)
DOI: 10.1109/hipc.2012.6507490
Design and implementation of a parallel priority queue on many-core architectures

Cited by 14 publications (18 citation statements). References 20 publications.
“…GPU parallel priority queues [24] improve on the serial heap update by allowing multiple concurrent updates, but they potentially require a number of small sorts for each insert, along with data-dependent memory movement. Moreover, the approach uses multiple synchronization barriers through kernel launches in different streams, and incurs the additional latency of successive kernel launches and coordination with the CPU host.…”
Section: K-selection on CPU versus GPU
confidence: 99%
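The cost pattern described above (a small sort on every insert, plus data-dependent merges) can be illustrated with a simplified, serial model of a node-blocked priority queue, where each heap node stores a sorted block of `r` keys and an inserted batch is sorted and then merged along a root-to-leaf path. This is only a sketch of the idea, not the authors' GPU implementation; the class name `BlockHeap` and the block size `r` are illustrative.

```python
# Serial sketch of a node-blocked priority queue: each node holds a sorted
# block of r keys, and every batch insert costs one small sort plus
# data-dependent merges down a root-to-leaf path (delete-min is omitted).
class BlockHeap:
    def __init__(self, r):
        self.r = r          # keys per heap node
        self.nodes = []     # nodes[i]: sorted block; binary-heap layout

    def _path_to(self, t):
        # Indices on the root-to-node-t path in a binary heap layout.
        path = []
        while t > 0:
            path.append(t)
            t = (t - 1) // 2
        path.append(0)
        return list(reversed(path))

    def insert_batch(self, batch):
        assert len(batch) == self.r
        batch = sorted(batch)              # the per-insert "small sort"
        t = len(self.nodes)                # next free node position
        self.nodes.append([])
        for i in self._path_to(t):
            # Merge the incoming block with node i, keep the r smallest
            # keys here, push the overflow down: data-dependent movement.
            merged = sorted(self.nodes[i] + batch)
            self.nodes[i], batch = merged[:self.r], merged[self.r:]

    def peek_min(self):
        return self.nodes[0][0] if self.nodes and self.nodes[0] else None
```

Because every insert touches a whole root-to-leaf path, a GPU version needs synchronization between the per-node merge steps, which is where the kernel-launch barriers mentioned above come from.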
“…In the heap version, however, sorting elements by their values ensures that propagations carried out by a thread are more likely to be final and, as a consequence, increases parallel efficiency. In an earlier effort, we implemented the IWPP using a state-of-the-art priority queue proposed by He et al [15], but it did not improve on the regular queue-based GPU implementation because of the data management costs.…”
Section: Results
confidence: 99%
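The claim that value-ordered processing makes propagations "more likely to be final" can be seen in a small serial sketch contrasting a priority queue with a FIFO queue for a Dijkstra-like relaxation: with value ordering, a vertex's first settled value is final, so fewer re-propagations occur. This is an illustrative model, not the paper's IWPP code; the graph and the function name `relax` are made up for the example.

```python
import heapq
from collections import deque

def relax(graph, source, use_heap):
    # Relax edge weights from `source`; count how many queue pops occur.
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    frontier = [(0, source)] if use_heap else deque([source])
    pops = 0
    while frontier:
        if use_heap:
            d, u = heapq.heappop(frontier)
            if d > dist[u]:
                continue          # stale entry: value already improved
        else:
            u = frontier.popleft()
        pops += 1
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w   # propagate the improved value
                if use_heap:
                    heapq.heappush(frontier, (dist[v], v))
                else:
                    frontier.append(v)
    return dist, pops
```

On a graph where a short path is discovered after a longer one, the FIFO version re-processes vertices whose values later improve, while the heap version settles each vertex once; this mirrors the parallel-efficiency argument in the quote, although on a GPU the win must outweigh the data management costs of the heap itself.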
“…An extensive body of work has embarked on the redesign of data structures for construction and general computation on the GPU [88]. Within the context of searching, these acceleration structures include sorted arrays [3], [4], [8], [51], [66], [67], [98] and linked lists [116], hash tables (see section III), spatial-partitioning trees (e.g., k-d trees [57], [115], [120], octrees [57], [119], bounding volume hierarchies (BVH) [57], [64], R-trees [71], and binary indexing trees [59], [99]), spatial-partitioning grids (e.g., uniform [36], [53], [62] and two-level [52]), skiplists [81], and queues (e.g., binary heap priority [43] and FIFO [17], [101]). Due to significant architectural differences between the CPU and GPU, search structures cannot simply be "ported" from the CPU to the GPU and maintain optimal performance.…”
Section: GPU Searching
confidence: 99%