Data-Parallel Hashing Techniques for GPU Architectures

Lessley, Brenton; Childs, Hank

doi:10.1109/tpds.2019.2929768

Cited by 28 publications

(8 citation statements)

References 90 publications

(215 reference statements)


“…However, since this method fundamentally uses a queue data structure, if the size of the queue that each thread processes cannot be adjusted, load imbalance may occur when using it in a GPU. The method proposed in Algorithm 1 utilizes storing the minimum value in a hash table to enable efficient searching in a GPU environment without causing a load [46]. Init grid G 3:…”

Section: Fragment Detection Algorithm (mentioning)

confidence: 99%

Material Point Method-Based Simulation Techniques for Medical Applications

Sung, Kim, Shin. 2024. Electronics.


We propose a method for recognizing fragment objects to model the detailed tearing of elastic objects like human organs. Traditional methods require high-performance GPUs for real-time calculations to accurately simulate the detailed fragmentation of rapidly deforming objects or create random fragments to improve visual effects with minimal computation. The proposed method utilizes a deep neural network (DNN) to produce physically accurate results without requiring high-performance GPUs. Physically parameterized material point method (MPM) simulation data were used to learn small-scale detailed fragments. The tearing process is segmented and learned based on various training data from different spaces and external forces. The inference algorithm classifies the fragments from the training data and modifies the deformation gradient using a modifier. An experiment was conducted to compare the proposed method and the traditional MPM in the same environment. As a result, it was confirmed that visual fidelity for the tearing of elastic objects has been improved. This supports the simulation of various incision types in a virtual surgery.



“…To alleviate this bottleneck, we leverage the fast memory interface of modern CUDA accelerators. High-throughput GPU hash tables have been studied extensively [21]. However, most existing implementation show limitations which make them unsuitable for our use case.…”

Section: Related Work (mentioning)

confidence: 99%

MetaCache-GPU: Ultra-Fast Metagenomic Classification

Kobus, Müller, Jünger, et al. 2021. 50th International Conference on Parallel Processing.


The cost of DNA sequencing has dropped exponentially over the past decade, making genomic data accessible to a growing number of scientists. In bioinformatics, localization of short DNA sequences (reads) within large genomic sequences is commonly facilitated by constructing index data structures which allow for efficient querying of substrings. Recent metagenomic classification pipelines annotate reads with taxonomic labels by analyzing their k-mer histograms with respect to a reference genome database. CPU-based index construction is often performed in a preprocessing phase due to the relatively high cost of building irregular data structures such as hash maps. However, the rapidly growing amount of available reference genomes establishes the need for index construction and querying at interactive speeds. In this paper, we introduce MetaCache-GPU, an ultra-fast metagenomic short read classifier specifically tailored to fit the characteristics of CUDA-enabled accelerators. Our approach employs a novel hash table variant featuring efficient minhash fingerprinting of reads for locality-sensitive hashing and their rapid insertion using warp-aggregated operations. Our performance evaluation shows that MetaCache-GPU is able to build large reference databases in a matter of seconds, enabling instantaneous operability, while popular CPU-based tools such as Kraken2 require over an hour for index construction on the same data. In the context of an ever-growing number of reference genomes, MetaCache-GPU is the first metagenomic classifier that makes analysis pipelines with on-demand composition of large-scale reference genome sets practical. The source code is publicly available at https://github.com/muellan/metacache.


“…Several data-parallel GPU hash table implementations have been proposed which aim to leverage the fast memory bandwidth provided by modern GPUs. Lessley et al [13] provide a comprehensive survey of these approaches and highlight the respective concepts and techniques used.…”

Section: Related Work (mentioning)

confidence: 99%

WarpCore: A Library for fast Hash Tables on GPUs

Jünger, Kobus, Müller, et al. 2020. 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC).


Hash tables are ubiquitous. Properties such as an amortized constant time complexity for insertion and querying as well as a compact memory layout make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns, to design WarpCore, a versatile library of hash table data structures. Unique device-sided operations allow for building high performance data processing pipelines entirely on the GPU. Our implementation achieves up to 1.6 billion inserts and up to 4.3 billion retrievals per second on a single GV100 GPU, thereby outperforming the state-of-the-art solutions cuDPP, SlabHash, and NVIDIA RAPIDS cuDF. This performance advantage becomes even more pronounced for high load factors of over 90%. To overcome the memory limitation of a single GPU, we scale our approach over a dense NVLink topology which gives us close-to-optimal weak scaling on DGX servers. We further show how WarpCore can be used for accelerating a real world bioinformatics application (metagenomic classification) with speedups of over two orders-of-magnitude against state-of-the-art CPU-based solutions. We plan to make our library publicly available upon acceptance of the paper.
