Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming 2019
DOI: 10.1145/3293883.3295706
Engineering a high-performance GPU B-Tree

Abstract: We engineer a GPU implementation of a B-Tree that supports concurrent queries (point, range, and successor) and updates (insertions and deletions). Our B-Tree outperforms the state of the art, a GPU log-structured merge tree (LSM) and a GPU sorted array. In particular, point and range queries are significantly faster than in a GPU LSM (the GPU LSM does not implement successor queries). Furthermore, B-Tree insertions are also faster than LSM and sorted array insertions unless insertions come in batches of more …


Cited by 34 publications (5 citation statements)
References 33 publications
“…As described earlier in this Section, Algorithm 6 performs Θ(N) set operations (unions and differences) during the prefix computation, and therefore the data structure used for representing sets must be chosen wisely (we will return to this issue in Section 5). Data structures based on hash tables or trees are problematic on the GPU, although not impossible [9,10]. A simpler implementation of sets using bit vectors appears to be better suited: a bit vector is a sequence of N bits, where item i is in the set if and only if the i-th bit is one.…”
Section: Some Remarks On Distributed-memory and Gpu Implementationsmentioning
confidence: 99%
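The bit-vector representation the citing authors favor can be illustrated with a small CPU sketch (the universe size `N` and the helper names below are assumptions for illustration; on a GPU the same word-level operations would be applied to arrays of machine words in parallel):

```python
N = 16  # assumed universe size {0, ..., N-1} for illustration

def make_set(items):
    """Encode a subset of {0, ..., N-1} as an integer bit vector."""
    bv = 0
    for i in items:
        bv |= 1 << i  # set the i-th bit for each member
    return bv

def members(bv):
    """Decode a bit vector back into a sorted list of elements."""
    return [i for i in range(N) if (bv >> i) & 1]

a = make_set([1, 3, 5])
b = make_set([3, 4])

# Union and difference each reduce to a single bitwise operation.
union = a | b        # members: [1, 3, 4, 5]
difference = a & ~b  # members: [1, 5]
```

This is why bit vectors suit the GPU here: each union or difference is a branch-free bitwise pass over words, with no pointer chasing.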
“…Also, the majority of the indexes are designed for a specific use, whether for low- or high-dimensional data, and for the CPU, the GPU, or both architectures. We identify different indexing methods, including those designed for the CPU [2,3,[24][25][26][27][28][29], the GPU [10,12,30], or both architectures [15][16][17]. As our algorithm focuses on low-dimensionality distance similarity search, we focus on presenting indexing methods that are designed for lower dimensions.…”
Section: Data Indexingmentioning
confidence: 99%
“…They replaced these accesses with sequential accesses, in particular by allowing the search of the tree to jump from a node to its next sibling. The authors of [10] improve the efficiency of the B-Tree by sizing nodes to match the GPU's cache access granularity and by avoiding recursive calls during tree traversal. Furthermore, they assign multiple queries to a warp, with all threads of the same warp cooperating to compute one query at a time, thus reducing intra-warp thread divergence.…”
Section: Data Indexingmentioning
confidence: 99%
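The non-recursive traversal described above can be sketched as a simple root-to-leaf loop (a minimal CPU sketch; the dict-based node layout and function name are illustrative assumptions, not the paper's GPU memory layout or warp-cooperative kernel):

```python
import bisect

def btree_search(root, key):
    """Point query as an iterative root-to-leaf loop, with no recursion."""
    node = root
    while node is not None:
        # Find the first key position not less than the query key.
        i = bisect.bisect_left(node["keys"], key)
        if i < len(node["keys"]) and node["keys"][i] == key:
            return True  # key found in this node
        # Descend into the i-th child; leaves have no children.
        node = node["children"][i] if node["children"] else None
    return False

# Tiny example tree: one root with two leaf children.
leaf_lo = {"keys": [1, 2], "children": None}
leaf_hi = {"keys": [5, 7], "children": None}
root = {"keys": [4], "children": [leaf_lo, leaf_hi]}
```

On the GPU, the same loop structure avoids per-call stack traffic, and the per-node key scan is what the warp's threads would perform cooperatively.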
“…The conversion between persistent and temporary is expensive and unnecessary with NVM. One such use case is in-memory databases, especially the GPU accelerated ones including Mega-KV [9], GPU B-Tree [10], Kinetica [11], etc. Second, long-running GPU applications, including training deep neural networks, computing proof of work in blockchain applications, scientific computation using iterative approaches, etc., would benefit from fault tolerance with RDS.…”
Section: Introductionmentioning
confidence: 99%