Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming 2019
DOI: 10.1145/3293883.3295706
Engineering a high-performance GPU B-Tree

Abstract: We engineer a GPU implementation of a B-Tree that supports concurrent queries (point, range, and successor) and updates (insertions and deletions). Our B-Tree outperforms the state of the art, a GPU log-structured merge tree (LSM) and a GPU sorted array. In particular, point and range queries are significantly faster than in a GPU LSM (the GPU LSM does not implement successor queries). Furthermore, B-Tree insertions are also faster than LSM and sorted array insertions unless insertions come in batches of more …


Cited by 34 publications (5 citation statements)
References 33 publications
“…As described earlier in this Section, Algorithm 6 performs Θ(N) set operations (unions and differences) during the prefix computation, and therefore the data structure used for representing sets must be chosen wisely (we will return to this issue in Section 5). Data structures based on hash tables or trees are problematic on the GPU, although not impossible [9,10]. A simpler implementation of sets using bit vectors appears to be better suited: a bit vector is a sequence of N bits, where item i is in the set if and only if the i-th bit is one.…”
Section: Some Remarks On Distributed-memory and Gpu Implementationsmentioning
confidence: 99%
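The bit-vector representation the citing authors favor can be illustrated with a small CPU sketch (the universe size `N` and the helper names below are assumptions for illustration; on a GPU the same word-level operations would be applied to arrays of machine words in parallel):

```python
N = 16  # assumed universe size {0, ..., N-1} for illustration

def make_set(items):
    """Encode a subset of {0, ..., N-1} as an integer bit vector."""
    bv = 0
    for i in items:
        bv |= 1 << i  # set the i-th bit for each member
    return bv

def members(bv):
    """Decode a bit vector back into a sorted list of elements."""
    return [i for i in range(N) if (bv >> i) & 1]

a = make_set([1, 3, 5])
b = make_set([3, 4])

# Union and difference each reduce to a single bitwise operation.
union = a | b        # members: [1, 3, 4, 5]
difference = a & ~b  # members: [1, 5]
```

This is why bit vectors suit the GPU here: each union or difference is a branch-free bitwise pass over words, with no pointer chasing.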
“…Also, the majority of the indexes are designed for a specific use, whether for low- or high-dimensional data, and for the CPU, the GPU, or both architectures. We identify different indexing methods, including those designed for the CPU [2,3,[24][25][26][27][28][29], the GPU [10,12,30], or both architectures [15][16][17]. As our algorithm focuses on low-dimensionality distance similarity search, we focus on presenting indexing methods that are designed for lower dimensions.…”
Section: Data Indexingmentioning
confidence: 99%
“…They replaced these accesses with sequential accesses, in particular by allowing the search of the tree to jump from a node to its next sibling. The authors of [10] improve the efficiency of the B-Tree by sizing nodes to match the GPU's cache access granularity and by avoiding recursive calls during tree traversal. Furthermore, they assign multiple queries to a warp, with all threads of the same warp cooperating to compute one query at a time, thus reducing intra-warp thread divergence.…”
Section: Data Indexingmentioning
confidence: 99%
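The non-recursive traversal described above can be sketched as a simple root-to-leaf loop (a minimal CPU sketch; the dict-based node layout and function name are illustrative assumptions, not the paper's GPU memory layout or warp-cooperative kernel):

```python
import bisect

def btree_search(root, key):
    """Point query as an iterative root-to-leaf loop, with no recursion."""
    node = root
    while node is not None:
        # Find the first key position not less than the query key.
        i = bisect.bisect_left(node["keys"], key)
        if i < len(node["keys"]) and node["keys"][i] == key:
            return True  # key found in this node
        # Descend into the i-th child; leaves have no children.
        node = node["children"][i] if node["children"] else None
    return False

# Tiny example tree: one root with two leaf children.
leaf_lo = {"keys": [1, 2], "children": None}
leaf_hi = {"keys": [5, 7], "children": None}
root = {"keys": [4], "children": [leaf_lo, leaf_hi]}
```

On the GPU, the same loop structure avoids per-call stack traffic, and the per-node key scan is what the warp's threads would perform cooperatively.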
“…The conversion between persistent and temporary is expensive and unnecessary with NVM. One such use case is in-memory databases, especially the GPU accelerated ones including Mega-KV [9], GPU B-Tree [10], Kinetica [11], etc. Second, long-running GPU applications, including training deep neural networks, computing proof of work in blockchain applications, scientific computation using iterative approaches, etc., would benefit from fault tolerance with RDS.…”
Section: Introductionmentioning
confidence: 99%