In-memory database acceleration on FPGAs: a survey

Fang, Jian; Mulder, Y.T.B.; Hidders, Jan; Lee, Jinho; Hofstee, H. Peter

doi:10.1007/s00778-019-00581-w

Cited by 69 publications

(27 citation statements)

References 104 publications


“…Each dataset consists of several benchmarks with cardinalities ranging from 2^10 to 2^22 unique keys. The relation size in all of the experiments was 256 million tuples (in line with previous research [59]).…”

Section: Dataset Description (mentioning)
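The setup quoted above sweeps key cardinality while holding the relation size fixed, so the average number of tuples per key is simply the relation size divided by the cardinality. The following Python sketch builds a scaled-down relation for a few cardinalities in that range. The function name `make_relation`, the uniform key distribution, and the 2^16-tuple relation size are illustrative assumptions, not the cited benchmark generator.

```python
import random


def make_relation(num_tuples, cardinality, seed=0):
    """Hypothetical generator: (key, payload) tuples whose keys are drawn
    uniformly at random from `cardinality` distinct values. Uniformity is an
    assumption; the cited benchmarks may use other distributions."""
    rng = random.Random(seed)
    return [(rng.randrange(cardinality), i) for i in range(num_tuples)]


# Scaled down from 256 million tuples so the sketch runs quickly.
NUM_TUPLES = 1 << 16

for exp in range(10, 23, 4):  # cardinalities 2^10, 2^14, 2^18, 2^22
    cardinality = 1 << exp
    relation = make_relation(NUM_TUPLES, cardinality)
    distinct = len({key for key, _ in relation})
    avg_per_key = NUM_TUPLES / cardinality
    print(f"2^{exp} possible keys: {distinct} distinct seen, "
          f"{avg_per_key:.3f} tuples per key on average")
```

At this reduced scale the largest cardinality (2^22) exceeds the relation size (2^16), so not every possible key can appear; at 256 million tuples every cardinality in the quoted range is smaller than the relation.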


Efficient local locking for massively multithreaded in-memory hash-based operators

et al. 2021


The join and group-by aggregation are two memory-intensive operators that affect the performance of relational databases. Hashing is a common approach used to implement both operators. Recent paradigm shifts in multi-core processor architectures have reinvigorated research into how the join and group-by aggregation operators can leverage these advances. However, the poor spatial locality of the hashing approach has hindered performance on multi-core processor architectures, which rely on large cache hierarchies for latency mitigation. Multithreaded architectures can better cope with poor spatial locality by masking memory latency with many outstanding requests. Nevertheless, the number of parallel threads, even in the most advanced multithreaded processors such as UltraSPARC, is not enough to fully cover the main memory access latency. In this paper, we explore the hardware reconfigurability of FPGAs to enable deeper execution pipelines that maintain hundreds (instead of tens) of outstanding memory requests across four FPGAs, drastically increasing concurrency and throughput. We present two end-to-end in-memory accelerators for the join and group-by aggregation operators using FPGAs. Both accelerators use massive multithreading to mask the long memory delays of traversing linked-list data structures, while concurrently managing hundreds of thread states locally across four FPGAs. We explore how content-addressable memories (CAMs) can be intermixed within our multithreaded designs to act as a synchronizing cache, which enforces locks and merges jobs together before they are written to memory. Throughput results for our hash-join operator accelerator show a speedup of between 2× and 3.4× over the best multi-core approaches with comparable memory bandwidths on uniform and skewed datasets.
The accelerator for the hash-based group-by aggregation operator demonstrates that leveraging CAMs achieves an average throughput speedup of 3.3×, with a best case of 9.4×, over CPU implementations across five types of data distributions.
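To make the idea of merging jobs before they are written to memory concrete, here is a single-threaded Python model of a group-by SUM in which a small associative buffer absorbs repeated updates to the same key. The class `MergingCache`, the helper `group_by_sum`, the FIFO eviction policy, and the capacity of four entries are all hypothetical. The sketch models only the merging behaviour, not the lock enforcement or the multithreaded FPGA pipelines described in the abstract.

```python
from collections import OrderedDict


class MergingCache:
    """Hypothetical software model of a CAM used as a synchronizing cache in
    a hash-based group-by (SUM) aggregation. An update that hits a key already
    held in the cache is merged in place instead of issuing another memory
    write; entries are written back to the backing table when evicted."""

    def __init__(self, capacity, table):
        self.capacity = capacity
        self.table = table              # stands in for the in-memory hash table
        self.pending = OrderedDict()    # key -> partial sum (the "CAM" entries)
        self.memory_writes = 0

    def update(self, key, value):
        if key in self.pending:         # associative hit: merge the two jobs
            self.pending[key] += value
            return
        if len(self.pending) == self.capacity:
            self._evict_oldest()        # FIFO eviction is an assumption
        self.pending[key] = value

    def _evict_oldest(self):
        key, partial = self.pending.popitem(last=False)
        self.table[key] = self.table.get(key, 0) + partial
        self.memory_writes += 1

    def flush(self):
        while self.pending:
            self._evict_oldest()


def group_by_sum(rows, capacity=4):
    table = {}
    cache = MergingCache(capacity, table)
    for key, value in rows:
        cache.update(key, value)
    cache.flush()
    return table, cache.memory_writes


rows = [(1, 1), (2, 1), (1, 1), (3, 1), (1, 1), (2, 1), (5, 1), (6, 1), (7, 1), (1, 1)]
totals, writes = group_by_sum(rows)
print(totals, f"{writes} memory writes for {len(rows)} updates")
# prints: {1: 4, 2: 2, 3: 1, 5: 1, 6: 1, 7: 1} 7 memory writes for 10 updates
```

In this trace the repeated updates to keys 1 and 2 are absorbed by the buffer, so 10 updates produce 7 write-backs; an input with more repeated keys within the buffer window would merge more.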


“…(2) Irregular data-flow where indirection in the memory access patterns breaks the data locality and hence causes cache misses. Some database operators, such as selection, exhibit control flow irregularity, while others, like hash-join and (hash-based) group-by aggregation, can demonstrate both [22].…”

Section: Introduction (mentioning)

Efficient local locking for massively multithreaded in-memory hash-based operators

et al. 2021
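The data-flow irregularity mentioned above comes from indirection: in a chained hash join, the bucket address depends on the key's hash, and every subsequent node address depends on the value just loaded. The Python sketch below stores chains as index arrays so that these dependent loads are explicit. The function names `build` and `probe` and the bucket count are illustrative assumptions, not taken from the cited work.

```python
def build(relation, num_buckets):
    """Build a chained hash table over (key, payload) tuples.
    Nodes live in flat arrays, and next_node[i] is an index that plays the
    role of a linked-list pointer; -1 marks the end of a chain."""
    heads = [-1] * num_buckets
    keys, payloads, next_node = [], [], []
    for key, payload in relation:
        bucket = hash(key) % num_buckets
        keys.append(key)
        payloads.append(payload)
        next_node.append(heads[bucket])   # push the new node onto the chain
        heads[bucket] = len(keys) - 1
    return heads, keys, payloads, next_node


def probe(table, relation):
    """Probe the table with each outer tuple and emit matching triples."""
    heads, keys, payloads, next_node = table
    out = []
    for key, payload in relation:
        node = heads[hash(key) % len(heads)]  # address depends on the key's hash
        while node != -1:                     # each hop depends on the previous load
            if keys[node] == key:
                out.append((key, payloads[node], payload))
            node = next_node[node]
    return out


build_side = [(1, "a"), (2, "b"), (1, "c"), (9, "d")]
probe_side = [(1, "x"), (3, "y"), (9, "z")]
table = build(build_side, num_buckets=4)
print(probe(table, probe_side))
# prints: [(1, 'c', 'x'), (1, 'a', 'x'), (9, 'd', 'z')]
```

Keys 1 and 9 land in the same bucket, so probing either one walks a chain; each `next_node` hop is a dependent memory access of the kind the citing work's accelerators hide by keeping many linked-list traversals in flight.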


“…Database systems generally focus not so much on commands that require high processing power as on queuing a large number of commands and processing them to respond in the shortest possible time (Fang et al., 2020). Although there are points that run counter to database principles, OLAP enables businesses to obtain much faster and more useful reports from data warehouses.…”

Section: Hardware-Assisted Databases (unclassified)

Veri Saklama Yöntem ve Uygulamaları (Data Storage Methods and Applications)

Akadal

2021

Tıp Bilişimi (Medical Informatics)


“…Many recent studies consider improving the speed of lossless decompression. The study in [12] discusses some of the prior work in the context of databases. To address block boundary problems, [19] explores block-level parallelism by performing pattern matching on the delimiters to predict the block boundaries.…”

Section: Related Work (mentioning)

An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic

Fang, Chen, Lee, et al. 2020

J Sign Process Syst

Self Cite


To best leverage high-bandwidth storage and network technologies, we need to improve the speed at which we can decompress data. We present a "refine and recycle" method applicable to LZ77-type decompressors that enables efficient high-bandwidth designs, and present an implementation in reconfigurable logic. The method refines the write commands (for literal tokens) and read commands (for copy tokens) into a set of commands that each target a single bank of block RAM and, rather than performing all the dependency calculations, saves logic by recycling (read) commands that return with an invalid result. A single "Snappy" decompressor implemented in reconfigurable logic using this method can process multiple literal or copy tokens per cycle and achieves up to 7.2 GB/s, which can keep pace with an NVMe device. The proposed method is about an order of magnitude faster and an order of magnitude more power-efficient than a state-of-the-art single-core software implementation. The logic and block RAM resources required by the decompressor are low enough that a set of these decompressors can be implemented on a single FPGA of reasonable size to keep up with the bandwidth provided by the most recent interface technologies.
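For readers unfamiliar with LZ77-type formats, the following Python sketch shows the sequential semantics a decompressor must preserve: a literal token appends bytes, and a copy token re-reads `length` bytes starting `offset` bytes back in the output. The token tuples are a simplified stand-in, not the Snappy wire format. The second function, `split_by_bank`, is a hypothetical illustration of the "refine" idea: it splits a byte range into pieces that each touch a single bank, assuming banks are interleaved every `bank_bytes` bytes. It is not the paper's implementation and omits the recycling of invalid reads.

```python
def lz77_decode(tokens):
    """Sequential reference decoder for simplified LZ77 tokens:
    ("literal", data) appends bytes; ("copy", offset, length) copies
    `length` bytes starting `offset` bytes before the current end."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "literal":
            out += tok[1]
        elif tok[0] == "copy":
            _, offset, length = tok
            if not 0 < offset <= len(out):
                raise ValueError("copy offset out of range")
            start = len(out) - offset
            # Byte-by-byte so overlapping copies (offset < length) read
            # bytes written earlier by this same copy.
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError(f"unknown token {tok[0]!r}")
    return bytes(out)


def split_by_bank(addr, length, bank_bytes, num_banks):
    """Hypothetical 'refine' step: split [addr, addr + length) into
    (bank, addr, length) sub-commands that each stay within one bank,
    assuming banks are interleaved every `bank_bytes` bytes."""
    cmds = []
    while length > 0:
        bank = (addr // bank_bytes) % num_banks
        chunk = min(length, bank_bytes - addr % bank_bytes)
        cmds.append((bank, addr, chunk))
        addr += chunk
        length -= chunk
    return cmds


print(lz77_decode([("literal", b"ab"), ("copy", 2, 6), ("literal", b"c")]))
# prints: b'ababababc'
print(split_by_bank(6, 12, bank_bytes=8, num_banks=4))
# prints: [(0, 6, 2), (1, 8, 8), (2, 16, 2)]
```

The overlapping copy (offset 2, length 6) is the case that creates read-after-write dependencies: later bytes of the copy depend on earlier bytes of the same copy. A parallel hardware design must either track such dependencies or, as the abstract describes, recycle reads that return an invalid result.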
