Two-way replacement selection

Martinez-Palau, Xavier; Domínguez-Sal, David; Larriba-Pey, Josep Lluís

doi:10.14778/1920841.1920952

Cited by 7 publications

(12 citation statements)

References 7 publications

“…Every pair element that is ordered in this process is never swapped in future queries, and thus, the number of swaps is reduced. The above approach of buffered crack-in-two is similar to [21], where two heaps are used to improve the stability of the replacement selection algorithm. By adjusting the maximal heap size in buffered crack-in-two, we can tune the convergence speed of the cracked index.…”

Section: Buffered Swapping Instead Of Swapping Elements Immediately A

mentioning

confidence: 99%

An experimental evaluation and analysis of database cracking

Schuhknecht

Jindal

Dittrich

2015

The VLDB Journal

Database cracking has been an area of active research in recent years. The core idea of database cracking is to create indexes adaptively and incrementally as a side product of query processing. Several works have proposed different cracking techniques for different aspects including updates, tuple reconstruction, convergence, concurrency control, and robustness. Our 2014 VLDB paper "The Uncracked Pieces in Database Cracking" (PVLDB 7:97-108, 2013/VLDB 2014) was the first comparative study of these different methods by an independent group. In this article, we extend our published experimental study on database cracking and bring it to an up-to-date state. Our goal is to critically review several aspects, identify the potential, and propose promising directions in database cracking. With this study, we hope to expand the scope of database cracking and possibly leverage cracking in database engines other than MonetDB. We repeat several prior database cracking works including the core cracking algorithms as well as three other works on convergence (hybrid cracking), tuple reconstruction (sideways cracking), and robustness (stochastic cracking), respectively. In addition to our conference paper, we now also look at a recently published study about CPU efficiency (predication cracking). We evaluate these works and show possible directions to do even better. As a further extension, we evaluate the whole class of parallel cracking algorithms that were proposed in three recent works. Altogether, in this work we revisit 8 papers on database cracking and evaluate in total 18 cracking methods, 6 sorting algorithms, and 3 full index structures. Additionally, we test cracking under a variety of experimental settings, including high selectivity queries, low selectivity queries, varying selectivity, and multiple query access patterns. (Low selectivity means that many entries qualify; consequently, high selectivity means that only few entries qualify.) Finally, we compare cracking against different sorting algorithms as well as against different main memory optimized indexes, including the recently proposed adaptive radix tree (ART). Our results show that: (1) the previously proposed cracking algorithms are repeatable, (2) there is still enough room to significantly improve the previously proposed cracking algorithms, (3) parallelizing cracking algorithms efficiently is a hard task, (4) cracking depends heavily on query selectivity, (5) cracking needs to catch up with modern indexing trends, and (6) different indexing algorithms have different indexing signatures.

Felix Martin Schuhknecht (felix.schuhknecht@infosys.uni-saarland.de). Affiliations: 1 Information Systems Group, Saarland University, Saarbrücken, Germany; 2 CSAIL, MIT, Cambridge, MA, USA.
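
The abstract above describes cracking as building an index incrementally while queries run. A minimal sketch of the basic (unbuffered) crack-in-two step, as it is commonly described, may make this concrete. The function name `crack_in_two` and its signature are assumptions chosen for illustration; this is not the buffered variant from the quoted passage, nor the code evaluated in the paper.

```python
from typing import List


def crack_in_two(column: List[int], lo: int, hi: int, pivot: int) -> int:
    """Partition column[lo:hi] in place around pivot.

    Afterwards every value < pivot precedes every value >= pivot.
    Returns the index of the first value >= pivot (the crack position).
    Hypothetical helper for illustration only.
    """
    i, j = lo, hi - 1
    while i <= j:
        if column[i] < pivot:
            i += 1            # already on the correct (left) side
        elif column[j] >= pivot:
            j -= 1            # already on the correct (right) side
        else:
            # Both elements are on the wrong side: swap them.
            column[i], column[j] = column[j], column[i]
            i += 1
            j -= 1
    return i


column = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3]
split = crack_in_two(column, 0, len(column), 10)
print(split, column[:split], column[split:])
```

A query such as "value < 10" can then be answered from `column[:split]`, and later queries only need to crack the piece that contains their bound.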

“…The run-generation problem has been studied in its various guises for over 50 years [14, 17-19, 25, 30, 31, 34].…”

Section: Introduction

mentioning

confidence: 99%

“…Martinez-Palau et al. [34] revive this idea in an experimental study. Their two-way-replacement-selection algorithms heuristically choose between whether the run generation should go up or down.…”

mentioning

confidence: 99%

“…External-memory sorting algorithms are tailored for data sets too large to fit in main memory. Generally, these algorithms begin their sort by bringing chunks of data into main memory, sorting within memory, and writing back out to disk in sorted sequences, called runs [15, 19, 26, 34]. We revisit the classic problem of how to maximize the length of these runs, the run-generation problem. The run-generation problem has been studied in its various guises for over 50 years [14, 17-19, 25, 30, 31, 34].…”

mentioning

confidence: 99%

Run Generation Revisited: What Goes Up May or May Not Come Down

Bender

McCauley

McGregor

et al.

2015

Lecture Notes in Computer Science

In this paper, we revisit the classic problem of run generation. Run generation is the first phase of external-memory sorting, where the objective is to scan through the data, reorder elements using a small buffer of size M, and output runs (contiguously sorted chunks of elements) that are as long as possible. We develop algorithms for minimizing the total number of runs (or equivalently, maximizing the average run length) when the runs are allowed to be sorted or reverse sorted. We study the problem in the online setting, both with and without resource augmentation, and in the offline setting.

• We analyze alternating-up-down replacement selection (runs alternate between sorted and reverse sorted), which was studied by Knuth as far back as 1963. We show that this simple policy is asymptotically optimal. Specifically, we show that alternating-up-down replacement selection is 2-competitive and no deterministic online algorithm can perform better.
• We give online algorithms having smaller competitive ratios with resource augmentation. Specifically, we exhibit a deterministic algorithm that, when given a buffer of size 4M, is able to match or beat any optimal algorithm having a buffer of size M. Furthermore, we present a randomized online algorithm which is 7/4-competitive when given a buffer twice that of the optimal.
• We demonstrate that performance can also be improved with a small amount of foresight. We give an algorithm, which is 3/2-competitive, with foreknowledge of the next 3M elements of the input stream. For the extreme case where all future elements are known, we design a PTAS for computing the optimal strategy a run generation algorithm must follow.
• We present algorithms tailored for "nearly sorted" inputs which are guaranteed to have optimal solutions with sufficiently long runs.

External-memory sorting algorithms are tailored for data sets too large to fit in main memory. Generally, these algorithms begin their sort by bringing chunks of data into main memory, sorting within memory, and writing back out to disk in sorted sequences, called runs [15, 19, 26, 34]. We revisit the classic problem of how to maximize the length of these runs, the run-generation problem. The run-generation problem has been studied in its various guises for over 50 years [14, 17-19, 25, 30, 31, 34].

The most well-known external-memory sorting algorithm is multi-way merge sort [1, 8, 15, 22, 28, 29, 40, 42, 44]. The multi-way merge sort is formalized in the disk-access machine (DAM) model of Aggarwal and Vitter [1]. If M is the size of RAM and data is transferred between main memory and disk in blocks of size B, then an M/B-way merge sort has a complexity of $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os, where N is the number of elements to be sorted. This is the best possible [1].

A top-down description of multi-way merge sort follows. Divide the input into M/B subproblems, recursively sort each subproblem, and merge them together in one final scan through the input. The base case is reached when each subproblem has size O(M), and therefore fit...
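
The run-generation setting described above (a stream, a buffer of size M, and output runs that should be as long as possible) can be illustrated with classic one-directional replacement selection. The sketch below is a textbook-style version under stated assumptions, not the alternating-up-down policy analyzed in the paper and not the two-way algorithm of Martinez-Palau et al.; the name `replacement_selection` and the `(run_id, value)` heap encoding are illustrative choices.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

_END = object()  # sentinel marking the end of the input stream


def replacement_selection(stream: Iterable[int], buffer_size: int) -> Iterator[List[int]]:
    """Yield ascending runs while holding at most buffer_size elements in memory."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    it = iter(stream)

    # Heap entries are (run_id, value): elements that cannot extend the
    # current run are tagged with the next run id and therefore sort last.
    heap: List[Tuple[int, int]] = []
    for value in it:
        heap.append((0, value))
        if len(heap) == buffer_size:
            break
    heapq.heapify(heap)

    current_run, run = 0, []
    while heap:
        run_id, value = heap[0]
        if run_id != current_run:
            # Only next-run elements remain in the buffer: close the run.
            yield run
            run, current_run = [], run_id
        run.append(value)
        incoming = next(it, _END)
        if incoming is _END:
            heapq.heappop(heap)
        else:
            # The new element joins the current run only if it is not
            # smaller than the value just written out.
            target = current_run if incoming >= value else current_run + 1
            heapq.heapreplace(heap, (target, incoming))
    if run:
        yield run


print(list(replacement_selection([5, 1, 9, 2, 8, 3, 7], buffer_size=3)))
```

With a buffer of three elements the seven-element input yields two runs, the first of which is longer than the buffer. On reverse-sorted input the same code produces runs no longer than the buffer, which is the kind of case that motivates allowing runs in both directions.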

A scalable, high-performance customized priority queue

Huang

Lim

Cong

2014

2014 24th International Conference on Field Programmable Logic and Applications (FPL)

Priority queues are abstract data structures where each element is associated with a priority, and the highest priority element is always retrieved first from the queue. The data structure is widely used within databases, including the last stage of a merge-sort, forecasting read-ahead I/O to stream data for the merge-sort, and replacement selection sort. Typical software implementations use a balanced binary tree-based structure, providing O(log N) time for both enqueue and dequeue operations.

To improve the performance, we propose several scalable and high-speed FPGA-based implementations of a priority queue. Our insight is that the above listed applications primarily use priority queues through "replace" operations, which remove the highest priority element and place a new element into the queue. Thus, our designs are customized for this operation, allowing for a simple and scalable architecture. We implement three priority queue designs, including use of a register-based array, register-based tree, and BRAM-based tree, which have different benefits and trade-offs of throughput, frequency, and maximum size. More importantly, all designs achieve O(1) time between replace operations.

To incorporate the best aspects of our designs, we propose a Hybrid Priority Queue (H-PQ), which combines a register-based array with multiple BRAM-based trees. This design provides, on average, very fast access times to the top items in the queue (through the register-based array), while scaling to large priority queue sizes (through the BRAM-based trees). In our evaluations, we find that H-PQ achieves 4.3x speedup and 21.5x energy efficiency, compared with the Xeon CPU implementations.
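
The "replace" operation mentioned above has a direct software counterpart in Python's standard library: `heapq.heapreplace` pops the smallest element and pushes a new one in a single call. The sketch below uses it for the last stage of a merge sort, one of the applications the abstract lists. The function name `merge_runs` is an assumption for illustration, and a software binary heap still costs O(log N) per replace, unlike the O(1) hardware designs the paper reports.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple


def merge_runs(runs: Sequence[Sequence[int]]) -> Iterator[int]:
    """Merge already-sorted runs into one ascending stream."""
    # Heap entries are (value, run_index, position_in_run).
    heap: List[Tuple[int, int, int]] = [
        (run[0], i, 0) for i, run in enumerate(runs) if run
    ]
    heapq.heapify(heap)
    while heap:
        value, i, pos = heap[0]
        yield value
        if pos + 1 < len(runs[i]):
            # "Replace": drop the emitted head and insert its successor
            # from the same run in a single heap operation.
            heapq.heapreplace(heap, (runs[i][pos + 1], i, pos + 1))
        else:
            heapq.heappop(heap)  # this run is exhausted


print(list(merge_runs([[1, 2, 5, 8, 9], [3, 7]])))
```

Because every emitted element is immediately replaced by the next element of the same run until that run ends, almost all heap traffic in the merge goes through the replace path.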
