Yinan Li scite author profile

The focus of this paper is on investigating efficient hash join algorithms for modern multi-core processors in main memory environments. This paper dissects each internal phase of a typical hash join algorithm and considers different alternatives for implementing each phase, producing a family of hash join algorithms. Then, we implement these main memory algorithms on two radically different modern multiprocessor systems, and carefully examine the factors that impact the performance of each method. Our analysis reveals some interesting results: a very simple hash join algorithm is very competitive with the other, more complex methods. This simple join algorithm builds a shared hash table and does not partition the input relations. Its simplicity implies that it requires fewer parameter settings, thereby making it far easier for query optimizers and execution engines to use it in practice. Furthermore, the performance of this simple algorithm improves dramatically as the skew in the input data increases, and it quickly starts to outperform all other algorithms. Based on our results, we propose that database implementers consider adding this simple join algorithm to their repertoire of main memory join algorithms, or adapt their methods to mimic the strategy employed by this algorithm, especially when joining inputs with skewed data distributions.
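The simple algorithm described in this abstract (one shared hash table, no partitioning of the inputs) can be illustrated with a minimal sketch. This is not the authors' implementation: it is single-threaded, whereas the paper targets multi-core execution, and the function name, the tuple layout, and the sample relations `R` and `S` are assumptions made for illustration only.

```python
from collections import defaultdict


def no_partition_hash_join(build_rows, probe_rows, build_key=0, probe_key=0):
    """Join two lists of tuples on the given key positions without partitioning.

    Hypothetical helper for illustration; names and layout are assumptions.
    """
    # Build phase: a single hash table over the entire build relation.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: scan the other relation once and look up each key.
    # .get() is used so that missing keys are not inserted into the table.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            out.append(match + row)
    return out


# Sample relations (made up for this sketch).
R = [(1, "a"), (2, "b"), (2, "c")]
S = [(2, "x"), (3, "y"), (1, "z")]
result = no_partition_hash_join(R, S)
```

In a multi-threaded variant, each thread would insert into and probe the same table, which is why the only tuning decision in this sketch is which input to build on.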

SRRM4 Drives Neuroendocrine Transdifferentiation of Prostate Adenocarcinoma Under Androgen Receptor Pathway Inhibition

Donmez et al. 2017

ALEX: An Updatable Adaptive Learned Index

et al. 2020

Machine learning prediction of biochar yield and carbon contents in biochar based on biomass characteristics and pyrolysis conditions

Zhu, Wang 2019

Bioresource Technology

269

106

Tree indexing on solid state drives

Yang et al. 2010

Proc. VLDB Endow.

143

Large flash disks, or solid state drives (SSDs), have become an attractive alternative to magnetic hard disks, due to their high random read performance, low energy consumption and other features. However, writes, especially small random writes, on flash disks are inherently much slower than reads because of the erase-before-write mechanism. To address this asymmetry of read-write speeds in tree indexing on the flash disk, we propose FD-tree, a tree index designed with the logarithmic method and fractional cascading techniques. With the logarithmic method, an FD-tree consists of the head tree, a small B+-tree on the top, and a few levels of sorted runs of increasing sizes at the bottom. This design is write-optimized for the flash disk; in particular, an index search will potentially go through more levels or visit more nodes, but random writes are limited to a small area, the head tree, and are subsequently transformed into sequential ones through merging into the lower runs. With the fractional cascading technique, we store pointers, called fences, in lower level runs to speed up the search. Given an FD-tree of n entries, we analytically show that it performs an update in O(log_B n) sequential I/Os and completes a search in O(log_B n) random I/Os, where B is the flash page size. We evaluate FD-tree in comparison with representative B+-tree variants under a variety of workloads on three commodity flash SSDs. Our results show that FD-tree has a similar search performance to the standard B+-tree, and a similar update performance to the write-optimized B+-tree variant. As a result, FD-tree dominates the other B+-tree index variants on the overall performance on flash disks as well as on magnetic disks.
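The logarithmic method mentioned in this abstract can be sketched as a toy in-memory structure: a small buffer absorbs all in-place updates and overflows into sorted runs of increasing capacity. This is only an illustration under stated assumptions, not the FD-tree itself: the class name `LogMethodIndex`, the capacities, and the growth factor are hypothetical, a sorted list stands in for the head B+-tree, `sorted()` stands in for a sequential merge, and fences (fractional cascading) and deletions are omitted.

```python
import bisect


class LogMethodIndex:
    """Toy index: a small head buffer plus sorted runs of growing capacity.

    Hypothetical illustration of the logarithmic method; not the FD-tree.
    """

    def __init__(self, head_capacity=4, growth=4):
        self.head_capacity = head_capacity
        self.growth = growth
        self.head = []   # small sorted list, stands in for the head tree
        self.runs = []   # runs[i] is a sorted list; capacity grows with i

    def _capacity(self, level):
        return self.head_capacity * self.growth ** (level + 1)

    def insert(self, key):
        # The head is the only place that is updated in place.
        bisect.insort(self.head, key)
        if len(self.head) <= self.head_capacity:
            return
        # Head overflow: push its contents down, merging level by level.
        carry, self.head = self.head, []
        level = 0
        while carry:
            if level == len(self.runs):
                self.runs.append([])
            merged = sorted(self.runs[level] + carry)  # stand-in for a merge
            if len(merged) <= self._capacity(level):
                self.runs[level] = merged
                carry = []
            else:
                # This level overflows too: empty it and continue downward.
                self.runs[level] = []
                carry = merged
                level += 1

    def contains(self, key):
        # A search visits the head and then every run, top to bottom.
        for run in [self.head] + self.runs:
            i = bisect.bisect_left(run, key)
            if i < len(run) and run[i] == key:
                return True
        return False


idx = LogMethodIndex()
for k in range(0, 200, 2):
    idx.insert(k)
```

The sketch shows the trade-off the abstract describes: a lookup may touch several levels, while updates only modify the small head and reach the lower levels through bulk merges.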

Yinan Li

Design and evaluation of main memory hash join algorithms for multi-core CPUs
