The recent success of machine learning (ML) has led to an explosive growth of systems and applications built by an ever-growing community of system builders and data science (DS) practitioners. This quickly shifting panorama, however, is challenging for system builders and practitioners alike to follow. In this paper, we set out to capture this panorama through a wide-angle lens, performing the largest analysis of DS projects to date, focusing on questions that can advance our understanding of the field and guide investments. Specifically, we download and analyze (a) over 8M notebooks publicly available on GitHub and (b) over 2M enterprise ML pipelines developed within Microsoft. Our analysis includes coarse-grained statistical characterizations, fine-grained analysis of libraries and pipelines, and comparative studies across datasets and time. We report a large number of measurements for our readers to interpret and draw actionable conclusions on (a) what system builders should focus on to better serve practitioners and (b) what technologies practitioners should rely on.
Solid State Drives (SSDs) are complex devices with varying internal implementations, resulting in subtle differences in behavior between devices. In this paper, we demonstrate how a database engine can be optimized for a particular device by learning its hidden parameters. This can not only improve an application's performance, but also potentially increase the lifetime of the SSD. Our approach for optimizing a database for a given SSD consists of three steps: learning the hidden parameters of the device, proposing rules to analyze the I/O behavior of the database, and optimizing the database by eliminating violations of these rules. We obtain two different characteristics of an SSD, namely the request size profile and the location profile, from which we learn multiple internal parameters. Based on these parameters, we propose rules to analyze the I/O behavior of a database engine. Using these rules, we uncover sub-optimal I/O patterns in SQLite3 and MariaDB when running on our experimental SSDs. Finally, we present three techniques to optimize these database engines: (1) use-hot-locations on SSD-S, which improves the SELECT operation throughput of SQLite3 and MariaDB by 29% and 27% respectively, and improves the performance of YCSB on MariaDB by 1%-22% depending on the workload mix; (2) write-aligned-stripes on SSD-T, which reduces the wear-out caused by the SQLite3 write-ahead log (WAL) file by 3.1%; and (3) contain-write-in-flash-page on SSD-T, which reduces the wear-out caused by the MariaDB binary log file by 6.7%.
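The rule-checking step above can be illustrated with a minimal sketch of a contain-write-in-flash-page style check: given a flash page size learned from the device profile, flag writes in an I/O trace that straddle a page boundary. The page size, trace format, and function name here are illustrative assumptions, not the paper's actual tooling.

```python
# Hypothetical sketch: flag writes that cross a flash-page boundary,
# since such writes can trigger extra internal work (and wear) on the SSD.

FLASH_PAGE_SIZE = 16 * 1024  # assumed learned parameter (16 KiB)

def violates_page_containment(offset: int, size: int,
                              page_size: int = FLASH_PAGE_SIZE) -> bool:
    """Return True if a write of `size` bytes at `offset` is not
    contained within a single flash page."""
    first_page = offset // page_size
    last_page = (offset + size - 1) // page_size
    return first_page != last_page

# Toy I/O trace: (offset, size) pairs, e.g. extracted from a block trace.
trace = [(0, 4096), (12 * 1024, 8192), (16 * 1024, 16384)]
violations = [io for io in trace if violates_page_containment(*io)]
# The 8 KiB write at offset 12 KiB spans pages 0 and 1, so it is flagged.
```

A real analysis would derive the page size from the request size and location profiles rather than hard-coding it, and would aggregate violations per file (e.g. the WAL or binary log) to decide which I/O patterns to fix.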
Tuning a database system to achieve optimal performance on a given workload is a long-standing problem in the database community. A number of recent papers have leveraged ML-based approaches to guide the sampling of large parameter spaces (hundreds of tuning knobs) in search of high-performance configurations. Looking at Microsoft production services operating millions of databases, sample efficiency emerged as a crucial requirement for using tuners on diverse workloads. This motivates our investigation in LlamaTune, a system that leverages two key insights: 1) an automated dimensionality reduction technique based on randomized embeddings, and 2) a biased sampling approach to handle special values for certain tuning knobs. LlamaTune compares favorably with state-of-the-art optimizers across a diverse set of workloads, achieving the best-performing configurations with up to 11× fewer workload runs, and reaching up to 21% higher throughput. We also show that the benefits of LlamaTune generalize across random-forest and Gaussian Process-based Bayesian optimizers. While the journey to perform database tuning at cloud scale remains long, LlamaTune goes a long way in making automatic DB tuning practical at scale.
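The randomized-embedding idea can be sketched as follows: the optimizer searches a low-dimensional space, and a fixed random projection maps each low-dimensional point back to the full knob space. The matrix shape, clipping, and knob rescaling below are illustrative assumptions in the spirit of the REMBO/HeSBO line of work that LlamaTune builds on, not LlamaTune's exact construction.

```python
# Minimal sketch of dimensionality reduction via a random linear embedding:
# a Bayesian optimizer proposes points y in a small d-dimensional box, and
# a fixed random matrix A lifts them into the full D-dimensional knob space.
import numpy as np

rng = np.random.default_rng(0)
D, d = 100, 8                    # e.g. 100 tuning knobs, 8 effective dims
A = rng.standard_normal((D, d))  # fixed random embedding matrix

def to_knob_space(y: np.ndarray) -> np.ndarray:
    """Map a low-dim point y in [-1, 1]^d to the normalized [0, 1]^D
    knob space; each coordinate is then scaled to its knob's real range."""
    x = A @ y                      # project up to D dimensions
    x = np.clip(x, -1.0, 1.0)      # keep the point inside the search box
    return (x + 1.0) / 2.0         # rescale to normalized knob values

y = rng.uniform(-1, 1, size=d)     # a candidate the tuner would propose
knobs = to_knob_space(y)           # shape (100,), each entry in [0, 1]
```

Because the optimizer only ever models the d-dimensional space, far fewer workload runs are needed to cover it, which is the sample-efficiency gain the abstract describes; handling of special knob values (e.g. "off"/default sentinels) would require the biased sampling step on top of this projection.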