Active Learning for ML Enhanced Database Systems

Ma, Lin; Ding, Bailu; Das, Sudipto; Swaminathan, Adith

doi:10.1145/3318464.3389768

Cited by 48 publications

(13 citation statements)

References 44 publications


“…Input: E_p, E_s. The current adjusting scheme is actually an empirical algorithm. Although our experiments suggest the effectiveness of the algorithm, it is more attractive to employ a machine-learning algorithm [29,30] to make AMG-Buffer learn the access pattern, which can be further used to adjust the size of the P-Buffer.…”

Section: Algorithm 3: Adjust the P-buffer (mentioning)

confidence: 97%

Adaptive Multi-Grained Buffer Management for Database Systems

Wang

Jin

2021

Future Internet


The traditional page-grained buffer manager in database systems has a low hit ratio when only a few tuples within a page are frequently accessed. To handle this issue, this paper proposes a new buffering scheme called the AMG-Buffer (Adaptive Multi-Grained Buffer). AMG-Buffer proposes to use two page buffers and a tuple buffer to organize the whole buffer. In this way, the AMG-Buffer can hold more hot tuples than a single page-grained buffer. Further, we notice that the tuple buffer may cause additional read I/Os when writing dirty tuples into disks. Thus, we introduce a new metric named clustering rate to quantify the hot-tuple rate in a page. The use of the tuple buffer is determined by the clustering rate, allowing the AMG-Buffer to adapt to different workloads. We conduct experiments on various workloads to compare the AMG-Buffer with several existing schemes, including LRU, LIRS, CFLRU, CFDC, and MG-Buffer. The results show that AMG-Buffer can significantly improve the hit ratio and reduce I/Os compared to its competitors. Moreover, the AMG-Buffer achieves the best performance on a dynamic workload as well as on a large data set, suggesting its adaptivity and scalability to changing workloads.


“…Recently, there has been significant interest in using machine learning for database tuning [12,19,20,25,27]. Our work falls into the same, broad category as it exploits RL.…”

Section: Related Work (mentioning)

confidence: 99%

UDO: Universal Database Optimization using Reinforcement Learning

Wang,

Trummer,

Basu

2021

Preprint


UDO is a versatile tool for offline tuning of database systems for specific workloads. UDO can consider a variety of tuning choices, reaching from picking transaction code variants over index selections up to database system parameter tuning. UDO uses reinforcement learning to converge to near-optimal configurations, creating and evaluating different configurations via actual query executions (instead of relying on simplifying cost models). To cater to different parameter types, UDO distinguishes heavy parameters (which are expensive to change, e.g. physical design parameters) from light parameters. Specifically for optimizing heavy parameters, UDO uses reinforcement learning algorithms that allow delaying the point at which the reward feedback becomes available. This gives us the freedom to optimize the point in time and the order in which different configurations are created and evaluated (by benchmarking a workload sample). UDO uses a cost-based planner to minimize reconfiguration overheads. For instance, it aims to amortize the creation of expensive data structures by consecutively evaluating configurations using them. We evaluate UDO on Postgres as well as MySQL and on TPC-H as well as TPC-C, optimizing a variety of light and heavy parameters concurrently.


“…Sampling from discrete distributions can be achieved with (among others) inverse transform sampling, or the Gumbel-max trick [4] (see Section 4.1.1) and extensions thereof (see Section 4.3). Gumbel-based sampling algorithms have for example been used for (discrete) action selection in a multi-armed bandit setting [10], for sampling data points in active learning [11], for text generation in dialog systems [12] or in translation tasks [13], [14].…”

Section: Applications (mentioning)

confidence: 99%

A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning

Huijben,

Kool,

Paulus

et al. 2021

Preprint


The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by its unnormalized (log-)probabilities. Over the past years, the machine learning community has proposed several extensions of this trick to facilitate, e.g., drawing multiple samples, sampling from structured domains, or gradient estimation for error backpropagation in neural network optimization. The goal of this survey article is to present background about the Gumbel-max trick, and to provide a structured overview of its extensions to ease algorithm selection. Moreover, it presents a comprehensive outline of (machine learning) literature in which Gumbel-based algorithms have been leveraged, reviews commonly-made design choices, and sketches a future perspective.
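As a rough illustration of the basic trick described in this abstract, the sketch below adds independent Gumbel(0, 1) noise to each unnormalized log-probability and returns the index of the largest perturbed value, which is distributed according to the softmax of the inputs. The function names (`gumbel_noise`, `gumbel_max_sample`, `empirical_frequencies`) are hypothetical and chosen only for this example; none of the code is taken from the survey.

```python
import math
import random
from collections import Counter


def gumbel_noise(rng):
    """Return one standard Gumbel(0, 1) sample via inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, and log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(logits, rng):
    """Return an index i drawn with probability softmax(logits)[i].

    `logits` are unnormalized log-probabilities; no normalization is needed.
    """
    best_index, best_value = -1, -math.inf
    for i, logit in enumerate(logits):
        perturbed = logit + gumbel_noise(rng)
        if perturbed > best_value:
            best_index, best_value = i, perturbed
    return best_index


def empirical_frequencies(logits, draws, seed=0):
    """Estimate the sampling distribution by repeated draws (illustration only)."""
    rng = random.Random(seed)
    counts = Counter(gumbel_max_sample(logits, rng) for _ in range(draws))
    return [counts[i] / draws for i in range(len(logits))]


if __name__ == "__main__":
    # Unnormalized weights 1 : 6 : 3, so the target probabilities are 0.1, 0.6, 0.3.
    logits = [math.log(1.0), math.log(6.0), math.log(3.0)]
    print(empirical_frequencies(logits, draws=20000))
```

With enough draws, the printed frequencies should approach the normalized weights, even though the weights are never normalized explicitly.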

