The Case for Learned Index Structures

Kraska, Tim; Beutel, Alex; Chi, Ed H.; Dean, Jeffrey; Polyzotis, Neoklis

doi:10.1145/3183713.3196909

Cited by 677 publications (675 citation statements, of which 620 are classified as mentioning)

References 59 publications

“…In addition, a recent paper featuring learned indexes [24] discusses the cases of using complex machine learning models such as neural networks and multivariate regression models to predict locations of keys. As opposed to learned indexes, Hermit models the correlation between two columns and leverages the curve-fitting technique to adaptively create simple yet customized ML models for different regions (TRS-Tree tree nodes).…”
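The quoted summary describes learned indexes as models that predict where a key sits. The sketch below shows that idea in its simplest form, assuming numeric keys held in a sorted array. It uses a single least-squares line in place of the hierarchy of models in the cited paper, plus a binary search bounded by the model's worst-case error on the stored keys. The class name `LinearLearnedIndex` and its methods are hypothetical and are not taken from either paper.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Single-model learned index over sorted numeric keys (illustrative only).

    A least-squares line maps key -> array position. The largest prediction
    error observed on the stored keys bounds the window a lookup searches.
    """

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2  # mean of the positions 0..n-1
        var_x = sum((k - mean_x) ** 2 for k in self.keys)
        cov_xy = sum((k - mean_x) * (i - mean_y) for i, k in enumerate(self.keys))
        self.slope = cov_xy / var_x if var_x else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Worst-case |predicted - true| position over all stored keys.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of `key` in self.keys, or None if absent."""
        n = len(self.keys)
        pos = self._predict(key)
        # Clamp the error window [pos - max_err, pos + max_err] to the array.
        lo = min(max(pos - self.max_err, 0), n)
        hi = min(max(pos + self.max_err + 1, 0), n)
        # "Last-mile" search: binary search restricted to the window.
        i = bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None
```

For example, `LinearLearnedIndex([3, 7, 12, 20, 21, 35, 50]).lookup(20)` returns 3, and an absent key such as 8 returns `None`. Because the error bound is measured on every stored key, the bounded search cannot miss a key that is present. A poor fit only widens the search window.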

Section: D3 Complex Machine Learning Models (mentioning; confidence: 99%)

Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations

Tian et al. 2019

Proceedings of the 2019 International Conference on Management of Data


Database administrators construct secondary indexes on data tables to accelerate query processing in relational database management systems (RDBMSs). These indexes are built on top of the most frequently queried columns according to the data statistics. Unfortunately, maintaining multiple secondary indexes in the same database can be extremely space-consuming, causing significant performance degradation due to the potential exhaustion of memory space. In this paper, we demonstrate that there exist many opportunities to exploit column correlations for accelerating data access. We propose Hermit, a succinct secondary indexing mechanism for modern RDBMSs. Hermit judiciously leverages the rich soft functional dependencies hidden among columns to prune out redundant structures for indexed key access. Instead of building a complete index that stores every single entry in the key columns, Hermit navigates any incoming key access queries to an existing index built on the correlated columns. This is achieved through the Tiered Regression Search Tree (TRS-Tree), a succinct, ML-enhanced data structure that performs fast curve fitting to adaptively and dynamically capture both column correlations and outliers. We have developed Hermit in two different RDBMSs, …

* Work done during an internship at IBM Research-Almaden.
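The core of Hermit, as the abstract describes it, is answering a query on an unindexed column by translating it into a range scan on a correlated, already-indexed column. The sketch below shows that translation for a single linear model, assuming numeric columns. A real TRS-Tree organizes many such models into a tree over the key range, which this sketch omits, so here one large outlier can skew the single least-squares fit. The names `CorrelationIndex` and `eps` and the row layout are hypothetical.

```python
from bisect import bisect_left, bisect_right


class CorrelationIndex:
    """One-leaf, TRS-Tree-style mapping (illustrative simplification).

    Assumes host ~= slope * target + intercept. Rows whose residual exceeds
    `eps` go to an outlier buffer instead of being covered by the model.
    """

    def __init__(self, rows, target, host, eps):
        self.rows = rows  # base table: list of dicts
        self.target = target
        self.eps = eps
        xs = [r[target] for r in rows]
        ys = [r[host] for r in rows]
        n = len(rows)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = cov / var if var else 0.0
        self.intercept = my - self.slope * mx
        # Stand-in for the existing secondary index on the host column.
        self.host_index = sorted((y, rid) for rid, y in enumerate(ys))
        self.host_keys = [y for y, _ in self.host_index]
        # Outlier buffer: rows the linear model cannot cover within eps.
        self.outliers = [rid for rid, (x, y) in enumerate(zip(xs, ys))
                         if abs(y - (self.slope * x + self.intercept)) > eps]

    def range_query(self, lo, hi):
        """Row ids whose target value lies in [lo, hi]."""
        a = self.slope * lo + self.intercept
        b = self.slope * hi + self.intercept
        # Widen the predicted host range by the model's error bound.
        h_lo, h_hi = min(a, b) - self.eps, max(a, b) + self.eps
        i = bisect_left(self.host_keys, h_lo)
        j = bisect_right(self.host_keys, h_hi)
        candidates = {rid for _, rid in self.host_index[i:j]}
        candidates.update(self.outliers)
        # Validate against the base table to drop false positives.
        return sorted(rid for rid in candidates
                      if lo <= self.rows[rid][self.target] <= hi)
```

Every row either lies within `eps` of the line or sits in the outlier buffer, so the candidate set always contains the true answer. The final check against the base table removes the false positives introduced by widening the host range.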



“…For example, the nearest neighbor interpolation of a point is equivalent to allocating indices of one to its neighbor and then map the value of the point. In this sense, indices are models [24], therefore indices can be modeled and learned. In this work, we model indices as a function of the local feature map and learn an index function to perform upsampling within deep CNNs.…”
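The quoted observation can be made concrete. Nearest-neighbor upsampling is a gather in which every output pixel reads from an input location given by a fixed index map. The sketch below assumes a single-channel feature map stored as nested Python lists. The function names are hypothetical, and it does not model the learned, data-dependent indices of the cited work.

```python
def nearest_index_map(h, w, scale):
    """For each output pixel (i, j), the input pixel it copies from."""
    return [[(i // scale, j // scale) for j in range(w * scale)]
            for i in range(h * scale)]


def upsample_with_indices(feature, index_map):
    """Gather: each output value is read from the input location in index_map."""
    return [[feature[r][c] for (r, c) in row] for row in index_map]
```

With `x = [[1, 2], [3, 4]]`, `upsample_with_indices(x, nearest_index_map(2, 2, 2))` yields `[[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]`. Replacing the fixed map with one computed from the feature map corresponds, loosely, to the learned index function the quoted work proposes.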

Section: Introduction (mentioning; confidence: 99%)

Indices Matter: Learning to Index for Deep Image Matting

Dai, Shen, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)



We show that existing upsampling operators can be unified with the notion of the index function. This notion is inspired by an observation in the decoding process of deep image matting where indices-guided unpooling can recover boundary details much better than other upsampling operators such as bilinear interpolation. By looking at the indices as a function of the feature map, we introduce the concept of learning to index, and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the pooling and upsampling operators, without the need for supervision. At the core of this framework is a flexible network module, termed IndexNet, which dynamically predicts indices given an input. Due to its flexibility, IndexNet can be used as a plug-in for any off-the-shelf convolutional network that has coupled downsampling and upsampling stages. We demonstrate the effectiveness of IndexNet on the task of natural image matting, where the quality of learned indices can be visually observed from predicted alpha mattes. Results on the Composition-1k matting dataset show that our model built on MobileNetv2 exhibits at least 16.1% improvement over the seminal VGG-16 based deep matting baseline, with less training data and lower model capacity. Code and models have been made available at: https://tinyurl.com/IndexNetV1.

Figure 1: Alpha mattes of different models. From left to right, Deeplabv3+ [4], RefineNet [30], Deep Matting [49] and Ours. Bilinear upsampling fails to recover subtle details, but unpooling and our learned upsampling operator can produce much clearer mattes with good local contrast.
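The abstract starts from the observation that unpooling guided by stored indices recovers boundary detail better than bilinear interpolation. The sketch below shows the fixed, max-pooling form of that idea on a single-channel map stored as nested lists. Pooling records the argmax position in each window, and unpooling writes each value back to exactly that position. IndexNet instead predicts the indices with a network module, which is not shown here, and the function names are hypothetical.

```python
def max_pool_with_indices(x, k=2):
    """k x k max pooling that also records where each maximum came from."""
    out_h, out_w = len(x) // k, len(x[0]) // k
    pooled, indices = [], []
    for i in range(out_h):
        pooled_row, index_row = [], []
        for j in range(out_w):
            # All (value, position) pairs in the k x k window.
            window = [(x[r][c], (r, c))
                      for r in range(i * k, (i + 1) * k)
                      for c in range(j * k, (j + 1) * k)]
            value, pos = max(window, key=lambda t: t[0])  # first max on ties
            pooled_row.append(value)
            index_row.append(pos)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Write each pooled value back to its recorded position; zeros elsewhere."""
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out
```

For the 4 x 4 input `[[1, 5, 0, 2], [3, 4, 8, 1], [0, 0, 1, 1], [2, 9, 0, 7]]`, pooling gives `[[5, 8], [9, 7]]`. Unpooling then places those values back at their original coordinates instead of spreading them across neighbors as interpolation would.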


“…The performance of the algorithm should be bounded as a function of some measure of the oracle error, even though the algorithm is oblivious to this error. The ML advice model has in the past been applied to the ski rental problem [13,4], job scheduling [13,12] and online revenue maximization [11]; it has also been used to achieve theoretical and practical gains in streaming frequency estimation [5] and data structures [7]. Most relevant to this paper is prior work by [8] in which it was shown how the model can be applied to the online caching problem.…”

Section: Introduction (mentioning; confidence: 99%)

Near-Optimal Bounds for Online Caching with Machine Learned Advice

Rohatgi 2020

Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms


In the model of online caching with machine learned advice, introduced by Lykouris and Vassilvitskii, the goal is to solve the caching problem with an online algorithm that has access to next-arrival predictions: when each input element arrives, the algorithm is given a prediction of the next time when the element will reappear. The traditional model for online caching suffers from an $\Omega(\log k)$ competitive ratio lower bound (on a cache of size $k$). In contrast, the augmented model admits algorithms which beat this lower bound when the predictions have low error, and asymptotically match the lower bound when the predictions have high error, even if the algorithms are oblivious to the prediction error. In particular, Lykouris and Vassilvitskii showed that there is a prediction-augmented caching algorithm with a competitive ratio of $O\left(1 + \min\left(\sqrt{\eta/\mathrm{opt}}, \log k\right)\right)$ when the overall $\ell_1$ prediction error is bounded by $\eta$, and $\mathrm{opt}$ is the cost of the optimal offline algorithm. The dependence on $k$ in the competitive ratio is optimal, but the dependence on $\eta/\mathrm{opt}$ may be far from optimal. In this work, we make progress towards closing this gap. Our contributions are twofold. First, we provide an improved algorithm with a competitive ratio of $O\left(1 + \min\left((\eta/\mathrm{opt})/k, 1\right) \log k\right)$. Second, we provide a lower bound of $\Omega\left(\log \min\left((\eta/\mathrm{opt})/(k \log k), k\right)\right)$.
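In the prediction model described in the abstract, each request arrives with a predicted next-arrival time. The simplest way to use that advice, sketched below, is to evict the cached item predicted to return furthest in the future. With exact predictions this is Belady's optimal offline rule, but it offers no protection when predictions are wrong, and that weakness motivates the prediction-augmented algorithms the abstract discusses. This naive policy is not the algorithm of Lykouris and Vassilvitskii or of this paper, and the function names are hypothetical.

```python
def exact_next_arrivals(requests):
    """Ground-truth next-arrival time per request (len(requests) if never repeated)."""
    horizon = len(requests)
    next_arrival, last_seen = [horizon] * horizon, {}
    for t in range(horizon - 1, -1, -1):
        next_arrival[t] = last_seen.get(requests[t], horizon)
        last_seen[requests[t]] = t
    return next_arrival


def follow_predictions(requests, predictions, k):
    """Misses of a size-k cache that evicts the resident item whose predicted
    next arrival is furthest away. predictions[t] is the predicted time at
    which requests[t] will be requested again."""
    cache = {}  # item -> predicted next arrival
    misses = 0
    for item, predicted_next in zip(requests, predictions):
        if item not in cache:
            misses += 1
            if len(cache) >= k:
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[item] = predicted_next  # refresh the prediction on every request
    return misses
```

On the request sequence a, b, c, a, b, d, a with $k = 2$ and exact predictions, this policy incurs 5 misses. Feeding it noisy predictions in place of `exact_next_arrivals` shows how its miss count depends on the quality of the advice.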


