The PGM-index

Ferragina, Paolo; Vinciguerra, Giorgio

doi:10.14778/3389133.3389135

Cited by 161 publications

(30 citation statements)

References 29 publications

“…Recursive model indexes (RMIs) are one such class of models [8] (although others [3][4][5][11] exist as well), combining simpler machine learning models together into a multistaged structure. For example, as depicted in Figure 1, an RMI with two stages, a linear stage and a cubic stage, would first use a linear model to make an initial prediction of an index for a specific key (stage 1).…”

Section: Introduction (mentioning)

confidence: 99%

CDFShop: Exploring and Optimizing Learned Index Structures

Marcus

Zhang

Kraska

2020

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Indexes are a critical component of data management applications. While tree-like structures (e.g., B-Trees) have been employed to great success, recent work suggests that index structures powered by machine learning models (learned index structures) can achieve low lookup times with a reduced memory footprint. This demonstration showcases CDFShop, a tool to explore and optimize recursive model indexes (RMIs), a type of learned index structure. This demonstration allows audience members to (1) gain an intuition about various tuning parameters of RMIs and why learned index structures can greatly accelerate search, and (2) understand how automatic optimization techniques can be used to explore space/time tradeoffs within the space of RMIs.
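The two-stage lookup described in the citation statement above can be sketched as follows. This is a minimal illustration, not the CDFShop or RMI reference implementation: the class `TwoStageRMI`, the helper `fit_linear`, and the parameter `num_leaves` are hypothetical names, and the second stage uses linear models rather than the cubic stage mentioned in the quote, to keep the sketch dependency-free. Each leaf records its maximum prediction error so the final search can be restricted to a small window.

```python
import bisect


def fit_linear(xs, ys):
    """Least-squares line y = a*x + b; constant model for degenerate input."""
    n = len(xs)
    if n == 0:
        return 0.0, 0.0
    mx = sum(xs) / n
    my = sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


class TwoStageRMI:
    """Hypothetical two-stage learned index over a sorted list of keys."""

    def __init__(self, keys, num_leaves=4):
        self.keys = keys  # assumed sorted in ascending order
        self.num_leaves = num_leaves
        # Stage 1: a single linear model approximating key -> position.
        self.root = fit_linear(keys, list(range(len(keys))))
        # Route every position to the leaf chosen by stage 1.
        buckets = [[] for _ in range(num_leaves)]
        for pos, key in enumerate(keys):
            buckets[self._leaf_id(key)].append(pos)
        # Stage 2: one linear model per leaf, plus its worst-case error.
        self.leaves = []
        for bucket in buckets:
            a, b = fit_linear([keys[p] for p in bucket], bucket)
            err = max((abs(a * keys[p] + b - p) for p in bucket), default=0)
            self.leaves.append((a, b, int(err) + 1))

    def _leaf_id(self, key):
        a, b = self.root
        leaf = int((a * key + b) * self.num_leaves / max(len(self.keys), 1))
        return min(max(leaf, 0), self.num_leaves - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        a, b, err = self.leaves[self._leaf_id(key)]
        guess = int(a * key + b)
        # Search only the window guaranteed to contain a stored key.
        lo = max(guess - err, 0)
        hi = max(lo, min(guess + err + 1, len(self.keys)))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

For example, building `TwoStageRMI([i * i for i in range(200)], num_leaves=8)` and calling `lookup` on a stored key returns its position, while a key that was never inserted returns `None`. Increasing `num_leaves` generally shrinks the per-leaf error window at the cost of storing more models, which is the kind of space/time tradeoff the abstract refers to.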

“…The motivation is that an index can be seen as a function mapping a search key to the storage position of the corresponding record. Several follow-up studies propose learned indexes for one-dimensional data [17,20,69]. More details can be found in a benchmark study [41].…”

Section: Training Time (mentioning)

confidence: 99%

WISK: A Workload-aware Learned Index for Spatial Keyword Queries

Sheng

Cao

Fang

et al. 2023

Preprint

Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword queries are mostly built based on the geo-textual data without considering the distribution of queries already received. However, previous studies have shown that utilizing the known query distribution can improve the index structure for future query processing. In this paper, we propose WISK, a learned index for spatial keyword queries, which self-adapts for optimizing querying costs given a query workload. One key challenge is how to utilize both structured spatial attributes and unstructured textual information during learning the index. We first divide the data objects into partitions, aiming to minimize the processing costs of the given query workload. We prove the NP-hardness of the partitioning problem and propose a machine learning model to find the optimal partitions. Then, to achieve more pruning power, we build a hierarchical structure based on the generated partitions in a bottom-up manner with a reinforcement learning-based approach. We conduct extensive experiments on real-world datasets and query workloads with various distributions, and the results show that WISK outperforms all competitors, achieving up to 8× speedup in querying time with comparable storage overhead.

“…IFB-tree [37] evaluates the update cost with interpolation-friendliness, such as a partition in uniform distribution with higher interpolation-friendliness. PGM-index [38] admits a streaming algorithm to partition, instead of using FITing-tree's greedy algorithm, and handles updates using LSM-tree. Shift-table [39] resolves the local biases of learned models at the cost of (at most) one memory lookup.…”

Section: Learned Indices (mentioning)

confidence: 99%

SLBRIN: A Spatial Learned Index Based on BRIN

Wang

et al. 2023

IJGI

The spatial learned index constructs a spatial index by learning the spatial distribution, which performs a lower cost of storage and query than the spatial indices. The current update strategies of spatial learned indices can only solve limited updates at the cost of query performance. We propose a novel spatial learned index structure based on a Block Range Index (SLBRIN for short). Its core idea is to cooperate history range and current range to satisfy a fast spatial query and efficient index update simultaneously. SLBRIN deconstructs the update transaction into three parallel operations and optimizes them based on the temporal proximity of spatial distribution. SLBRIN also provides the spatial query strategy with the spatial learned index and spatial location code, including point query, range query and kNN query. Experiments on synthetic and real datasets demonstrate that SLBRIN clearly outperforms traditional spatial indices and state-of-the-art spatial learned indices in the cost of storage and query. Moreover, in the simulated real-time update scenario, SLBRIN has the faster and more stable query performance while satisfying efficient updates.
