Tree-Encoded Bitmaps

Lang, Harald; Beischl, Alexander; Leis, Viktor; Boncz, Peter; Neumann, Thomas; Kemper, Alfons

doi:10.1145/3318464.3380588

Cited by 11 publications

(8 citation statements)

References 55 publications


“…While we have designed CI for immutable environments, using an update-friendly bitmap encoding such as Roaring or Tree-Encoded Bitmaps [27] CI could support updates and deletions since both operations only require updating the bitmap part of the index. To support inserts, we would also need to update the Cuckoo table.…”

Section: Discussion (mentioning)

confidence: 99%

Cuckoo index

Kipf, Chromejko, Hall, et al. 2020

Proc. VLDB Endow.

Self Cite


In modern data warehousing, data skipping is essential for high query performance. While index structures such as B-trees or hash tables allow for precise pruning, their large storage requirements make them impractical for indexing secondary columns. Therefore, many systems rely on approximate indexes such as min/max sketches (ZoneMaps) or Bloom filters for cost-effective data pruning. For example, Google PowerDrill skips more than 90% of data on average using such indexes. In this paper, we introduce Cuckoo Index (CI), an approximate secondary index structure that represents the many-to-many relationship between keys and data partitions in a highly space-efficient way. At its core, CI associates variable-sized fingerprints in a Cuckoo filter with compressed bitmaps indicating qualifying partitions. With our approach, we target equality predicates in a read-only (immutable) setting and optimize for space efficiency under the premise of practical build and lookup performance. In contrast to per-partition (Bloom) filters, CI produces correct results for lookups with keys that occur in the data. CI allows to control the ratio of false positive partitions for lookups with non-occurring keys. Our experiments with real-world and synthetic data show that CI consumes significantly less space than per-partition filters for the same pruning power for low-to-medium cardinality columns. For high cardinality columns, CI is on par with its baselines.
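The core idea in the abstract above, fingerprints associated with bitmaps of qualifying partitions, can be illustrated with a minimal sketch. This is not the Cuckoo Index implementation: a plain Python dict stands in for the Cuckoo filter, fingerprints have a fixed width rather than a variable one, and the bitmaps are uncompressed integers. The names `fingerprint`, `build_index`, and `lookup` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def fingerprint(key: str, bits: int = 8) -> int:
    """Fixed-width fingerprint of a key (assumption: CI uses variable-sized ones)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << bits) - 1)


def build_index(partitions: List[Iterable[str]], bits: int = 8) -> Dict[int, int]:
    """Map each fingerprint to an integer bitmap; bit i is set if partition i may hold the key."""
    index: Dict[int, int] = {}
    for pid, keys in enumerate(partitions):
        for key in keys:
            fp = fingerprint(key, bits)
            index[fp] = index.get(fp, 0) | (1 << pid)
    return index


def lookup(index: Dict[int, int], key: str, bits: int = 8) -> List[int]:
    """Return the partitions that must be scanned for an equality predicate on `key`."""
    bitmap = index.get(fingerprint(key, bits), 0)
    return [pid for pid in range(bitmap.bit_length()) if (bitmap >> pid) & 1]


partitions = [{"a", "b"}, {"b", "c"}, {"d"}]
index = build_index(partitions)
candidates = lookup(index, "b")  # always contains partitions 0 and 1
```

In this simplified form, a key that occurs in the data never loses a partition it appears in, but keys whose fingerprints collide share a bitmap, so a lookup can return extra partitions. Wider fingerprints reduce such collisions at the cost of space.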



“…Indexing is an important and well-studied problem in data management and recent works have utilized machine learning to learn a CDF or to partition the data space for traditional database indexing [17,19,40,42,43,51,56,71]. In this paper, we complement recent work by studying the applicability of machine learning techniques to assist index construction for set similarity search problems.…”

Section: Related Work (mentioning)

confidence: 97%

LES3: Learning-based Exact Set Similarity Search

Li, Yu, Koudas 2021

Preprint


Set similarity search is a problem of central interest to a wide variety of applications such as data cleaning and web search. Past approaches on set similarity search utilize either heavy indexing structures, incurring large search costs or indexes that produce large candidate sets. In this paper, we design a learning-based exact set similarity search approach, LES3. Our approach first partitions sets into groups, and then utilizes a light-weight bitmap-like indexing structure, called token-group matrix (TGM), to organize groups and prune out candidates given a query set. In order to optimize pruning using the TGM, we analytically investigate the optimal partitioning strategy under certain distributional assumptions. Using these results, we then design a learning-based partitioning approach called L2P and an associated data representation encoding, PTR, to identify the partitions. We conduct extensive experiments on real and synthetic datasets to fully study LES3, establishing the effectiveness and superiority over other applicable approaches.


“…In this section, we discuss the basic workings of TEBs; how they are constructed, navigated, and how they are currently updated. For more details about TEBs, we refer the reader to [6].…”

Section: Tree-encoded Bitmaps (mentioning)

confidence: 99%

“…The Tree-Encoded Bitmap (TEB) [6] is a novel bitmap compression scheme that represents bitmaps as binary trees.…”

Section: Introduction (mentioning)

confidence: 99%

In-Place Updates in Tree-Encoded Bitmaps

Saputra, Zacharatou, Papadias, et al. 2022

34th International Conference on Scientific and Statistical Database Management


The Tree-Encoded Bitmap (TEB) is a tree-based bitmap compression scheme that maps runs in a bitmap to leaf nodes in a binary tree. Currently, TEBs perform updates using an auxiliary differential data structure. However, consulting this additional data structure at every read introduces both memory and read overheads. To mitigate the shortcomings of differential updates, we propose algorithms to update TEBs in place. To that end, we classify the updates that can occur in a TEB into two types: run-forming and run-breaking. Run-forming updates correspond to leaf nodes at the lowest level of the binary tree. All other updates are run-breaking. Each type of update requires different handling. Through experimentation with synthetic data, we determined that in-place run-forming updates are 2-3× faster than differential updates, while run-breaking updates cannot be efficiently performed in place. Therefore, we propose a hybrid solution that performs run-forming updates in place and stores run-breaking updates in a differential data structure. Our experiments with synthetic data show that our hybrid solution performs updates faster than the differential approach. For example, for a workload where 20% of the updates are run forming, our hybrid solution is 69% faster on average.
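To make the mapping from runs to leaf nodes concrete, the following is a minimal sketch under simplifying assumptions, not the TEB encoding from the paper. It uses a pointer-based tree rather than a compact encoding, assumes the bitmap length is a power of two, and classifies a single-bit flip by one reading of the abstract: the update is run-forming when the covering leaf sits at the lowest level (it covers exactly one bit) and run-breaking otherwise. The names `Node`, `build`, `covering_leaf`, `get_bit`, and `classify_flip` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: Optional[int] = None      # 0 or 1 on a leaf, None on an inner node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(bits: List[int]) -> Node:
    """Build a pruned binary tree; len(bits) is assumed to be a power of two."""
    if all(b == bits[0] for b in bits):
        return Node(label=bits[0])   # a uniform range collapses into one leaf
    mid = len(bits) // 2
    return Node(left=build(bits[:mid]), right=build(bits[mid:]))


def covering_leaf(root: Node, pos: int, n: int) -> Tuple[Node, int]:
    """Return the leaf covering bit `pos` and the number of bits that leaf covers."""
    node, span = root, n
    while node.label is None:
        span //= 2
        if pos < span:
            node = node.left
        else:
            node, pos = node.right, pos - span
    return node, span


def get_bit(root: Node, pos: int, n: int) -> int:
    """Read one bit by descending to its covering leaf."""
    return covering_leaf(root, pos, n)[0].label


def classify_flip(root: Node, pos: int, n: int) -> str:
    """Classify a flip of bit `pos` (assumed reading of the abstract's two update types)."""
    _, span = covering_leaf(root, pos, n)
    return "run-forming" if span == 1 else "run-breaking"


bits = [1, 1, 1, 1, 0, 1, 0, 0]
tree = build(bits)                   # the first four bits share a single leaf
kind_a = classify_flip(tree, 5, 8)   # bit 5 sits in a single-bit leaf
kind_b = classify_flip(tree, 0, 8)   # bit 0 sits in a leaf covering four bits
```

In this sketch, flipping bit 5 touches a single-bit leaf that may then merge with its sibling, whereas flipping bit 0 would force the four-bit leaf to be split into a subtree, which illustrates why the two kinds of update call for different handling.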

