2014
DOI: 10.1007/978-3-319-11918-2_3

Efficient Compressed Indexing for Approximate Top-k String Retrieval

Abstract: Given a collection of strings (called documents), the top-k document retrieval problem is that of, given a string pattern p, finding the k documents where p appears most often. This is a basic task in most information retrieval scenarios. The best current implementations require 20-30 bits per character (bpc) and k to 4k microseconds per query, or 12-24 bpc and 1-10 milliseconds per query. We introduce a Lempel-Ziv compressed data structure that occupies 5-10 bpc to answer queries in around k microse…
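
To make the problem statement concrete, here is a minimal brute-force sketch of top-k document retrieval: it scans every document, counts occurrences of p, and keeps the k highest counts. It is only a plain-text baseline and not the compressed Lempel-Ziv index the abstract describes. The function names (`count_occurrences`, `top_k_documents`), the choice to count overlapping occurrences, and the tie-breaking rule are all assumptions made for illustration.

```python
import heapq


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlapping ones included (assumption)."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are also found.
        start = text.find(pattern, start + 1)
    return count


def top_k_documents(documents, pattern, k):
    """Return up to k (doc_id, frequency) pairs, most frequent first.

    Documents that do not contain the pattern are skipped; ties are broken
    by the lower document id (an illustrative choice, not from the paper).
    """
    scored = []
    for doc_id, doc in enumerate(documents):
        freq = count_occurrences(doc, pattern)
        if freq > 0:
            scored.append((freq, doc_id))
    best = heapq.nsmallest(k, scored, key=lambda t: (-t[0], t[1]))
    return [(doc_id, freq) for freq, doc_id in best]


if __name__ == "__main__":
    docs = ["abracadabra", "banana", "cabana", "xyz"]
    print(top_k_documents(docs, "a", 2))
```

This scan reads the whole collection on every query, which is the cost that an index is meant to avoid.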

Citations: cited by 3 publications (2 citation statements)
References: 18 publications
“…Compared to the stronger LZ77 format [72], LZ78 has a more regular structure, which has made it the preferred choice for direct searching in compressed texts [2,32,33,42,44,45,57,59], implementing string dictionaries [7], compressed sequence representations supporting optimal-time access [67], compressed text indexes for pattern matching [4,27,66], and document retrieval [25,26].…”
Section: Introduction (mentioning)
confidence: 99%
“…Its variants (especially LZW [17]) are used in software like Unix's Compress and formats like GIF. Compared to the stronger LZ77 format [18], LZ78 has a more regular structure, which has made it the preferred choice for compressed sequence representations supporting optimal-time access [16] and compressed text indexes for pattern matching [7,15,3] and document retrieval [5,6].…”
Section: Introduction (mentioning)
confidence: 99%