A Comparison of Top-k Threshold Estimation Techniques for Disjunctive Query Processing

Mallia, Antonio; Siedlaczek, Michał; Sun, Mengyang; Suel, Torsten

doi:10.1145/3340531.3412080

Cited by 10 publications

(3 citation statements)

References 20 publications


“…There are also related strategies to obtain an accurate top-𝑘 threshold earlier, e.g. [31,33,35,37,42]. While we can benefit from these studies, this paper does not study them because they represent orthogonal optimizations.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Dual Skipping Guidance for Document Retrieval with Learned Sparse Representations

Qiao, Yang, Lin et al. 2022

Preprint


This paper proposes a dual skipping guidance scheme with hybrid scoring to accelerate document retrieval that uses learned sparse representations while still delivering good relevance. The scheme uses both lexical BM25 weights and learned neural term weights to bound and compose the rank score of a candidate document, separately for skipping and for final ranking, and maintains two top-𝑘 thresholds during inverted index traversal. The paper evaluates the time efficiency and ranking relevance of the proposed scheme when searching the MS MARCO and TREC datasets.



“…For example, de Carvalho et al (2015) store the values of the kth highest scores in each postings list for certain values of k. Initializing to the largest (across the terms) of the kth largest (across documents) contributions C(t, d) is then safe, because for the given query there must be at least k similarity scores greater than or equal to that value. Kane and Tompa (2018), Yafay and Altingovde (2019), and Petri et al (2019) have explored similar options, and Mallia et al, (2020) compare a range of such initializations.…”

Section: Introduction (mentioning)

confidence: 99%
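
The initialization described in the statement quoted above can be illustrated with a minimal Python sketch. It is not the implementation of any of the cited papers: the function names `kth_largest_scores` and `initial_threshold`, the layout of `postings`, and the choice of stored `k` values are all hypothetical. The sketch also assumes that every contribution C(t, d) is non-negative and that a document's score is the sum of its per-term contributions, which is what makes the resulting threshold safe.

```python
import heapq

def kth_largest_scores(postings, ks):
    """For each term, record the k-th largest contribution C(t, d) for each k in ks.

    `postings` (assumed layout) maps a term to a list of (doc_id, contribution) pairs.
    """
    table = {}
    for term, plist in postings.items():
        top = heapq.nlargest(max(ks), (score for _, score in plist))
        # Only store a value of k if the list really holds at least k postings.
        table[term] = {k: top[k - 1] for k in ks if k <= len(top)}
    return table

def initial_threshold(query_terms, table, k):
    """Largest, across the query terms, of the stored k-th largest contributions.

    Falls back to 0.0 when no query term has a stored value for this k.
    """
    candidates = [table[t][k] for t in query_terms if k in table.get(t, {})]
    return max(candidates, default=0.0)

# Toy index with two terms (made-up numbers, for illustration only).
postings = {
    "a": [(1, 3.0), (2, 1.0), (3, 2.0)],
    "b": [(2, 5.0), (4, 0.5)],
}
table = kth_largest_scores(postings, ks=(1, 2, 3))
threshold = initial_threshold(["a", "b"], table, k=2)
```

Under the stated assumptions, at least `k` documents in the chosen term's list already reach the returned value through that term alone, so no document in the true top-`k` can score below it.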

Efficient query processing techniques for next-page retrieval

2022


In top-k ranked retrieval the goal is to efficiently compute an ordered list of the highest scoring k documents according to some stipulated similarity function such as the well-known BM25 approach. In most implementation techniques a min-heap of size k is used to track the top scoring candidates. In this work we consider the question of how best to retrieve the second page of search results, given that a first page has already been computed; that is, identification of the documents at ranks $k+1$ to $2k$ for some query. Our goal is to understand what information is available as a by-product of the first-page scoring, and how it can be employed to accelerate the second-page computation, assuming that the second page of results is required for only a fraction of the query load. We propose a range of simple, yet efficient, next-page retrieval techniques which are suitable for accelerating Document-at-a-Time mechanisms, and demonstrate their performance on three large text collections.
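
The min-heap of size k mentioned in the abstract above can be sketched in Python as follows. This is a generic illustration, not the paper's code: the function name `top_k` and the `(doc_id, score)` input format are assumptions, and the second-page computation shown is only a naive baseline (score the top 2k and discard the first k), not one of the techniques the paper proposes.

```python
import heapq

def top_k(scored_docs, k):
    """Return the k highest-scoring (score, doc_id) pairs, best first.

    `scored_docs` is an iterable of (doc_id, score) pairs (assumed format).
    """
    heap = []  # min-heap; heap[0] holds the lowest score currently in the top-k
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new document beats the current threshold: evict the minimum.
            heapq.heapreplace(heap, (score, doc_id))
    # Order by descending score, breaking ties by ascending doc_id.
    return sorted(heap, key=lambda entry: (-entry[0], entry[1]))

# Toy scores (made-up numbers, for illustration only).
docs = [(1, 3.0), (2, 6.0), (3, 2.0), (4, 0.5)]
first_page = top_k(docs, 2)
# Naive baseline for ranks k+1 to 2k: compute the top 2k and drop the first k.
second_page = top_k(docs, 4)[2:]
```

Once the heap is full, its root score acts as the current top-k threshold: a document whose score cannot exceed it can never enter the result set.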


“…A second strand of development has pursued non-safe approaches, including the quit/continue heuristics [51]; and approximations that deliberately over-estimate the current heap entry threshold (when maintaining the top-𝑘 set) [10,14,16,42,64]. In related work, researchers have sought bounds on the query's final 𝑘th largest document score [19,29,73], or to provide a conservative estimate of it [49,57], seeking to bypass fruitless early work when dynamic pruning mechanisms are in play.…”

Section: Introduction (mentioning)

confidence: 99%

Anytime Ranking on Document-Ordered Indexes

Mackenzie, Petri, Moffat 2021

Preprint


Inverted indexes continue to be a mainstay of text search engines, allowing efficient querying of large document collections. While there are a number of possible organizations, document-ordered indexes are the most common, since they are amenable to various query types, support index updates, and allow for efficient dynamic pruning operations. One disadvantage with document-ordered indexes is that high-scoring documents can be distributed across the document identifier space, meaning that index traversal algorithms that terminate early might put search effectiveness at risk. The alternative is impact-ordered indexes, which primarily support top-𝑘 disjunctions, but also allow for anytime query processing, where the search can be terminated at any time, with search quality improving as processing latency increases. Anytime query processing can be used to effectively reduce high-percentile tail latency which is essential for operational scenarios in which a service level agreement (SLA) imposes response time requirements. In this work, we show how document-ordered indexes can be organized such that they can be queried in an anytime fashion, enabling strict latency control with effective early termination. Our experiments show that processing document-ordered topical segments selected by a simple score estimator outperforms existing anytime algorithms, and allows query runtimes to be accurately limited in order to comply with SLA requirements.

