What Should We Teach in Information Retrieval?

Markov, Ilia; Rijke, Maarten de

doi:10.1145/3308774.3308780

Cited by 8 publications

(2 citation statements)

References 81 publications

(103 reference statements)


“…Information retrieval (IR) evolved continuously for four decades (Robertson and Jones, 1976; Manning et al., 2008; Baeza-Yates and Ribeiro-Neto, 2011; Markov and de Rijke, 2019) from symbolic to vectorized representation, text transformation and analysis, offline and online treatment, etc. Among this field, ad hoc search aims to bring forward documents contained in a corpus related to a given query, which summarizes the expected features of common search engines nowadays.…”

Section: Introduction (mentioning)

confidence: 99%

LoGE: an unsupervised local-global document extension generation in information retrieval for long documents

Ayoub, Rodrigues, Travers (2023). IJWIS


Purpose: This paper aims to manage the word gap in information retrieval (IR), especially for long documents belonging to specific domains. In fact, with the continuous growth of text data that modern IR systems have to manage, existing solutions are needed to efficiently find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Despite meaning closeness, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.

Design/methodology/approach: To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. Used DL models are pretrained on massive text data and carry common or specific domain knowledge to propose a better document representation.

Findings: The authors evaluate the approach of this study on specific IR domains with long documents to show the genericity of the proposed model and achieve encouraging results.

Originality/value: In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.


“…Online ranker evaluation concerns the task of determining the ranker with the best performance out of a finite set of rankers. It is an important challenge for information retrieval systems [21,29,30]. In the absence of an oracle judge who can tell the preferences between all rankers, the best ranker is usually inferred from user feedback on the result lists produced by the rankers [16].…”

Section: Introduction (mentioning)

confidence: 99%

MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation

Li, Markov, Rijke et al. (2018)

Preprint

Self Cite


Online ranker evaluation is one of the key challenges in information retrieval. While the preferences of rankers can be inferred by interleaved comparison methods, the question of how to choose the pair of rankers that generates the result list, without degrading the user experience too much, can be formalized as a K-armed dueling bandit problem. This is an online partial-information learning framework in which feedback comes in the form of pairwise preferences. A commercial search system may evaluate a large number of rankers concurrently, and scaling effectively in the presence of numerous rankers has not been fully studied. In this paper, we focus on solving the large-scale online ranker evaluation problem under the so-called Condorcet assumption, where there exists an optimal ranker that is preferred to all other rankers. We propose Merge Double Thompson Sampling (MergeDTS), which first utilizes a divide-and-conquer strategy that localizes the comparisons carried out by the algorithm to small batches of rankers, and then employs Thompson Sampling (TS) to reduce the comparisons between suboptimal rankers inside these small batches. The effectiveness (regret) and efficiency (time complexity) of MergeDTS are extensively evaluated using examples from the domain of online evaluation for web search. Our main finding is that for large-scale Condorcet ranker evaluation problems MergeDTS outperforms the state-of-the-art dueling bandit algorithms.

CCS Concepts: • Information systems → Evaluation of retrieval results; Learning to rank.
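To make the dueling bandit setting in this abstract concrete, here is a minimal sketch of Thompson-sampling-based pair selection. It is not MergeDTS: it leaves out the divide-and-conquer batching and any pruning, and it always picks two distinct rankers. The function names, the Beta(1, 1) prior, and the simulated preference matrix `pref` are assumptions made for illustration only.

```python
import random


def sample_beta(wins, losses, rng):
    # Posterior draw for a pairwise win probability under a Beta(1, 1) prior
    # (the prior is an assumption of this sketch).
    return rng.betavariate(wins + 1, losses + 1)


def choose_pair(wins, rng):
    """Pick two distinct rankers to compare, given wins[i][j] = times i beat j."""
    k = len(wins)
    # First ranker: sample every pairwise preference once, then take the
    # ranker that beats the most opponents in the sampled matrix.
    theta = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            theta[i][j] = sample_beta(wins[i][j], wins[j][i], rng)
            theta[j][i] = 1.0 - theta[i][j]
    scores = [sum(theta[i][j] > 0.5 for j in range(k) if j != i) for i in range(k)]
    first = max(range(k), key=lambda i: scores[i])
    # Second ranker: resample how each opponent fares against `first`
    # and take the strongest sampled challenger.
    challenge = [
        sample_beta(wins[j][first], wins[first][j], rng) if j != first else -1.0
        for j in range(k)
    ]
    second = max(range(k), key=lambda j: challenge[j])
    return first, second


def run(pref, rounds, seed=0):
    """Simulate duels; pref[i][j] is the hypothetical probability that i beats j."""
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    for _ in range(rounds):
        a, b = choose_pair(wins, rng)
        # Simulated pairwise feedback, standing in for an interleaved comparison.
        if rng.random() < pref[a][b]:
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return wins


if __name__ == "__main__":
    # Ranker 0 is the Condorcet winner in this toy preference matrix.
    pref = [[0.5, 0.7, 0.8], [0.3, 0.5, 0.6], [0.2, 0.4, 0.5]]
    for row in run(pref, rounds=500):
        print(row)
```

In the toy matrix, ranker 0 is preferred to every other ranker, which is the Condorcet assumption the abstract refers to. In a real system the simulated coin flip would be replaced by user feedback on an interleaved result list.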
