Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction

Hofstätter, Sebastian; Khattab, Omar; Althammer, Sophia; Sertkan, Mete; Hanbury, Allan

doi:10.48550/arxiv.2203.13088

Cited by 2 publications

(5 citation statements)

References 48 publications


“…As 𝜃 increases, terms with lower weights are filtered out; see Eq. (18). Thus, the relevance score between the query and the passage decreases.…”

Section: Performance Of Two-stage Retrieval (mentioning)

confidence: 98%

“…For a fair comparison, Table 6 only includes models that use the same baseline training strategy as ours. Thus, we exclude approaches that depend on other models for expansion [25,33,51], costly training techniques such as knowledge distillation [9,17,18,38,41,44], or special pretraining [11,20,34] (see Table 8 for more comparisons).…”

Section: Evaluation Of Single Model Fusion (mentioning)

confidence: 99%

“…Transformer-based bi-encoders have been widely used as first-stage retrievers for text retrieval. Compared to their multi-vector counterparts [12,18,24], single-vector representation learning approaches (with a few representative techniques listed in Table 1) are promising due to their good balance between effectiveness and efficiency.…”

Section: Introduction (mentioning)

confidence: 99%

“…For simplicity, we do not show term weights and use the original passage (which is judged as relevant to the query) as a reference. We observe that when 𝜃 = 0, there are more than 10 matching terms between the query and the passage. As 𝜃 increases, terms with lower weights are filtered out; see Eq. (18)…”

mentioning

confidence: 99%


A Dense Representation Framework for Lexical and Semantic Matching

Lin¹, Lin². 2022. Preprint.


Lexical and semantic matching capture different successful approaches to text retrieval and the fusion of their results has proven to be more effective and robust than either alone. Prior work performs hybrid retrieval by conducting lexical and semantic text matching using different systems (e.g., Lucene and Faiss, respectively) and then fusing their model outputs. In contrast, our work integrates lexical representations with dense semantic representations by densifying high-dimensional lexical representations into what we call low-dimensional dense lexical representations (DLRs). Our experiments show that DLRs can effectively approximate the original lexical representations, preserving effectiveness while improving query latency. Furthermore, we can combine dense lexical and semantic representations to generate dense hybrid representations (DHRs) that are more flexible and yield faster retrieval compared to existing hybrid techniques. Finally, we explore jointly training lexical and semantic representations in a single model and empirically show that the resulting DHRs are able to combine the advantages of each individual component. Our best DHR model is competitive with state-of-the-art single-vector and multi-vector dense retrievers in both in-domain and zero-shot evaluation settings. Furthermore, our model is both faster and requires smaller indexes, making our dense representation framework an attractive approach to text retrieval. Our code is available at https://github.com/castorini/dhr.


“…These campaigns follow the Cranfield paradigm [9] to create relevance judgements on the pooled output of the participating systems. Recently there has been a growing interest in evaluating the retrieval performance of retrieval models for domain-specific retrieval tasks [2,13,14,27,36] including the medical domain [22,23,35]. Domain-specific retrieval tasks often lack a reliable test collection with human relevance judgments following the Cranfield paradigm [22,27].…”

Section: Introduction (mentioning)

confidence: 99%

TripJudge: A Relevance Judgement Test Collection for TripClick Health Retrieval

Althammer, Hofstätter, Verberne et al. 2022. Preprint.

Self Cite


Robust test collections are crucial for Information Retrieval research. Recently there is a growing interest in evaluating retrieval systems for domain-specific retrieval tasks; however, these tasks often lack a reliable test collection with human-annotated relevance assessments following the Cranfield paradigm. In the medical domain, the TripClick collection was recently proposed, which contains click log data from the Trip search engine and includes two click-based test sets. However, the clicks are biased to the retrieval model used, which remains unknown, and a previous study shows that the test sets have a low judgement coverage for the Top-10 results of lexical and neural retrieval models. In this paper we present the novel relevance judgement test collection TripJudge for TripClick health retrieval. We collect relevance judgements in an annotation campaign and ensure the quality and reusability of TripJudge by a variety of ranking methods for pool creation, by multiple judgements per query-document pair and by an at least moderate inter-annotator agreement. We compare system evaluation with TripJudge and TripClick and find that click and judgement-based evaluation can lead to substantially different system rankings.

CCS Concepts: Information systems → Test collections.
