Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3463254

Simplified Data Wrangling with ir_datasets

Abstract: Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset documentation is scattered across the Internet and once one obtains a copy of the data, there are numerous different data formats to work with. Even basic formats can have subtle dataset-specific nuances that need to be considered for proper use. To help mitigate these challenges, we introduce a new robust and lightweight tool (ir_datasets) for acquiring, managing, and performing typical operations over datasets used in IR…


Cited by 61 publications (29 citation statements) | References 42 publications
“…Finally, we ask the question: what if we can afford to translate MS MARCO so that we can use a translate-train model? To investigate, we utilize the Chinese translation of the MSMARCO-v1 training triples from ColBERT-X [32], which can also be accessed via ir_datasets [30] with the dataset key neumarco/zh. Figure 2 shows that without C3, the ColBERT model improves from 0.352 to 0.421, which is still worse than zero-shot transfer models trained with C3 for CLIR, suggesting allocating effort to C3 rather than training a translation model when computational resources are limited.…”
Section: Results and Analysis
confidence: 99%
“…For TREC graded relevance (0 = non-relevant to 3 = perfect), we use the recommended binarization point of 2 for the recall metric. For out-of-domain experiments we refer to the ir_datasets catalogue [37] for collection-specific information, as we utilized the standardized test sets for the collections.…”
Section: Passage Collection and Query Sets
confidence: 99%
“…Methodology. We selected seven datasets from the ir_datasets catalogue [37]: Bio medical (TREC Covid [50,52], TripClick [40], NFCorpus [4]), Entity centric (DBPedia Entity [14]), informal language (Antique [13], TREC Podcast [23]), news cables (TREC Robust 04 [49]). The datasets are not based on web collections, have at least 50 queries, and importantly contain judgements from both relevant and non-relevant categories.…”
Section: Out-of-domain Robustness
confidence: 99%
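The selection criteria in the passage (at least 50 queries, judgments in both relevant and non-relevant categories) can be checked mechanically; a sketch with a hypothetical helper fed from `qrels_iter()`:

```python
def meets_criteria(qrels, min_queries=50):
    """Check the passage's selection criteria on (query_id, relevance) pairs:
    at least `min_queries` distinct queries, and judgments in both the
    relevant (grade > 0) and non-relevant (grade == 0) categories."""
    query_ids, categories = set(), set()
    for query_id, relevance in qrels:
        query_ids.add(query_id)
        categories.add(relevance > 0)
    return len(query_ids) >= min_queries and categories == {True, False}

# With ir_datasets (the catalogue key here is an assumption):
# import ir_datasets
# ds = ir_datasets.load("antique/test")
# print(meets_criteria((q.query_id, q.relevance) for q in ds.qrels_iter()))
```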
“…It can be used as a command line tool and as a Python package that can be integrated with other tools. DiffIR is model-agnostic; in its most basic setting, it simply accepts TREC-formatted run files and an ir_datasets [32] dataset identifier to generate an HTML output. Metrics are calculated using pytrec_eval [39] via the ir_measures package.…”
Section: Implementation Details
confidence: 99%
“…DiffIR can be run locally using the command: In the above command, run_1 and run_2 are files that contain the document rankings for each query and use the standard TREC run format. The user must specify a dataset name supported by ir_datasets [32]. In the sample command above, DiffIR would select the top ten queries whose mean average precision varies the most between the two run files and render the content as HTML.…”
Section: Demonstration
confidence: 99%
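The "standard TREC run format" referenced above is six whitespace-separated columns per ranked document; a minimal stdlib parser (the helper name is our own):

```python
def parse_trec_run(lines):
    """Parse TREC run lines of the form:
        query_id Q0 doc_id rank score run_tag
    into {query_id: {doc_id: score}}."""
    run = {}
    for line in lines:
        query_id, _q0, doc_id, _rank, score, _tag = line.split()
        run.setdefault(query_id, {})[doc_id] = float(score)
    return run

sample = [
    "q1 Q0 d7 1 14.2 my_system",
    "q1 Q0 d3 2 12.9 my_system",
]
print(parse_trec_run(sample))  # {'q1': {'d7': 14.2, 'd3': 12.9}}
```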