Selective Weak Supervision for Neural Information Retrieval

Zhang, Kaitao; Xiong, Chenyan; Liu, Zhenghao; Liu, Zhiyuan

doi:10.1145/3366423.3380131

Cited by 39 publications

(44 citation statements)

References 39 publications


“…Rücklé et al (2019b) use weakly supervised training, self-supervised training methods, and question generation. Similar approaches were also explored in ad-hoc retrieval (Zhang et al, 2020; Ma et al, 2020; MacAvaney et al, 2019). A crucial limitation of these approaches is that they result in entirely separate models for each dataset and are thus not re-usable.…”

Section: Related Work (mentioning)

confidence: 99%

MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale

Rücklé

Pfeiffer

Gurevych

2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)


We study the zero-shot transfer capabilities of text matching models on a massive scale, by self-supervised training on 140 source domains from community question answering forums in English. We investigate the model performances on nine benchmarks of answer selection and question similarity tasks, and show that all 140 models transfer surprisingly well, where the large majority of models substantially outperforms common IR baselines. We also demonstrate that considering a broad selection of source domains is crucial for obtaining the best zero-shot transfer performances, which contrasts the standard procedure that merely relies on the largest and most similar domains. In addition, we extensively study how to best combine multiple source domains. We propose to incorporate self-supervised with supervised multi-task learning on all available source domains. Our best zero-shot transfer model considerably outperforms in-domain BERT and the previous state of the art on six benchmarks. Fine-tuning of our model with in-domain data results in additional large gains and achieves the new state of the art on all nine benchmarks.


“…In our experiments with TREC Deep Learning Track and three more few-shot document ranking benchmarks (Zhang et al, 2020), QDS-Transformer consistently improves the standard retrofitting BERT ranking baselines (e.g., max-pooling on paragraphs) by 5% NDCG. It also shows gains over more recent transformer architectures that induce various sparse structures, including Sparse Transformer, Longformer, and Transformer-XH, as they were not designed to incorporate the essential information required in document ranking.…”

Section: Introduction (mentioning)

confidence: 80%

“…Few-shot Document Ranking. All experimental settings for few-shot learning are consistent with the "MS MARCO Human Labels" setting in previous studies (Zhang et al, 2020). Each method first trains a neural ranker on MARCO training labels, which are identical as in the TREC DL track.…”

Section: Discussion (mentioning)

confidence: 99%

Long Document Ranking with Query-Directed Sparse Transformer

Jiang

Xiong

Lee

et al.

2020

Findings of the Association for Computational Linguistics: EMNLP 2020

Self Cite


The computing cost of transformer self-attention often necessitates breaking long documents to fit in pretrained models in document ranking tasks. In this paper, we design Query-Directed Sparse attention that induces IR-axiomatic structures in transformer self-attention. Our model, QDS-Transformer, enforces the principal properties desired in ranking: local contextualization, hierarchical representation, and query-oriented proximity matching, while it also enjoys efficiency from sparsity. Experiments on one fully supervised and three few-shot TREC document ranking benchmarks demonstrate the consistent and robust advantage of QDS-Transformer over previous approaches, as they either retrofit long documents into BERT or use sparse attention without emphasizing IR principles. We further quantify the computing complexity and demonstrate that our sparse attention with TVM implementation is twice as efficient as the fully-connected self-attention. All source codes, trained model,


“…[8] generate pseudo-qrels from a news collection, using the titles as pseudo-queries and their content as relevant text. Other authors [2,18] use the signal produced by anchor-document relationships to simulate qrels.…”

Section: Related Work (mentioning)

confidence: 99%

Fine-Tuning BERT for COVID-19 Domain Ad-Hoc IR by Using Pseudo-qrels

Saralegi

Vicente

2021

Lecture Notes in Computer Science


This work analyzes the feasibility of training a neural retrieval system for a collection of scientific papers about COVID-19 using pseudo-qrels extracted from the collection. We propose a method for generating pseudo-qrels that exploits two characteristics present in scientific articles: a) the relationship between title and abstract, and b) the relationship between articles through sentences containing citations. Through these signals we generate pseudo-queries and their respective pseudo-positive (relevant documents) and pseudo-negative (non-relevant documents) examples. The article retrieval process combines a ranking model based on term-matching techniques and a neural one based on pretrained BERT models. BERT models are fine-tuned to the task using the pseudo-qrels generated. We compare different BERT models, both open domain and biomedical domain, and also the generated pseudo-qrels with the open domain MS-Marco dataset for fine-tuning the models. The results obtained on the TREC-COVID collection show that pseudo-qrels provide a significant improvement to neural models, both against classic IR baselines based on term-matching and neural systems trained on MS-Marco.
