Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.92
Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation

Abstract: A major obstacle to the wide-spread adoption of neural retrieval models is that they require large supervised training sets to surpass traditional term-based techniques, which are constructed from raw corpora. In this paper, we propose an approach to zero-shot learning for passage retrieval that uses synthetic question generation to close this gap. The question generation system is trained on general domain data, but is applied to documents in the targeted domain. This allows us to create arbitrarily large, ye…

Cited by 67 publications (52 citation statements) · References 49 publications
“…In this work, we adopt the approach of Ma et al. (2020) and train our own question generation model based on the Text-To-Text Transfer Transformer (T5) (Raffel et al., 2020). We then use this model to generate synthetic questions for the passages in a collection of documents.…”
Section: Hard Negatives in Pre-training
Citation type: mentioning (confidence: 99%)
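The pipeline this citing paper describes is straightforward to prototype. Below is a minimal sketch of T5-based synthetic question generation using the Hugging Face transformers API; the checkpoint name qgen-t5-base is a hypothetical placeholder for a T5 model already fine-tuned on general-domain (passage, question) pairs, and the prompt prefix and decoding settings are illustrative assumptions rather than the exact configuration of Ma et al. (2020).

```python
# Sketch: sample synthetic questions for a target-domain passage with T5.
# "qgen-t5-base" is a hypothetical checkpoint, not a real released model.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("qgen-t5-base")            # placeholder
model = T5ForConditionalGeneration.from_pretrained("qgen-t5-base") # placeholder

def generate_questions(passage: str, num_questions: int = 3) -> list[str]:
    """Sample several synthetic questions for one passage."""
    inputs = tokenizer("generate question: " + passage,   # assumed prompt format
                       return_tensors="pt", truncation=True, max_length=512)
    outputs = model.generate(
        **inputs,
        do_sample=True,              # sampling yields diverse (if noisy) questions
        top_k=10,
        max_length=64,
        num_return_sequences=num_questions,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

passage = "The mitochondrion is the organelle that produces most of the cell's ATP."
for question in generate_questions(passage):
    print(question)
```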
“…Training an effective neural retrieval model usually requires a large amount of high-quality data. To alleviate the need for high-quality data, training can be approached in two stages: pre-training on noisy data (Guo et al., 2018; Chang et al., 2020; Ma et al., 2020) and fine-tuning on a smaller amount of high-quality data, also regarded as "gold" data.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
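A minimal sketch of that two-stage recipe, assuming a generic PyTorch dual encoder trained with in-batch negatives: the encoder, data loaders, and hyperparameters below are illustrative placeholders, not the setup of any cited paper.

```python
# Sketch: Stage 1 pre-trains on large, noisy (e.g. synthetic) question-passage
# pairs; Stage 2 fine-tunes on a small set of gold pairs.
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(q_emb: torch.Tensor, p_emb: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss: other passages in the batch serve as negatives."""
    scores = q_emb @ p_emb.T / temperature                         # [B, B] similarities
    labels = torch.arange(scores.size(0), device=scores.device)   # diagonal = positives
    return F.cross_entropy(scores, labels)

def run_stage(encoder, loader, optimizer, epochs: int) -> None:
    for _ in range(epochs):
        for queries, passages in loader:        # batches of token tensors (assumed)
            q_emb = F.normalize(encoder(queries), dim=-1)
            p_emb = F.normalize(encoder(passages), dim=-1)
            loss = in_batch_negatives_loss(q_emb, p_emb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 1: many cheap, noisy synthetic pairs.
#   run_stage(encoder, noisy_loader, optimizer, epochs=3)
# Stage 2: fine-tune on the small gold set, typically at a lower learning rate.
#   run_stage(encoder, gold_loader, optimizer_low_lr, epochs=1)
```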
“…GPL improves performance by up to 8.9 nDCG@10 points compared to the state-of-the-art model trained solely on MS MARCO. Compared to the previous state-of-the-art domain-adaptation method QGen (Ma et al., 2021; Thakur et al., 2021b), GPL improves performance by up to 5.2 nDCG@10 points. Training with GPL is easy, fast, and data-efficient.…”
Section: Java Is …
Citation type: mentioning (confidence: 88%)
“…This performed well in the zero-shot retrieval benchmark BEIR (Thakur et al., 2021b). Ma et al. (2021) propose QGen, which uses a query generator trained on general-domain data to synthesize domain-targeted queries for the target corpus, on which a dense retriever is trained from scratch. Following this idea, Thakur et al. (2021b) view QGen as a post-training method to adapt powerful MS MARCO retrievers to the target domains.…”
Section: Related Work
Citation type: mentioning (confidence: 97%)
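To make the QGen recipe concrete, here is a hedged sketch of the second half of the pipeline: training a dense retriever on (synthetic question, passage) pairs with the sentence-transformers library. The base model name and hyperparameters are assumptions for illustration, and synthetic_pairs stands in for the output of a question generator such as the T5 sketch above.

```python
# Sketch: train a dense retriever on synthetic (question, passage) pairs.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

synthetic_pairs = [
    ("what produces most of the cell's atp",
     "The mitochondrion is the organelle that produces most of the cell's ATP."),
    # ... one or more (query, passage) pairs per passage in the target corpus
]

# Any Hugging Face encoder works as a starting point; mean pooling is added
# automatically. "distilbert-base-uncased" is an illustrative choice.
model = SentenceTransformer("distilbert-base-uncased")

train_examples = [InputExample(texts=[q, p]) for q, p in synthetic_pairs]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=32)

# MultipleNegativesRankingLoss scores each question against every passage in
# the batch, so the other passages act as in-batch negatives.
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_loader, train_loss)],
          epochs=1, warmup_steps=100)
```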