Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3463085
Synthetic Target Domain Supervision for Open Retrieval QA

Abstract: Neural passage retrieval is a new and promising approach in open retrieval question answering. In this work, we stress-test the Dense Passage Retriever (DPR), a state-of-the-art (SOTA) open-domain neural retrieval model, on closed and specialized target domains such as COVID-19, and find that it lags behind standard BM25 in this important real-world setting. To make DPR more robust under domain shift, we explore its fine-tuning with synthetic training examples, which we generate from unlabeled target domain text…

Cited by 4 publications (4 citation statements)
References 27 publications
“…We call this the unconditioned question generator, since the questions are not conditioned to be about any specific entities. This serves as a baseline question generation approach and is comparable with prior work [5,14,17] in synthetic data generation for IR, which do not enforce such specific conditioning into the question generation process.…”
Section: Baselines
confidence: 82%
“…To compare our approach with a generation strategy that does not use any conditioning, we also train an unconditioned generation system, similar to Reddy et al [17], that generates question-answer pairs using just the passage as input. We call this the unconditioned question generator, since the questions are not conditioned to be about any specific entities.…”
Section: Baselines
confidence: 99%
“…In addition to prompt-based generation of training data, there are multiple proposals for self-supervised adaptation of out-of-domain models using generative pseudo-labeling [22,38,51]. To this end, questions or queries are generated using a pretrained seq2seq model (though an LLM can be used as well) and negative examples are mined using either BM25 or an out-of-domain retriever or ranker.…”
Section: Related Work
confidence: 99%
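The generative pseudo-labeling recipe quoted above can be sketched minimally: generate a pseudo-query for each unlabeled passage, then use BM25 to mine a hard negative from the rest of the corpus. The sketch below is illustrative, not the cited systems' implementation: `generate_pseudo_query` is a hypothetical stand-in for a real seq2seq question generator (e.g. a fine-tuned T5 model), and the tiny Okapi BM25 scorer replaces a production retriever.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter()                      # document frequency per term
    for d in docs_tokens:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def generate_pseudo_query(passage):
    # Hypothetical placeholder: a real system would run a pretrained
    # seq2seq question generator here. We crudely reuse leading tokens.
    return passage.lower().split()[:4]

def mine_training_example(pos_idx, passages):
    """Build a (query, positive, hard-negative) triple for one passage."""
    docs = [p.lower().split() for p in passages]
    query = generate_pseudo_query(passages[pos_idx])
    scores = bm25_scores(query, docs)
    # The highest-scoring passage that is NOT the source passage
    # serves as a BM25-mined hard negative.
    ranked = sorted(range(len(passages)),
                    key=lambda i: scores[i], reverse=True)
    neg_idx = next(i for i in ranked if i != pos_idx)
    return {"query": " ".join(query),
            "positive": passages[pos_idx],
            "negative": passages[neg_idx]}
```

The resulting triples are what a dense retriever such as DPR would be fine-tuned on in the target domain; here the negative is simply the most lexically confusable other passage.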