2020
DOI: 10.1007/978-3-030-45442-5_21

ANTIQUE: A Non-factoid Question Answering Benchmark

Abstract: Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop and release a collection of 2,626 open-domain non-factoid questions from a diverse set of categories. The dataset, c…

Cited by 48 publications (51 citation statements)
References 14 publications
“…The predominant method for text matching tasks such as non-factoid answer selection and question similarity is to train a neural architecture on a large quantity of labeled in-domain data. This includes CNN and LSTM models with attention (Wang et al, 2016; Rücklé and Gurevych, 2017), compare-aggregate approaches (Wang and Jiang, 2017; Rücklé et al, 2019a), and, more recently, transformer-based models (Hashemi et al, 2020; Mass et al, 2019). Fine-tuning of large pre-trained transformers such as BERT (Devlin et al, 2019) and RoBERTa (Liu et al, 2019) currently achieves state-of-the-art performances on many related benchmarks (Garg et al, 2020; Mass et al, 2019; Rochette et al, 2019; Nogueira and Cho, 2019).…”
Section: Related Work
mentioning
confidence: 99%
“…However, a considerable bottleneck in their development and evaluation is the lack of datasets covering both sub-tasks equally well. Current datasets either focus on retrieval results with dense judgements across ranked lists [7,11], or question answer selections in single candidate texts that can retroactively be converted to retrieval collections, but lead to incomplete retrieval judgements [27].…”
mentioning
confidence: 99%
“…We employ four datasets and three retrieval tasks: MSDialog (Qu et al, 2018) and MANtIS (Penha et al, 2019) for conversation response ranking, Quora (Iyer et al, 2017) for similar question retrieval and ANTIQUE (Hashemi et al, 2019) for non-factoid question answering. We use the official train, validation and test sets provided by the datasets' creators.…”
Section: Methods
mentioning
confidence: 99%