Open-domain question answering has been used in a wide range of applications, such as web search and enterprise search, and usually takes clean text extracted from documents in various formats (e.g., web pages, PDFs, or Word documents) as its information source. However, designing different text extraction approaches is time-consuming and not scalable. To reduce human cost and improve the scalability of QA systems, we propose and study an Open-domain Document Visual Question Answering (Open-domain DocVQA) task, which requires answering questions directly from a collection of document images rather than from extracted document texts alone, additionally exploiting layout and visual features. To advance this task, we introduce DuReader_vis, the first Chinese Open-domain DocVQA dataset, containing about 15K question-answer pairs and 158K document images collected from the Baidu search engine. DuReader_vis poses three main challenges: (1) long document understanding, (2) noisy texts, and (3) multi-span answer extraction. Extensive experiments demonstrate that the dataset is challenging. Additionally, we propose a simple approach that incorporates layout and visual features, and the experimental results show its effectiveness. The dataset and code will be publicly available at https://github.com/baidu/DuReader/tree/master/DuReader-vis.

* This work was done while Le Qi was an intern at Baidu.