Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.342
Question Answering with Long Multiple-Span Answers

Abstract: Answering questions in many real-world applications often requires complex and precise information excerpted from texts spread across a long document. However, no such annotated dataset is currently publicly available, which hinders the development of neural question-answering (QA) systems. To this end, we present MASH-QA, a Multiple Answer Spans Healthcare Question Answering dataset from the consumer health domain, where answers may need to be excerpted from multiple, non-consecutive parts of text spanned a…
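To make "multiple, non-consecutive answer spans" concrete, here is a minimal, hypothetical Python sketch of what one such QA instance could look like; the field names, text, and character offsets are invented for illustration and are not the dataset's actual schema.

```python
# Hypothetical multi-span QA instance (invented schema, not MASH-QA's actual format).
document = (
    "Wash bedding weekly in hot water. "
    "Keep windows closed during high-pollen days. "
    "Pollen counts are usually reported by local weather services. "
    "An over-the-counter antihistamine can relieve mild symptoms."
)

example = {
    "question": "How can I reduce seasonal allergy symptoms at home?",
    "document": document,
    # The gold answer is excerpted from three non-consecutive sentences
    # (character offsets; the third sentence is skipped as irrelevant).
    "answer_spans": [(0, 33), (34, 78), (141, 201)],
}

# Reassemble the gold answer from the annotated spans.
answer = " ".join(document[s:e] for s, e in example["answer_spans"])
print(answer)
```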

Cited by 39 publications (32 citation statements) · References 22 publications

Citation statements, ordered by relevance:
“…Transformer-based models are the state-of-the-art for both SQuAD1.1 [19] and SQuAD2.0 [32]. Several MQA datasets have been developed in the past few years, such as MESHQA [73], MedQuAD [74], and emrQA [45]. In this study, we used the emrQA dataset, which is widely used as a benchmark dataset for MQA.…”
Section: Methods
confidence: 99%
“…(5) MASH-QA (Multiple Answer Spans Healthcare Question Answering): MASH-QA is a large-scale QA dataset in which many answers come from multiple spans within a long document. The dataset consists of over 35,000 QA pairs and is based on questions and knowledge articles from the consumer health domain, where the questions are generally non-factoid in nature and cannot be answered using just a few words [78]. The experimental results in [78] show that on the MASH-QA dataset, the DrQA Reader, BiDAF, BERT, SpanBERT, XLNet, and MultiCo models achieve F1 scores of 18.92%, 23.19%, 27.93%, 30.61%, 56.46%, and 64.94% and EM scores of 1.82%, 2.42%, 3.95%, 5.62%, 22.78%, and 29.49%, respectively.…”
Section: Datasets
confidence: 99%
“…The dataset consists of over 35,000 QA pairs and is based on questions and knowledge articles from the consumer health domain, where the questions are generally non-factoid in nature and cannot be answered using just a few words [78]. The experimental results in [78] show that on the MASH-QA dataset, the DrQA Reader, BiDAF, BERT, SpanBERT, XLNet, and MultiCo models achieve F1 scores of 18.92%, 23.19%, 27.93%, 30.61%, 56.46%, and 64.94% and EM scores of 1.82%, 2.42%, 3.95%, 5.62%, 22.78%, and 29.49%, respectively. Table 4 gives the details of different QA datasets used in the healthcare domain.…”
Section: Datasets
confidence: 99%
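The EM (exact match) and F1 numbers quoted above are the standard extractive-QA metrics: EM checks whether a prediction equals a gold answer after normalization, and F1 measures token overlap between the two. A minimal sketch follows, assuming a simplified SQuAD-style normalization (the official evaluation scripts handle more edge cases):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 only if the normalized strings are identical."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over tokens."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Long multi-span answers make both metrics hard: EM demands the whole
# answer verbatim, and F1 still requires recovering most gold tokens.
print(f1_score("keep windows closed on high-pollen days",
               "Keep windows closed during high-pollen days."))  # ~0.83
print(exact_match("keep the windows closed", "Keep windows closed."))  # 1.0
```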
“…There are currently four main types of approaches in the state-of-the-art literature that utilize SR and AR systems: (1) Chen et al. (2017) use a two-stage training pipeline where the SR model consists of an unsupervised Information Retrieval (IR) method like TF-IDF or BM25, followed by an extractive AR model; (2) an end-to-end learning setup of SR cascaded by AR (Guu et al., 2020); (3) single-span (Rajpurkar et al., 2016) or multi-span (Zhu et al., 2020; Segal et al., 2020) answers given questions and corresponding candidate contexts as inputs; and (4) a Multi-task Learning (MTL) framework, where SR and AR are the two underlying tasks (Nishida et al., 2018). Nishida et al. (2018) perform MTL using separate SR and AR pipelines sharing feature extraction layers. The simultaneous training of SR and AR using MTL helps the model build a combined and hierarchical understanding of Question Answering at a global (section) and a local (sentence/token) level.…”
Section: Introduction
confidence: 99%
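As a rough illustration of approach (1) in the snippet above, here is a minimal sketch of an unsupervised TF-IDF section retriever (SR) that would hand its top-ranked section to a downstream extractive AR model. The sections and question are toy data, and scikit-learn is assumed; this is only a sketch, not any cited system's implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document sections; a real system would index the article's sections.
sections = [
    "Seasonal allergies are triggered by pollen from trees and grasses.",
    "Antihistamines block histamine and can relieve sneezing and itching.",
    "Regular exercise strengthens the cardiovascular system.",
]
question = "Which medication helps with sneezing caused by allergies?"

# Rank sections by cosine similarity between TF-IDF vectors.
vectorizer = TfidfVectorizer()
section_vecs = vectorizer.fit_transform(sections)
query_vec = vectorizer.transform([question])
scores = cosine_similarity(query_vec, section_vecs)[0]

best = scores.argmax()
print(f"Top section (score {scores[best]:.2f}): {sections[best]}")
# An extractive AR model (e.g., a BERT-style reader) would then select
# the answer span(s) from sections[best].
```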