“…(5) MASH-QA: Multiple Answer Spans Healthcare Question Answering
MASH-QA is a large-scale QA dataset in which many answers are composed of multiple spans drawn from a long document. The dataset consists of over 35,000 QA pairs and is based on questions and knowledge articles from the consumer health domain, where the questions are generally non-factoid and cannot be answered in just a few words [78]. The experimental results in [78] show that, on the MASH-QA dataset, the DrQA Reader, BiDAF, BERT, SpanBERT, XLNet, and MultiCo models achieve F1 scores of 18.92%, 23.19%, 27.93%, 30.61%, 56.46%, and 64.94% and exact match (EM) scores of 1.82%, 2.42%, 3.95%, 5.62%, 22.78%, and 29.49%, respectively.…”
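To make the two reported metrics concrete, the following minimal sketch computes EM and F1 for a single predicted answer against a single reference answer. It assumes the common SQuAD-style definitions (token overlap after lowercasing and whitespace splitting); the exact evaluation protocol used in [78] may differ, for example by scoring at the sentence level. The function names `normalize`, `exact_match`, and `token_f1` are illustrative and do not come from the cited work.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 only if the normalized token sequences are identical."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall (assumed definition)."""
    pred = normalize(prediction)
    ref = normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return 1.0 if pred == ref else 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical strings, not taken from the MASH-QA dataset.
    prediction = "take the tablet with food"
    reference = "take the tablet with food and water"
    print(exact_match(prediction, reference))
    print(round(token_f1(prediction, reference), 4))
```

Under these assumed definitions, a prediction that recovers most but not all of a long answer still earns partial F1 credit while scoring zero on EM, which is consistent with the EM figures above being far lower than the corresponding F1 figures.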