Generating Biomedical Question Answering Corpora From Q&A Forums

Lamurias, Andre; Sousa, Diana; Couto, Francisco M.

doi:10.1109/access.2020.3020868

Cited by 9 publications

(4 citation statements)

References 23 publications

“…PubMedQA (Jin et al, 2019) created a QA dataset that can be used as yes/no or query-focused summarization. Cloze style QA datasets are also proposed in the domain of BioNLP (Kim et al, 2018; Lamurias et al, 2020; Pappas et al, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

Sequence tagging for biomedical extractive question answering

et al. 2022

Motivation: Current studies in extractive question answering (EQA) have modeled the single-span extraction setting, where a single answer span is a label to predict for a given question-passage pair. This setting is natural for general domain EQA as the majority of the questions in the general domain can be answered with a single span. Following general domain EQA models, current biomedical EQA (BioEQA) models utilize the single-span extraction setting with post-processing steps.

Results: In this paper, we investigate the question distribution across the general and biomedical domains and discover biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (single answer). This necessitates the models capable of producing multiple answers for a question. Based on this preliminary study, we propose a sequence tagging approach for BioEQA, which is a multi-span extraction setting. Our approach directly tackles questions with a variable number of phrases as their answer and can learn to decide the number of answers for a question from training data. Our experimental results on the BioASQ 7b and 8b list-type questions outperformed the best-performing existing models without requiring post-processing steps.

Availability: Source codes and resources are freely available for download at https://github.com/dmis-lab/SeqTagQA

Supplementary information: Supplementary data are available at Bioinformatics online.

“…To test our framework, we made adjustments (see Appendix A) to four biomedical datasets: BioASQ (Lamurias et al, 2020), COVID-QA (Möller et al, 2020), cpgQA (Mahbub et al, 2023) and SleepQA (Bojic et al, 2022). We refer the reader to Table 1 for statistics on the final version of datasets that we used in all experiments: original/final size of text corpus, original/final number of labels and finally, train/dev/test split.…”

Section: Datasets (mentioning)

confidence: 99%

A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets

Bojic, Halim, Suharman et al. 2023

The Fourth Workshop on Insights From Negative Results in NLP

Low-quality data can cause downstream problems in high-stakes applications. Data-centric approach emphasizes on improving dataset quality to enhance model performance. High-quality datasets are needed for general-purpose Large Language Models (LLMs) training, as well as for domain-specific models, which are usually small in size as it is costly to engage a large number of domain experts for their creation. Thus, it is vital to ensure high-quality domain-specific training data. In this paper, we propose a framework for enhancing the data quality of original datasets. We applied the proposed framework to four biomedical datasets and showed relative improvement of up to 33%/40% for fine-tuning of retrieval/reader models on the BioASQ dataset when using back translation to enhance the original dataset quality.

“…PubMedQA (Jin et al, 2019) created a QA dataset that can be used as yes/no or query-focused summarization. Cloze style QA datasets are also proposed in the domain of BioNLP (Kim et al, 2018; Lamurias et al, 2020; Pappas et al, 2020).…”

Section: Questions For Question Answering (mentioning)

confidence: 99%

Sequence Tagging for Biomedical Extractive Question Answering

Yoon, Jackson, Kang et al. 2021

Preprint

Current studies in extractive question answering (EQA) have modeled single-span extraction setting, where a single answer span is a label to predict for a given question-passage pair. This setting is natural for general domain EQA as the majority of the questions in the general domain can be answered with a single span. Following general domain EQA models, current biomedical EQA (BioEQA) models utilize single-span extraction setting with post-processing steps. In this paper, we investigate the difference of the question distribution across the general and biomedical domains and discover biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (single answer). In real-world use cases, this emphasizes the need for Biomedical EQA models able to handle multiple question types. Based on this preliminary study, we propose a multispan extraction setting, namely sequence tagging approach for BioEQA, which directly tackles questions with a variable number of phrases as their answer. Our approach can learn to decide the number of answers for a question from training data. Our experimental result on the BioASQ 7b and 8b list-type questions outperformed the best-performing existing models without requiring post-processing steps.
