Background: Question Answering (QA) systems over patient data can assist both clinicians and patients: for example, they can support clinicians in decision-making and help patients better understand their medical history. Large amounts of patient data are stored in Electronic Health Records (EHRs), making EHR QA an important research area. In EHR QA, answers are drawn from the patient's medical record. Because of differences in data format and modality, this task differs substantially from other medical QA tasks that retrieve answers from medical websites or scientific papers, which makes EHR QA worth studying in its own right.

Objective: This study aimed to provide a methodological review of existing work on QA over EHRs. Its objectives were to (i) identify and analyze existing EHR QA datasets, (ii) study the state-of-the-art methodologies used for this task, (iii) compare the evaluation metrics used by these state-of-the-art models, and (iv) elicit the various challenges and open issues in EHR QA.
Methods: We searched four digital sources (Google Scholar, ACL Anthology, ACM Digital Library, and PubMed) for articles published between January 1, 2005 and September 30, 2023 to collect relevant publications on EHR QA. Our systematic screening process followed PRISMA guidelines. A total of 4,111 papers were identified, and after screening against our inclusion criteria, 47 papers remained for further study. The selected studies were then classified into two non-mutually exclusive categories according to their scope: 'EHR QA datasets' and 'EHR QA models'.

Results: The systematic screening process yielded 47 papers on EHR QA for final review. Of these, 25 concerned EHR QA datasets and 37 concerned EHR QA models (the two categories overlap). We observed that QA on EHRs is relatively new and underexplored, with most work being fairly recent. emrQA is by far the most popular EHR QA dataset, both in citations and in usage by other papers. We classified the EHR QA datasets by modality and found that MIMIC-III and the n2c2 datasets are the most widely used EHR databases/corpora in EHR QA. Furthermore, we identified the different models used in EHR QA along with the evaluation metrics used to assess them.

Conclusions: EHR QA research faces multiple challenges, such as the limited availability of clinical annotations, concept normalization, and the difficulty of generating realistic EHR QA datasets. Many research gaps remain that motivate further work. This study will help future researchers focus on areas of EHR QA with promising research directions.