Background: Question Answering (QA) systems over patient data can assist both clinicians and patients: for example, they can support clinicians in decision-making and help patients better understand their medical history. Large amounts of patient data are stored in Electronic Health Records (EHRs), making EHR QA an important research area. In EHR QA, answers are drawn from the patient's medical record. Because of differences in data format and modality, this task differs substantially from other medical QA tasks that retrieve answers from medical websites or scientific papers, which makes EHR QA worth studying in its own right.

Objective: This study aimed to provide a methodological review of existing work on QA over EHRs. Its objectives were to (i) identify and analyze existing EHR QA datasets, (ii) study the state-of-the-art methodologies used for this task, (iii) compare the evaluation metrics used by these state-of-the-art models, and (iv) elicit the various challenges and open issues in EHR QA.
Methods: We searched four digital sources (Google Scholar, ACL Anthology, ACM Digital Library, and PubMed) for articles published between January 1, 2005 and September 30, 2023 to collect relevant publications on EHR QA. Our systematic screening process followed PRISMA guidelines. A total of 4,111 papers were identified, and after screening against our inclusion criteria, 47 papers remained for further study. The selected studies were then classified into two non-mutually exclusive categories according to their scope: 'EHR QA datasets' and 'EHR QA models'.

Results: The systematic screening process yielded 47 papers on EHR QA for final review. Of these, 25 concerned EHR QA datasets and 37 concerned EHR QA models (the two categories overlap). We observed that QA on EHRs is relatively new and underexplored, with most work being fairly recent. emrQA is by far the most popular EHR QA dataset, both in citations and in usage by other papers. We classified the EHR QA datasets by modality and found that MIMIC-III and the n2c2 datasets are the most widely used EHR databases/corpora in EHR QA. Furthermore, we identified the different models used in EHR QA along with the evaluation metrics used to assess them.

Conclusions: EHR QA research faces multiple challenges, such as the limited availability of clinical annotations, concept normalization, and the difficulty of generating realistic EHR QA datasets. Many research gaps remain that motivate further work. This study will help future researchers focus on areas of EHR QA with promising research directions.