Background: Artificial intelligence (AI)-based medical devices and digital health technologies such as medical sensors, wearable health trackers, telemedicine, mobile health (mHealth), large language models (LLMs), and digital care twins (DCTs) have a substantial influence on clinical decision support systems (CDSS) in healthcare and medical applications. However, given the complexity of medical decisions, it is crucial that outputs generated by AI tools are not only accurate but also carefully evaluated, understandable, and explainable to end users, especially clinicians. A lack of interpretability in communicating AI-driven clinical decisions can lead to mistrust among decision-makers and reluctance to use these technologies.
Objective: This paper presents a systematic review of the processes and challenges related to interpretable machine learning (IML) and explainable artificial intelligence (XAI) within the healthcare and medical domains. The main objectives are to review the IML and XAI processes, related methods, applications, and their implementation challenges in the context of digital health interventions (DHIs), particularly from a quality-control perspective aimed at easier understanding and better communication between AI systems and clinicians. The IML process is classified into three parts: pre-processing interpretability, interpretable modeling, and post-processing interpretability. The paper aims to establish a comprehensive understanding of the importance of a robust interpretability approach in CDSS by reviewing related experimental results. The ultimate aim is to provide future researchers with insights for creating clinician-AI tools that communicate more effectively in healthcare decision support systems, as well as a deeper understanding of the challenges they might face.
Methods: Our research questions, eligibility criteria, and primary goals were defined using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and the PICO (population, intervention, control, and outcomes) method. The PubMed, Scopus, and Web of Science databases were then systematically searched using sensitive and specific search strings.
Duplicate papers were removed using EndNote and Covidence, and a two-phase selection was then conducted in Covidence: title and abstract screening followed by full-text appraisal. The Meta Quality Appraisal Tool (MetaQAT) was used to assess quality and risk of bias. Finally, a standardized data extraction tool was used for reliable data extraction.
Results: The searches retrieved 2241 records, of which 555 duplicates were removed. Title and abstract screening excluded 958 papers, and full-text review excluded a further 482 studies. Quality and risk-of-bias assessment then removed 172 papers. In the end, 74 publications were selected for data extraction, comprising 10 existing reviews and 64 related experimental studies.
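As an illustration only (not part of the review methodology), the study-selection counts reported above can be reconciled with the short sketch below; the stage names and counting loop are purely illustrative.

```python
# Minimal sketch reconciling the PRISMA-style flow counts reported above.
# Stage names and this tally are illustrative, not part of the review itself.

records_retrieved = 2241
excluded_per_stage = {
    "duplicates removed": 555,
    "title/abstract screening": 958,
    "full-text review": 482,
    "quality and risk-of-bias assessment": 172,
}

remaining = records_retrieved
for stage, excluded in excluded_per_stage.items():
    remaining -= excluded
    print(f"After {stage}: {remaining} records remain")

# 74 publications reach data extraction: 10 existing reviews + 64 experimental studies.
assert remaining == 74 == 10 + 64
```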
Conclusion: The paper offers general definitions of XAI in the medical domain, proposes a three-level interpretability process for CDSS, and discusses XAI-related health applications at each level of the proposed framework, supported by a review of related experimental results. Additionally, it provides a comprehensive discussion of quality assessment tools for evaluating XAI in intelligent health systems. Moreover, this survey introduces a step-by-step roadmap for implementing XAI in clinical applications. To guide future research in addressing existing gaps, the paper examines the significance of XAI models from various perspectives and acknowledges their limitations.