BACKGROUND
Artificial intelligence (AI) has the potential to revolutionize healthcare by enhancing both clinical outcomes and operational efficiency. However, its clinical adoption has been slower than anticipated, largely because comprehensive evaluation frameworks are lacking. Existing frameworks tend to emphasize technical metrics such as accuracy and validation while overlooking critical real-world factors such as clinical impact, integration, and economic sustainability. This narrow focus hinders effective implementation, limiting the broader impact and long-term viability of AI tools in clinical practice.
OBJECTIVE
This study aimed to create a comprehensive framework for assessing AI in healthcare, extending beyond technical metrics to incorporate social and organizational dimensions. The framework was developed by systematically reviewing, analyzing, and synthesizing the evaluation criteria necessary for successful implementation, focusing on the long-term real-world impact of AI in clinical practice.
METHODS
A comprehensive search of PubMed, Cochrane, Scopus, and IEEE Xplore was performed in July 2024 to identify relevant studies published in English between January 2019 and mid-July 2024, yielding 3528 results, of which 44 studies met the inclusion criteria. The review followed PRISMA guidelines and the Cochrane Handbook for Systematic Reviews to ensure methodological rigor. Data were analyzed in NVivo (QSR International) using thematic analysis and narrative synthesis to identify key emergent themes across the included studies.
RESULTS
By synthesizing the included studies, we developed a framework that goes beyond the traditional focus on technical metrics or study-level methodologies. It integrates clinical context and real-world implementation factors, offering a more comprehensive approach to evaluating AI tools. With our focus on assessing the long-term real-world impact of AI technologies in healthcare, we named the framework AI for IMPACTS. The criteria are organized into seven key clusters, each corresponding to a letter in the acronym: (I) integration, interoperability, and workflow; (M) monitoring, governance, and accountability; (P) performance and quality metrics; (A) acceptability, trust, and training; (C) cost and economic evaluation; (T) technological safety and transparency; and (S) scalability and impact. These clusters are further broken down into 32 specific sub-criteria.
CONCLUSIONS
The AI for IMPACTS framework offers a holistic approach to evaluating the long-term real-world impact of AI tools in the heterogeneous and challenging healthcare context. Further validation through expert consensus and testing in real-world healthcare settings would strengthen the findings. It is important to emphasize that multidisciplinary expertise is essential for thorough assessment, yet many assessors lack the necessary training. In addition, traditional evaluation methods struggle to keep pace with AI's rapid development. To ensure successful AI integration, flexible, fast-tracked assessment processes that maintain rigorous standards while adapting to AI's dynamic evolution are needed, together with proper assessor training.
CLINICALTRIAL
NA