Background: There has been a rapid increase in the number of Artificial Intelligence (AI) studies of cardiac MRI (CMR) segmentation aiming to automate image analysis. However, advancement and clinical translation in this field depend on researchers presenting their work in a transparent and reproducible manner. This systematic review aimed to evaluate the quality of reporting in AI studies involving CMR segmentation.
Methods: MEDLINE and EMBASE were searched for AI CMR segmentation studies in April 2022. Any fully automated AI method for segmentation of cardiac chambers, myocardium or scar on CMR was considered for inclusion. For each study, compliance with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) was assessed. The CLAIM criteria were grouped into study, dataset, model and performance description domains.
Results: 209 studies published between 2012 and 2022 were included in the analysis. Studies were mainly published in technical journals (58%), with the majority (57%) published since 2019. Studies were from 37 different countries, with most from China (26%), the United States (18%) and the United Kingdom (11%). Short axis CMR images were most frequently used (70%), with the left ventricle the most commonly segmented cardiac structure (49%). Median compliance of studies with CLAIM was 67% (IQR 59–73%). Median compliance was highest for the model description domain (100%, IQR 80–100%) and lower for the study (71%, IQR 63–86%), dataset (63%, IQR 50–67%) and performance (60%, IQR 50–70%) description domains.
Conclusion: This systematic review highlights important gaps in the literature of CMR studies using AI. We identified key missing items, most strikingly poor description of the patients included in the training and validation of AI models and inadequate model failure analysis, that limit the transparency, reproducibility and hence validity of published AI studies. This review may support closer adherence to established frameworks for reporting standards and presents recommendations for improving the quality of reporting in this field.
Systematic Review Registration: www.crd.york.ac.uk/prospero/, identifier CRD42022279214.
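The compliance figures above (a percentage per study, summarised as median and IQR) can be illustrated with a minimal sketch. The data layout, the function names `compliance` and `summarise`, the handling of not-applicable items and the choice of quartile method are all assumptions made for illustration; the abstract does not state how the authors computed these values.

```python
from statistics import quantiles


def compliance(items):
    """Percentage of applicable checklist items that a study reported.

    `items` maps a checklist item id to True (reported), False (not
    reported) or None (not applicable). Excluding not-applicable items
    from the denominator is an assumption of this sketch.
    """
    applicable = [v for v in items.values() if v is not None]
    if not applicable:
        raise ValueError("study has no applicable checklist items")
    return 100.0 * sum(applicable) / len(applicable)


def summarise(scores):
    """Median and quartiles of per-study compliance scores.

    Uses the 'inclusive' quartile method; other methods give slightly
    different quartiles on small samples.
    """
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return {"median": q2, "q1": q1, "q3": q3}


# Hypothetical example: one study with four items, one not applicable.
study = {"item_1": True, "item_2": False, "item_3": None, "item_4": True}
study_score = compliance(study)  # 2 of 3 applicable items reported

# Hypothetical compliance scores for five studies.
summary = summarise([50, 60, 70, 80, 90])
```

The same two functions could be applied per domain (study, dataset, model, performance) by passing only the items belonging to that domain.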
Background: Right atrial (RA) area predicts mortality in patients with pulmonary hypertension, and its measurement is recommended by the European Society of Cardiology/European Respiratory Society pulmonary hypertension guidelines. The advent of deep learning may allow more reliable measurement of RA areas to improve clinical assessments. The aim of this study was to automate cardiovascular magnetic resonance (CMR) RA area measurements and evaluate their clinical utility by assessing repeatability, correlation with invasive haemodynamics and prognostic value.
Methods: A deep learning RA area CMR contouring model was trained in a multicentre cohort of 365 participants, comprising patients with pulmonary hypertension, patients with left ventricular pathology and healthy subjects. Inter-study repeatability (intraclass correlation coefficient (ICC)) and agreement of contours (Dice similarity coefficient (DSC)) were assessed in a prospective cohort (n = 36). Clinical testing and mortality prediction were performed in n = 400 patients who were not part of either the training or the prospective cohort, and the correlation of automatic and manual RA measurements with invasive haemodynamics was assessed in n = 212/400. Radiologist quality control (QC) was performed in the ASPIRE registry, n = 3795 patients. The primary QC observer evaluated all the segmentations and recorded them as satisfactory, suboptimal or failure. A second QC observer analysed a random subcohort to assess QC agreement (n = 1018).
Results: All deep learning RA measurements showed higher inter-study repeatability (ICC 0.91 to 0.95) than manual RA measurements (1st observer ICC 0.82 to 0.88, 2nd observer ICC 0.88 to 0.91). DSC showed high agreement between automatic artificial intelligence and manual CMR readers. For maximal RA area, the mean ± standard deviation (SD) DSC for observer 1 vs observer 2, automatic measurements vs observer 1 and automatic measurements vs observer 2 was 92.4 ± 3.5%, 91.2 ± 4.5% and 93.2 ± 3.2%, respectively. For minimal RA area, the corresponding values were 89.8 ± 3.9%, 87.0 ± 5.8% and 91.8 ± 4.8%. Automatic RA area measurements all showed moderate correlation with invasive parameters (r = 0.45 to 0.66), compared with r = 0.36 to 0.57 for manual measurements. Maximal RA area accurately predicted elevated mean RA pressure at the low- and high-risk thresholds (area under the receiver operating characteristic curve: artificial intelligence 0.82/0.87 vs manual 0.78/0.83), and predicted mortality similarly to manual measurements (both p < 0.01). In the QC evaluation, artificial intelligence segmentations were suboptimal in 108/3795 cases and failed in 16/3795 cases. In the subcohort (n = 1018), agreement between the two QC observers was excellent (kappa 0.84).
Conclusion: Automatic artificial intelligence CMR-derived measurements of RA size and function are accurate, have excellent repeatability, show moderate associations with invasive haemodynamics and predict mortality.
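The Dice similarity coefficient used above to compare contours is an overlap ratio between 0 and 1, often quoted as a percentage: twice the number of pixels shared by two masks divided by the total number of pixels in both. The following is a minimal sketch on small binary masks; the function name, the nested-list mask representation and the convention for two empty masks are assumptions for illustration, not the study's implementation.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary 2D masks.

    Masks are nested lists of 0/1 values with identical shape.
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    same_rows = len(mask_a) == len(mask_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")

    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]

    intersection = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention
        # assumed for this sketch).
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 3x3 contours: a manual mask (4 pixels) and an automatic
# mask (3 pixels) that share 3 pixels.
manual = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
automatic = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 0]]

dsc = dice_coefficient(manual, automatic)  # 2 * 3 / (4 + 3)
```

Because the numerator and denominator are both pixel counts, the result carries no physical unit regardless of the pixel size of the image.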
Recent years have seen a dramatic increase in studies presenting artificial intelligence (AI) tools for cardiac imaging. Amongst these are AI tools that undertake segmentation of structures on cardiac MRI (CMR), an essential step in obtaining clinically relevant functional information. The quality of reporting of these studies carries significant implications for advancement of the field and the translation of AI tools to clinical practice. We recently undertook a systematic review to evaluate the quality of reporting of studies presenting automated approaches to segmentation in cardiac MRI (Alabed et al. 2022 Quality of reporting in AI cardiac MRI segmentation studies—a systematic review and recommendations for future studies. Frontiers in Cardiovascular Medicine 9:956811). 209 studies were assessed for compliance with the Checklist for AI in Medical Imaging (CLAIM), a framework for reporting. We found variable—and sometimes poor—quality of reporting and identified significant and frequently missing information in publications. Compliance with CLAIM was high for descriptions of models (100%, IQR 80%–100%), but lower than expected for descriptions of study design (71%, IQR 63–86%), datasets used in training and testing (63%, IQR 50%–67%) and model performance (60%, IQR 50%–70%). Here, we present a summary of our key findings, aimed at general readers who may not be experts in AI, and use them as a framework to discuss the factors determining quality of reporting, making recommendations for improving the reporting of research in this field. We aim to assist researchers in presenting their work and readers in their appraisal of evidence. Finally, we emphasise the need for close scrutiny of studies presenting AI tools, even in the face of the excitement surrounding AI in cardiac imaging.
Introduction: Cardiac MRI (CMR) is the gold standard technique to assess bi-ventricular volumes and function and is increasingly being considered as an endpoint in clinical studies. Currently, with the exception of right ventricle (RV) stroke volume and RV end-diastolic volume, there are only limited data on minimally important differences (MIDs) for CMR metrics. Our study aimed to identify MIDs for CMR metrics based on FDA recommendations that a clinical outcome measure should reflect how a patient feels, functions or survives.
Methods: Consecutive treatment-naïve patients with pulmonary arterial hypertension (PAH) between 2010 and 2022 who had two CMR scans (at baseline prior to treatment and 12 months following treatment) were identified from the ASPIRE registry. All patients were followed up for one additional year after the second scan. For both scans, cardiac measurements were obtained from a validated fully automated segmentation tool. The MID in CMR metrics was determined using two distribution-based methods (0.5 standard deviation and minimal detectable change) and two anchor-based methods (change difference and generalised linear model regression), which benchmarked changes in CMR measurements against how a patient “feels” (emPHasis-10 questionnaire), “functions” (incremental shuttle walking test) or “survives” (one-year mortality).
Results: 254 patients with PAH were included (aged 53±16 years, 79% female, and 66% categorised as intermediate risk based on the 2022 ESC/ERS risk score). We identified a 5% absolute increase in RV ejection fraction and a 17 mL decrease in RV end-diastolic or end-systolic volumes as the MIDs for improvement. Conversely, a 5% decrease in RV ejection fraction and a 10 mL increase in RV volumes were associated with worsening.
Conclusion: This study establishes clinically relevant CMR MIDs for how a patient feels, functions or survives in response to PAH treatment. These findings provide further support for the use of CMR as a clinically relevant outcome measure and will aid trial-size calculations for studies using CMR.
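The two distribution-based methods named above can be sketched with their commonly used textbook definitions: half of the baseline standard deviation, and a minimal detectable change derived from the standard error of measurement (SEM = SD × √(1 − reliability), MDC95 = 1.96 × √2 × SEM). The abstract does not give the exact formulas the authors used, so the definitions, function names, reliability value and sample data below are assumptions for illustration only.

```python
from math import sqrt
from statistics import stdev


def half_sd_mid(baseline_values):
    """Distribution-based MID estimate: 0.5 x sample SD at baseline."""
    return 0.5 * stdev(baseline_values)


def minimal_detectable_change(baseline_values, reliability, z=1.96):
    """Minimal detectable change at the confidence level implied by z.

    `reliability` is a test-retest coefficient such as an ICC, in [0, 1].
    SEM = SD * sqrt(1 - reliability); MDC = z * sqrt(2) * SEM.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sem = stdev(baseline_values) * sqrt(1.0 - reliability)
    return z * sqrt(2.0) * sem


# Hypothetical baseline RV ejection fraction values (%), not study data.
baseline_rvef = [40, 45, 50, 55, 60]

mid_half_sd = half_sd_mid(baseline_rvef)
# Hypothetical reliability of 0.75, chosen only to keep the arithmetic simple.
mdc95 = minimal_detectable_change(baseline_rvef, reliability=0.75)
```

Distribution-based estimates depend only on the spread and reliability of the measurement, which is why studies of this kind pair them with anchor-based methods tied to patient outcomes.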