Race, Sex and Age Disparities in the Performance of ECG Deep Learning Models Predicting Heart Failure

Kaur, Dhamanpreet; Hughes, J. Weston; Rogers, Albert J.; Kang, Guson; Narayan, Sanjiv M.; Ashley, Euan A.; Pérez, Marco

doi:10.1101/2023.05.19.23290257

Cited by 2 publications

(1 citation statement)

References 38 publications


“…While highly performant, deep learning has several key limitations. Domain shifts, which are difficult to track in complex distributions such as waveforms and can occur such as when a model is applied at a new hospital [15], population [16,17], or imaging vendor [18], can degrade model performance significantly. Spurious correlations can allow the model to "cheat" without learning clinically salient features, for example by detecting the presence of a pacemaker or laterality marker in a chest x-ray [19,20] or a surgical skin marking in a dermatology image [21], leading to unintended shifts in performance during deployment.…”

Section: Introduction (mentioning)

confidence: 99%

Simple Models Versus Deep Learning in Detecting Low Ejection Fraction From The Electrocardiogram

Hughes, Somani, Elias et al. 2024

Preprint

Self Cite


Importance: Deep learning methods have recently gained success in detecting left ventricular systolic dysfunction (LVSD) from electrocardiogram waveforms. Despite their impressive accuracy, they are difficult to interpret and deploy broadly in the clinical setting.

Objective: To determine whether simpler models based on standard electrocardiogram measurements could detect LVSD with similar accuracy to deep learning models.

Design: Using an observational dataset of 40,994 matched 12-lead electrocardiograms (ECGs) and transthoracic echocardiograms, we trained a range of models with increasing complexity to detect LVSD based on ECG waveforms and derived measurements. We additionally evaluated models in two independent cohorts from different medical centers, vendors, and countries.

Setting: The training data was acquired from Stanford University Medical Center. External validation data was acquired from Cedars-Sinai Medical Center and the UK Biobank.

Exposures: The performance of models based on ECG waveforms in their detection of LVSD, as defined by ejection fraction below 35%.

Main outcomes: The performance of the models as measured by area under the receiver operator characteristic curve (AUC) and other measures of classification accuracy.

Results: The Stanford dataset consisted of 40,994 matched ECGs and echocardiograms, the test set having an average age of 62.13 (17.61) and 55.20% Male patients, of which 9.72% had LVSD. We found that a random forest model using 555 discrete, automated measurements achieves an area under the receiver operator characteristic curve (AUC) of 0.92 (0.91-0.93), similar to a deep learning waveform model with an AUC of 0.94 (0.93-0.94). Furthermore, a linear model based on 5 measurements achieves high performance (AUC of 0.86 (0.85-0.87)), close to a deep learning model and better than NT-proBNP (0.77 (0.74-0.79)). Finally, we find that simpler models generalize better to other sites, with experiments at two independent, external sites.

Conclusion: Our study demonstrates the value of simple electrocardiographic models which perform nearly as well as deep learning models while being much easier to implement and interpret.
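
The abstract reports each model's AUC together with an interval, for example 0.92 (0.91-0.93). As a rough illustration of what such a figure measures, the sketch below computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, and attaches a percentile bootstrap interval. The abstract does not say how its intervals were obtained, so the bootstrap is an assumption here; the data are synthetic, and the function names `auc` and `bootstrap_ci` are hypothetical, not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (assumed method, not the study's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example: roughly 10% positives, with positives scored higher on average.
rng = random.Random(42)
labels = [1 if rng.random() < 0.10 else 0 for _ in range(300)]
if sum(labels) == 0:
    labels[0] = 1  # guarantee at least one positive in the toy data
scores = [rng.gauss(1.5 if y == 1 else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC {point:.2f} ({low:.2f}-{high:.2f})")
```

The pairwise loop is quadratic in the number of cases, which is fine for a toy example; a rank-based formulation would be the usual choice for a dataset of the size described in the abstract.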
