Self-Supervised Pretraining Enables High-Performance Chest X-Ray Interpretation Across Clinical Distributions

Iyer, Niveditha S.; Gulati, Aditya; Banerjee, Oishi; Logé, Cécile; Farhat, Maha; Saenz, Agustina; Rajpurkar, Pranav

doi:10.1101/2022.11.19.22282519

Cited by 2 publications

(1 citation statement)

References 41 publications


“…Given the broad range of data used to train these models, the performance of foundation models [is] often more robust than with conventional convolutional neural networks [14,15]. In biomedical applications, foundation models have been developed to organize biological [16–18] and medical [19] datasets, including modality-specific models for chest X-rays, retinal imaging, wearable waveforms and pathology images [20–25]. Training of foundation models on medical imaging has been bottlenecked by dataset size and is often limited to publicly available data that may not represent the range of disease severities and possible presentations.…”

Section: Article · mentioning · confidence: 99%

Vision–language foundation model for echocardiogram interpretation

Christensen, Vukadinovic, Yuan et al., 2024, Nat Med


The development of robust artificial intelligence models for echocardiography has been limited by the availability of annotated clinical data. Here, to address this challenge and improve the performance of cardiac imaging models, we developed EchoCLIP, a vision–language foundation model for echocardiography, that learns the relationship between cardiac ultrasound images and the interpretations of expert cardiologists across a wide range of patients and indications for imaging. After training on 1,032,975 cardiac ultrasound videos and corresponding expert text, EchoCLIP performs well on a diverse range of benchmarks for cardiac image interpretation, despite not having been explicitly trained for individual interpretation tasks. EchoCLIP can assess cardiac function (mean absolute error of 7.1% when predicting left ventricular ejection fraction in an external validation dataset) and identify implanted intracardiac devices (area under the curve (AUC) of 0.84, 0.92 and 0.97 for pacemakers, percutaneous mitral valve repair and artificial aortic valves, respectively). We also developed a long-context variant (EchoCLIP-R) using a custom tokenizer based on common echocardiography concepts. EchoCLIP-R accurately identified unique patients across multiple videos (AUC of 0.86), identified clinical transitions such as heart transplants (AUC of 0.79) and cardiac surgery (AUC 0.77) and enabled robust image-to-text search (mean cross-modal retrieval rank in the top 1% of candidate text reports). These capabilities represent a substantial step toward understanding and applying foundation models in cardiovascular imaging for preliminary interpretation of echocardiographic findings.
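The abstract reports image-to-text search as a "mean cross-modal retrieval rank in the top 1% of candidate text reports" but does not spell out the evaluation protocol. The sketch below shows one plausible reading, under stated assumptions: each video and each report have already been embedded into a shared vector space, video *i* is paired with report *i*, similarity is cosine similarity, and a rank of 1 means the true report scored highest. The function names and toy vectors are hypothetical illustrations, not EchoCLIP's actual code or data.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(images: Sequence[Sequence[float]],
                  texts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return sim[i][j], the cosine similarity between image i and text j."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(t) for t in texts]
    return [[sum(a * b for a, b in zip(u, t)) for t in txts] for u in imgs]


def retrieval_ranks(sim: List[List[float]]) -> List[int]:
    """Assumes text i is the true report for image i. Returns the 1-based rank
    of that true report among all candidate texts (rank 1 = most similar)."""
    ranks = []
    for i, row in enumerate(sim):
        # Count candidates scoring strictly higher than the true match;
        # ties are resolved in favor of the true match.
        better = sum(1 for s in row if s > row[i])
        ranks.append(better + 1)
    return ranks


def mean_rank_fraction(sim: List[List[float]]) -> float:
    """Mean rank divided by the number of candidates (lower is better);
    a value of 0.01 or below would correspond to the 'top 1%'."""
    ranks = retrieval_ranks(sim)
    return sum(ranks) / len(ranks) / len(sim[0])


if __name__ == "__main__":
    # Hypothetical 2-D toy embeddings; real models use far larger dimensions.
    images = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2]]
    texts = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
    sim = cosine_matrix(images, texts)
    print(retrieval_ranks(sim))               # [1, 1, 2]
    print(f"{mean_rank_fraction(sim):.3f}")   # 0.444
```

In this toy example the third image is closer to the first report than to its own, so its true report ranks second. With only three candidates the fraction is large; the metric becomes meaningful when ranking against thousands of reports.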
