Objective
Congenital heart defects (CHD) are still missed despite nearly universal prenatal ultrasound screening programs, and missed diagnoses may result in severe morbidity or even death. Deep learning (DL) can automate image recognition from ultrasound. The main aim of this study was to apply a DL model, previously developed and trained on images from a tertiary center, to fetal ultrasound images obtained during the second-trimester standard anomaly scan in a low-risk population. A secondary aim was to compare the initial screening diagnosis, which makes use of live imaging at the point of care, with the assessments of clinicians evaluating only stored images.

Methods
All pregnancies with isolated severe CHD in the Northwestern region of the Netherlands between 2015 and 2016 with available stored images were evaluated, as well as a sample of examinations of normal fetuses from the same region. We compared the initial clinical diagnostic accuracy (made in real time with access to live imaging), model accuracy, and the performance of blinded human experts who, like the model, had access only to the stored images. We analyzed performance by study characteristics such as duration, image quality (independently scored by the study investigators), number of stored images, and availability of screening views.

Results
A total of 42 normal fetuses and 66 cases of isolated CHD at birth were analyzed. Of the abnormal cases, 31 were missed and 35 were detected at the time of the clinical anatomy scan (sensitivity 53 percent). Model sensitivity and specificity were 91 and 78 percent, respectively. Blinded human experts (n = 3) achieved sensitivity and specificity of 55 ± 10 percent (range 47-67 percent) and 71 ± 13 percent (range 57-83 percent), respectively. Model correctness differed significantly by expert-graded image quality (p = 0.03). The abnormal cases included 19 lesions the model had not encountered in its training; the model's performance on these previously unseen lesions (16/19 correct) did not differ significantly from its performance on previously encountered lesions (p = 0.41).

Conclusions
A previously trained DL algorithm had higher sensitivity than initial clinical assessment in detecting CHD in a cohort in which over 50 percent of CHD cases were missed clinically. Notably, the DL algorithm performed well on community-acquired images in a low-risk population, including on lesions to which it had not previously been exposed. Furthermore, when both the model and the blinded human experts had access only to stored images, rather than the full range of images available to a clinician during a live scan, the model outperformed the human experts. Together, these findings support the proposition that DL models can improve prenatal detection of CHD.