ImportanceEarly detection of atrial fibrillation (AF) may help prevent adverse cardiovascular events such as stroke. Deep learning applied to electrocardiograms (ECGs) has been successfully used for early identification of several cardiovascular diseases.ObjectiveTo determine whether deep learning models applied to outpatient ECGs in sinus rhythm can predict AF in a large and diverse patient population.Design, Setting, and ParticipantsThis prognostic study was performed on ECGs acquired from January 1, 1987, to December 31, 2022, at 6 US Veterans Affairs (VA) hospital networks and 1 large non-VA academic medical center. Participants included all outpatients with 12-lead ECGs in sinus rhythm.Main Outcomes and MeasuresA convolutional neural network using 12-lead ECGs from 2 US VA hospital networks was trained to predict the presence of AF within 31 days of sinus rhythm ECGs. The model was tested on ECGs held out from training at the 2 VA networks as well as 4 additional VA networks and 1 large non-VA academic medical center.ResultsA total of 907 858 ECGs from patients across 6 VA sites were included in the analysis. These patients had a mean (SD) age of 62.4 (13.5) years, 6.4% were female, and 93.6% were male, with a mean (SD) CHA2DS2-VASc (congestive heart failure, hypertension, age, diabetes mellitus, prior stroke or transient ischemic attack or thromboembolism, vascular disease, age, sex category) score of 1.9 (1.6). A total of 0.2% were American Indian or Alaska Native, 2.7% were Asian, 10.7% were Black, 4.6% were Latinx, 0.7% were Native Hawaiian or Other Pacific Islander, 62.4% were White, 0.4% were of other race or ethnicity (which is not broken down into subcategories in the VA data set), and 18.4% were of unknown race or ethnicity. At the non-VA academic medical center (72 483 ECGs), the mean (SD) age was 59.5 (15.4) years and 52.5% were female, with a mean (SD) CHA2DS2-VASc score of 1.6 (1.4). A total of 0.1% were American Indian or Alaska Native, 7.9% were Asian, 9.4% were Black, 2.9% were Latinx, 0.03% were Native Hawaiian or Other Pacific Islander, 74.8% were White, 0.1% were of other race or ethnicity, and 4.7% were of unknown race or ethnicity. A deep learning model predicted the presence of AF within 31 days of a sinus rhythm ECG on held-out test ECGs at VA sites with an area under the receiver operating characteristic curve (AUROC) of 0.86 (95% CI, 0.85-0.86), accuracy of 0.78 (95% CI, 0.77-0.78), and F1 score of 0.30 (95% CI, 0.30-0.31). At the non-VA site, AUROC was 0.93 (95% CI, 0.93-0.94); accuracy, 0.87 (95% CI, 0.86-0.88); and F1 score, 0.46 (95% CI, 0.44-0.48). The model was well calibrated, with a Brier score of 0.02 across all sites. Among individuals deemed high risk by deep learning, the number needed to screen to detect a positive case of AF was 2.47 individuals for a testing sensitivity of 25% and 11.48 for 75%. Model performance was similar in patients who were Black, female, or younger than 65 years or who had CHA2DS2-VASc scores of 2 or greater.Conclusions and RelevanceDeep learning of outpatient sinus rhythm ECGs predicted AF within 31 days in populations with diverse demographics and comorbidities. Similar models could be used in future AF screening efforts to reduce adverse complications associated with this disease.