Background
Patients with sleep apnea (SA) and coronary artery disease (CAD) are at higher risk of atrial fibrillation (AF) than the general population. Our objectives were to evaluate the role of CAD and SA in determining AF risk through cluster and survival analysis and to develop a risk model for predicting AF.

Methods
Electronic medical record (EMR) data from 22,302 individuals, including 10,202 individuals with AF, CAD, and SA and 12,100 individuals without these diseases, were analyzed using K-means clustering, the k-nearest neighbor (kNN) algorithm, and survival analysis. Age, sex, and the diseases each individual developed over 9 years were used for cluster and survival analysis.

Results
The risk models for AF, CAD, and SA achieved high accuracy and sensitivity (0.98). Cluster analysis showed that CAD and high blood pressure (HBP) are the most prevalent diseases in the AF group, that HBP is the most prevalent disease in the CAD group, and that HBP and CAD are the most prevalent diseases in the SA group. Survival analysis demonstrated that individuals with HBP, CAD, and SA had an approximately 1.5-fold increased risk of developing AF [hazard ratio (HR): 1.49, 95% CI: 1.18–1.87, p = 0.0041; HR: 1.46, 95% CI: 1.09–1.96, p = 0.01; HR: 1.54, 95% CI: 1.22–1.94, p = 0.0039, respectively], and that individuals with chronic kidney disease (CKD) developed AF approximately 50% earlier than patients without these comorbidities over a period of 7 years (HR: 3.36, 95% CI: 1.46–7.73, p = 0.0023). The comorbidities that contributed to earlier development of AF in females than in males were HBP in the 50–64-year age group (HR: 3.75, 95% CI: 1.08–13, p = 0.04) and CAD and SA in the 60–75-year age group (HR: 2.4, 95% CI: 1.18–4.86, p = 0.02; HR: 2.51, 95% CI: 1.14–5.52, p = 0.02, respectively).

Conclusion
Machine learning-based algorithms demonstrated that CAD, SA, HBP, and CKD are significant risk factors for developing AF in a Latin American population.
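
To illustrate the kind of kNN classification named in the Methods, the following is a minimal sketch in plain Python, not the study's implementation. The toy records, the feature layout (age, sex, and HBP, CAD, SA, and CKD flags), the age scaling constant, and the function names `scale`, `knn_predict`, and `sensitivity` are all assumptions made for illustration; none of the values come from the EMR database.

```python
import math
from collections import Counter

# Hypothetical toy records, NOT data from the study.
# Features: (age in years, sex [0 = male, 1 = female], HBP, CAD, SA, CKD flags)
# Label: 1 = developed AF, 0 = did not.
TRAIN = [
    ((72, 0, 1, 1, 1, 0), 1),
    ((68, 1, 1, 1, 0, 1), 1),
    ((75, 0, 1, 0, 1, 1), 1),
    ((63, 1, 1, 1, 1, 0), 1),
    ((45, 0, 0, 0, 0, 0), 0),
    ((52, 1, 1, 0, 0, 0), 0),
    ((38, 1, 0, 0, 0, 0), 0),
    ((58, 0, 0, 0, 1, 0), 0),
]

AGE_SCALE = 100.0  # assumed scaling so age does not dominate the binary flags


def scale(features):
    """Rescale age to roughly [0, 1]; leave the binary flags unchanged."""
    age, *flags = features
    return (age / AGE_SCALE, *flags)


def knn_predict(train, query, k=3):
    """Return the majority label among the k training records nearest to query."""
    q = scale(query)
    by_distance = sorted(train, key=lambda rec: math.dist(scale(rec[0]), q))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def sensitivity(train, test, k=3):
    """Fraction of true AF cases in test that the classifier labels as AF."""
    positives = [(x, y) for x, y in test if y == 1]
    hits = sum(knn_predict(train, x, k) == 1 for x, _ in positives)
    return hits / len(positives)
```

Under these assumptions, `knn_predict(TRAIN, (70, 0, 1, 1, 0, 0))` classifies a hypothetical 70-year-old male with HBP and CAD by majority vote among his three nearest toy records. Euclidean distance on a rescaled age plus binary flags is only one possible choice of metric; the abstract does not state which distance or value of k was used.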