Existing methods of screening for substance abuse (standardized questionnaires or clinicians simply asking) have proven difficult to initiate and maintain in primary care settings. This article reports on how predictive modeling can be used to screen for substance abuse using extant data in electronic health records (EHRs). We relied on data available through the Veterans Affairs Informatics and Computing Infrastructure (VINCI) for the years 2006 through 2016. We focused on 4,681,809 veterans who had at least two primary care visits, 829,827 of whom had a hospitalization. The data included 699 million outpatient and 17 million inpatient records. The dependent variable was substance abuse, identified from 89 diagnostic codes using the Agency for Healthcare Research and Quality classification of diseases. In addition, we included the diagnostic codes used to identify prescription abuse. The independent variables were 10,292 inpatient and 13,512 outpatient diagnoses, plus 71 dummy variables, one for each year of age from 20 to 90 years. A modified naive Bayes model was used to aggregate the risk across predictors. The accuracy of the predictions was examined using the area under the receiver operating characteristic (AROC) curve in a randomly selected 20% of the data, set aside for evaluation. Many physical and mental illnesses were associated with substance abuse. These associations supported findings reported in the literature regarding the impact of substance abuse on various diseases and vice versa. In the randomly set-aside validation data, the model accurately predicted substance abuse for inpatient (AROC = 0.884), outpatient (AROC = 0.825), and combined inpatient and outpatient (AROC = 0.840) data. When information that became available after substance abuse was known was excluded, the cross-validated AROC remained high: 0.822 for inpatient and 0.817 for outpatient data. Data within EHRs can be used to detect existing substance abuse or to predict potential future substance abuse.
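To make the two computational steps concrete, the following is a minimal sketch of how a naive Bayes model can aggregate risk across diagnoses and how AROC can be computed on held-out data. It is a plain naive Bayes with additive smoothing, not the modified version used in the study, whose details are not given here. It scores only the codes present in a record, which is a simplifying assumption. The function names, the `smoothing` parameter, and any diagnosis codes passed to it are hypothetical and serve only as illustration.

```python
import math
from collections import defaultdict


def fit_likelihood_ratios(records, labels, smoothing=1.0):
    """Estimate one likelihood ratio (LR) per diagnosis code.

    records: list of lists of diagnosis codes, one list per patient.
    labels:  list of 0/1 flags, 1 = substance abuse recorded.
    LR(code) = P(code | case) / P(code | non-case), with additive smoothing
    (an assumption of this sketch) so that unseen combinations stay finite.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_counts = defaultdict(int)
    neg_counts = defaultdict(int)
    for codes, y in zip(records, labels):
        for code in set(codes):  # count each code once per patient
            if y:
                pos_counts[code] += 1
            else:
                neg_counts[code] += 1

    lrs = {}
    for code in set(pos_counts) | set(neg_counts):
        p_pos = (pos_counts[code] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_counts[code] + smoothing) / (n_neg + 2 * smoothing)
        lrs[code] = p_pos / p_neg
    prior_odds = (n_pos + smoothing) / (n_neg + smoothing)
    return lrs, prior_odds


def predict_probability(codes, lrs, prior_odds):
    """Posterior odds = prior odds x product of LRs, computed in log space."""
    log_odds = math.log(prior_odds)
    for code in set(codes):
        if code in lrs:  # codes never seen in training are ignored
            log_odds += math.log(lrs[code])
    # Numerically stable conversion from log odds to probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


def aroc(scores, labels):
    """Area under the ROC curve as the probability that a randomly chosen
    case scores higher than a randomly chosen non-case (ties count half).
    Quadratic in sample size, so suitable only for small illustrations."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In use, the likelihood ratios would be fitted on the training portion of the data, and `aroc` would be applied to the predicted probabilities for the patients set aside for evaluation. At the scale described above, a rank-based or library implementation of AROC would be needed in place of the pairwise comparison shown here.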