BACKGROUND: Accurately predicting the risk of no-show for a scheduled colonoscopy can help target interventions to improve compliance with colonoscopy, and thereby reduce the disease burden of colorectal cancer and enhance the utilization of resources within endoscopy units. OBJECTIVES: We aimed to use information available in an electronic medical record (EMR) and endoscopy scheduling system to create a predictive model for no-show risk, and to simultaneously evaluate the role of natural language processing (NLP) in developing such a model. DESIGN: This was a retrospective observational study using discovery and validation phases to design a colonoscopy non-adherence prediction model. An NLP-derived variable called the Non-Adherence Ratio ("NAR") was developed, validated, and included in the model. PARTICIPANTS: Patients scheduled for outpatient colonoscopy from 2009 to 2011 at an Academic Medical Center (AMC) that is part of a multi-hospital health system were included in the study. MAIN MEASURES: Odds ratios for non-adherence were calculated for all variables in the discovery cohort, and the area under the receiver operating characteristic curve (AUC) was calculated for the final non-adherence prediction model. KEY RESULTS: The non-adherence model included six variables: 1) gender; 2) history of psychiatric illness; 3) NAR; 4) wait time in months; 5) number of prior missed endoscopies; and 6) education level. The model achieved discrimination in the validation cohort (AUC = 70.2 %). At a threshold non-adherence score of 0.46, the model's sensitivity and specificity were 33 % and 92 %, respectively. Removing the NAR from the model significantly reduced its predictive power (AUC = 64.3 %, difference = 5.9 %, p < 0.001). CONCLUSIONS: A six-variable model using readily available clinical and demographic information demonstrated accuracy for predicting colonoscopy non-adherence. 
The NAR, a novel variable developed using NLP technology, significantly strengthened this model's predictive power.
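
ILLUSTRATION: The measures reported above (sensitivity and specificity at a score threshold, and AUC) can be computed from predicted non-adherence scores and observed outcomes roughly as follows. This is a minimal sketch and not the study's code: the function names, the toy scores, and the toy outcomes are hypothetical, and treating a score at or above the threshold as a predicted no-show is an assumption, since the abstract does not state whether the cutoff is inclusive.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) at a score threshold.

    labels: 1 = non-adherent (no-show), 0 = adherent (attended).
    Assumption: score >= threshold counts as a predicted no-show.
    """
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        predicted_no_show = score >= threshold
        if label == 1:
            if predicted_no_show:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_no_show:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a randomly chosen non-adherent patient
    receives a higher score than a randomly chosen adherent patient
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_scores = [0.10, 0.20, 0.30, 0.50, 0.35, 0.40, 0.60, 0.70]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.46)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.50 specificity=0.75
print(f"AUC={auc(toy_scores, toy_labels):.3f}")          # AUC=0.875
```

Raising the threshold in this sketch trades sensitivity for specificity, which is the pattern behind the reported 33 % sensitivity and 92 % specificity at a score of 0.46.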