Aim: To predict ambulatory status and Gross Motor Function Classification System (GMFCS) levels in patients with cerebral palsy (CP) by applying natural language processing (NLP) to electronic health record (EHR) clinical notes.
Method: Individuals aged 8 to 26 years with a diagnosis of CP in the EHR between January 2009 and November 2020 (approximately 12 years of data) were included in a cross-sectional retrospective cohort of 2483 patients. The cohort was divided into train-test and validation groups. Positive predictive value, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were calculated for prediction of ambulatory status and GMFCS levels.
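The four performance measures named in the Method can be illustrated with a minimal sketch. It is not the study's pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only to show how each quantity is defined for a binary outcome such as ambulatory (1) versus non-ambulatory (0).

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and positive predictive value from 0/1 labels.

    Assumes both classes occur in y_true and at least one positive prediction.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = ambulatory, 0 = non-ambulatory (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]  # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed threshold of 0.5

metrics = binary_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

Sensitivity, specificity, and positive predictive value depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of measure are reported together.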
Results: The median age of the total cohort was 15 years (interquartile range 10-20 years); 56% were male and 75% were White. In the validation group, prediction of ambulatory status achieved 70% sensitivity, 88% specificity, 81% positive predictive value, and an AUC of 0.89. NLP applied to the EHR differentiated between GMFCS levels I-II and III (15% sensitivity, 96% specificity, 46% positive predictive value, and an AUC of 0.71) and between levels IV and V (81% sensitivity, 51% specificity, 70% positive predictive value, and an AUC of 0.75).
Interpretation: NLP applied to the EHR demonstrated excellent differentiation between ambulatory and non-ambulatory status, and good differentiation between GMFCS levels I-II and III and between levels IV and V. Clinical use of NLP may help to individualize functional characterization and management.