Background: The traditional outpatient model in hypertrophic cardiomyopathy (HCM) is under pressure. Population health management based on an accurate patient record provides an efficient, cost-effective alternative.
Methods: To improve the accuracy of the HCM patient list in a single hospital, we developed a rule-based information extraction natural language processing (NLP) framework. The framework employed ontological expansion of vocabulary and exclusion-first annotation, and was trained with an 'expert in the loop'. The output stratified patients into those with atrial fibrillation (AF), those with heart failure (HF), those without active cardiology care, and likely screened individuals.
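As an illustration only, exclusion-first annotation over an expanded vocabulary might be sketched as follows. This is not the framework's actual rule set: the term list, the exclusion cues, the labels and the function name `annotate_sentence` are all hypothetical, chosen to show the idea that a mention is first tested against reasons to exclude it (negation, family history, screening) and is affirmed only if none applies.

```python
import re

# Hypothetical expanded vocabulary: a seed term plus synonyms and abbreviations,
# standing in for terms that an ontology lookup might supply.
HCM_TERMS = [
    "hypertrophic obstructive cardiomyopathy",
    "hypertrophic cardiomyopathy",
    "HOCM",
    "HCM",
]

# Longest terms first so the alternation prefers the most specific match.
TERM_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(HCM_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

# Hypothetical exclusion cues, checked in order before a mention is affirmed.
EXCLUSION_CUES = {
    "negated": re.compile(r"\b(no evidence of|not consistent with|ruled out|excluded)\b", re.IGNORECASE),
    "family_history": re.compile(r"\b(family history of|father|mother|brother|sister|sibling)\b", re.IGNORECASE),
    "screening": re.compile(r"\bscreen(ing|ed)? for\b", re.IGNORECASE),
}


def annotate_sentence(sentence):
    """Return None if no HCM term is present, the name of the first
    exclusion cue that fires, or 'affirmed' if no exclusion applies."""
    if not TERM_RE.search(sentence):
        return None
    for label, cue in EXCLUSION_CUES.items():
        if cue.search(sentence):
            return label
    return "affirmed"


if __name__ == "__main__":
    for s in [
        "She has hypertrophic cardiomyopathy with atrial fibrillation.",
        "There is no evidence of HCM on echo.",
        "His father had HOCM.",
        "Referred for screening for HCM.",
    ]:
        print(annotate_sentence(s), "|", s)
```

A sentence-level sketch like this ignores cue scope and document context; in practice an expert reviewing misclassified letters would refine both the vocabulary and the cues.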
Results: The algorithm was validated against multiple data sources, including manual review, for HCM, AF, HF and family history of the disease. Overall precision and recall were 0.854 and 0.865, respectively. The pipeline found 25,356 documents featuring HCM-related terms, belonging to 11,083 patients. Excluding scanned documents left 17,178 letters from 3,120 patients. Subsequent categorisation identified 1,753 real cases, of whom 357 had AF and 205 had HF. There were 696 likely screened individuals. Adjusting for 304 false-negative patients, the total HCM cohort was 2,045 patients, of whom 214 were not under the care of a cardiologist. NLP uncovered 709 patients who were absent from the registry or hospital disease codes.
Conclusion: This novel NLP framework generated a hospital-wide record of patients with HCM and defined various cohorts, including the small set of HCM patients lacking current cardiology input. Existing data sources inadequately described this population, highlighting the essential role of NLP for clinical teams planning to move to a population health management model of care.
Keywords: text mining, natural language processing, hypertrophic cardiomyopathy, population health management