Objective: Accurately extracting and phenotyping information from electronic health documentation is critical for medical research and clinical care. Of the various techniques for this task, natural language processing (NLP) has been developed across numerous domains to transform clinical documentation into data suitable for computational analysis. Accordingly, we sought to develop a highly accurate, open-source NLP module to ascertain and phenotype left ventricular hypertrophy (LVH) and hypertrophic cardiomyopathy (HCM) diagnoses on echocardiogram reports from a diverse hospital network.

Methods: Seven hundred echocardiogram reports from six hospitals were randomly selected from data repositories within the Mass General Brigham healthcare system and manually adjudicated by physicians for 10 subtypes of LVH and for diagnoses of HCM. Using an open-source NLP system, we developed the module on a training set of 300 reports and validated it on the remaining 400 reports. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated to assess the discriminative accuracy of the NLP module.

Results: The NLP module demonstrated robust performance across the 10 LVH subtypes, with overall sensitivity and specificity exceeding 96%. It also performed excellently in detecting HCM diagnoses, with sensitivity and specificity exceeding 93%.

Conclusion: We designed a highly accurate NLP module to determine the presence of LVH and HCM on echocardiogram reports. Our work demonstrates the feasibility of using NLP to detect diagnoses on imaging reports, even when they are described in free text. These modules have been placed in the public domain to advance research, trial recruitment, and population health management for individuals with LVH-associated conditions.
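The four validation metrics named in the Methods are standard functions of a 2×2 confusion matrix that compares the NLP output for each report with the physician adjudication. The following is a minimal sketch of that arithmetic, not the authors' evaluation code: the names `ConfusionCounts`, `counts_from_labels`, and `diagnostic_metrics` are hypothetical, and the counts in the usage note are invented for illustration and are not the study's results.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for one subtype's 2x2 table (NLP vs. physician)."""
    tp: int  # NLP positive, adjudicated positive
    fp: int  # NLP positive, adjudicated negative
    tn: int  # NLP negative, adjudicated negative
    fn: int  # NLP negative, adjudicated positive


def counts_from_labels(predicted: list[bool], reference: list[bool]) -> ConfusionCounts:
    """Tally per-report NLP predictions against reference (adjudicated) labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    return ConfusionCounts(
        tp=sum(p and r for p, r in pairs),
        fp=sum(p and not r for p, r in pairs),
        tn=sum((not p) and (not r) for p, r in pairs),
        fn=sum((not p) and r for p, r in pairs),
    )


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is zero
    # (e.g. a subtype with no positive reports in the validation set).
    return num / den if den else math.nan


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from a confusion matrix."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # TP / (TP + FN)
        "specificity": _ratio(c.tn, c.tn + c.fp),  # TN / (TN + FP)
        "ppv": _ratio(c.tp, c.tp + c.fp),          # TP / (TP + FP)
        "npv": _ratio(c.tn, c.tn + c.fn),          # TN / (TN + FN)
    }
```

For example, with hypothetical counts of 45 true positives, 10 false positives, 340 true negatives, and 5 false negatives across 400 reports, `diagnostic_metrics` gives a sensitivity of 45/50 = 0.90 and a specificity of 340/350 ≈ 0.971. Returning NaN for an empty denominator keeps a rare subtype from crashing a per-subtype summary, at the cost of requiring the caller to check for it.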