Among patients undergoing inpatient surgical procedures at VA medical centers, natural language processing analysis of electronic medical records to identify postoperative complications had higher sensitivity and lower specificity than patient safety indicators based on discharge coding.
Previous investigators have defined clinical interface terminology as a systematic collection of health care-related phrases (terms) that supports clinicians' entry of patient-related information into computer programs, such as clinical "note capture" and decision support tools. Interface terminologies can also facilitate display of computer-stored patient information to clinician-users. Interface terminologies "interface" between clinicians' own unfettered, colloquial conceptualizations of patient descriptors and the more structured, coded internal data elements used by specific health care application programs. The intended uses of a terminology determine its conceptual underpinnings, structure, and content. As a result, the desiderata for interface terminologies differ from the desiderata for health care-related terminologies used for storage (e.g., SNOMED CT), information retrieval (e.g., MeSH), and classification (e.g., ICD-9-CM). Necessary but not sufficient attributes for an interface terminology include adequate synonym coverage, presence of relevant assertional knowledge, and a balance between pre- and post-coordination. To place interface terminologies in context, this article reviews historical goals and challenges of clinical terminology development in general and then focuses on the unique features of interface terminologies.
Background
The aim of this study was to build electronic algorithms using a combination of structured data and natural language processing (NLP) of text notes for potential safety surveillance of nine post-operative complications.
Methods
Post-operative complications from six medical centers in the Southeastern United States were obtained from the Veterans Affairs Surgical Quality Improvement Program (VASQIP) registry. Development and test datasets were constructed using stratification by facility and date of procedure for patients with and without complications. Algorithms were developed from VASQIP outcome definitions using NLP-coded concepts, regular expressions, and structured data. The VASQIP nurse reviewer's determination served as the reference standard for evaluating sensitivity and specificity. The algorithms were designed in the development dataset and evaluated in the test dataset.
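As a rough illustration of how a regular expression over note text might be combined with structured data, consider the minimal sketch below. It is not the study's algorithm: the patterns, the function name flag_pneumonia, the white blood cell threshold, and the antibiotic flag are all hypothetical choices made only to show the general shape of such a rule.

```python
import re

# Hypothetical patterns (assumptions, not taken from the study):
# a mention of pneumonia, and a simple same-sentence negation of it.
PNEUMONIA = re.compile(r"\bpneumonia\b", re.IGNORECASE)
NEGATED = re.compile(
    r"\b(no|without|negative for|ruled out)\b[^.]*\bpneumonia\b",
    re.IGNORECASE,
)

def flag_pneumonia(note_text, max_wbc, antibiotic_ordered):
    """Flag a case when the note mentions pneumonia without a simple
    negation AND at least one structured field supports it.

    max_wbc: highest post-operative white blood cell count (10^9/L).
    antibiotic_ordered: True if an antibiotic order exists.
    The 12.0 cutoff is an illustrative assumption only.
    """
    mentioned = bool(PNEUMONIA.search(note_text)) and not NEGATED.search(note_text)
    structured_support = max_wbc > 12.0 or antibiotic_ordered
    return bool(mentioned and structured_support)
```

A rule of this kind would be tuned on the development dataset and then applied unchanged to the test dataset, so that the reported performance is not inflated by tuning.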
Results
Sensitivity and specificity in the test set were 85% and 92% for acute renal failure, 80% and 93% for sepsis, 56% and 94% for deep vein thrombosis, 80% and 97% for pulmonary embolism, 88% and 89% for acute myocardial infarction, 88% and 92% for cardiac arrest, 80% and 90% for pneumonia, 95% and 80% for urinary tract infection, and 80% and 93% for wound infection, respectively. A third of the complications occurred outside of the hospital setting.
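For readers less familiar with these measures, the sketch below shows how sensitivity and specificity could be computed for one complication by comparing algorithm output with a reference standard. The function name and the toy labels are illustrative assumptions and do not reproduce any study data.

```python
def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) for paired boolean labels.

    predicted: algorithm output per case (True = complication flagged).
    reference: reference-standard label per case (True = complication present).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    tn = sum(1 for p, r in zip(predicted, reference) if not p and not r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")

    sensitivity = tp / (tp + fn)  # share of true complications that were flagged
    specificity = tn / (tn + fp)  # share of non-complications left unflagged
    return sensitivity, specificity


# Toy example with made-up labels: 4 reference positives, 4 reference negatives.
predicted = [True, False, True, False, False, True, False, False]
reference = [True, True, True, True, False, False, False, False]
print(sensitivity_specificity(predicted, reference))  # (0.5, 0.75)
```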
Conclusions
Computer algorithms on data extracted from the electronic health record produced respectable sensitivity and specificity across a large sample of patients seen in six different medical centers. This study demonstrates the utility of combining natural language processing with structured data for mining the information contained within the electronic health record.