AbStratObjective: D evelopment of a general natural-language processor that identifies clinical information in narrative reports and maps that information into a structured representation containing clinical terms.Design: The natural-language processor provides three phases of processing, all of which are driven by different knowledge sources. The first phase performs the parsing. It identifies the structure of the text through use of a grammar that defines semantic patterns and a target form. The second phase, regularization, standardizes the terms in the initial target structure via a compositional mapping of multi-word phrases. The third phase, encoding, maps the terms to a controlled vocabulary. Radiology is the test domain for the processor and the target structure is a formal model for representing clinical information in that domain.Measurements: The impression sections of 230 radiology reports were encoded by the processor. Results of an automated query of the resultant database for the occurrences of f&r diseases were compared with the analysis of a panel of three physicians to determine recall and precision.Results: Without training specific to the four diseases, recall and precision of the system (combined effect of the processor and query generator) were 70% and 87%. Training of the query component increased recall to 85% without changing precision J Am Med Informatics Assoc. 1994;1:161-174. Natural language is the most widespread, compre-
ObjectiveTo summarize literature describing approaches aimed at automatically identifying patients with a common phenotype.Materials and methodsWe performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included.ResultsNinety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients.DiscussionWe observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems.ConclusionsThere are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.
Physicians disagreed on the interpretation of narrative reports, but this was not caused by outlier physicians or a consistent difference in the way internists and radiologists read reports. The natural language processor was not distinguishable from the physicians and was superior to all other comparison subjects. Although the domain of this study was restricted (six clinical conditions in chest radiographs), natural language processing seems to have the potential to extract clinical information from narrative reports in a manner that will support automated decision-support and clinical research.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.