In modern electronic medical records (EMR), much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) is not provided in structured data fields but is instead encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of "unlocking" this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view, in which an NLP system consists of two main components: background knowledge, which includes biomedical knowledge resources, and a framework that integrates NLP tools to process text. Systems differ in both components, and we briefly review these differences. Challenges facing current research in biomedical NLP include the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge.
Introduction
In modern electronic medical records (EMR), most of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) is not provided in structured data fields but is instead encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of "unlocking" this important data source, converting unstructured text into structured, actionable data for applications in clinical decision support, quality assurance, and public health surveillance. Many NLP systems have been successfully applied to biomedical text. Our goal in this chapter is not to review all of them, but rather to show how the field evolved from monolithic software, built on whatever platforms were available when each system was developed, to contemporary component-based systems built on top of general frameworks. More importantly, the performance of these systems is tightly associated with their "ingredients" (i.e., the modules that form their background knowledge) and with how these modules are combined on top of the general framework. We highlight certain systems based on their landmark status as well as on the diversity of the components and frameworks on which they are built. The review in this chapter differs from previous work [7] in that it emphasizes the historical development of landmark clinical NLP systems and presents each system in light of a unified system architecture.
We consider that each NLP system in biomedicine contains two main components: biomedical background knowledge and a framework that integrates NLP tools. In the rest of this chapter, we first outline our model architecture for NLP systems in biomedicine and then review and summarize representative NLP systems, starting with an early system, LSP-MLP, and closing with a more recent one, cTAKES. Finally, we will discuss...
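To make the two-component view more concrete, here is a minimal Python sketch. It is not modeled on LSP-MLP, cTAKES, or any other system reviewed in this chapter. Background knowledge is reduced to a hypothetical toy lexicon (`BackgroundKnowledge`), and the framework is a hypothetical `Pipeline` that runs interchangeable components (`tokenize`, `match_concepts`) in order over a shared `Document`. All names, categories, and the greedy dictionary-matching strategy are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class BackgroundKnowledge:
    # Hypothetical toy lexicon: lowercased surface phrase -> semantic category.
    lexicon: Dict[str, str]
    max_phrase_len: int = 3  # longest phrase (in tokens) to look up


@dataclass
class Document:
    text: str
    tokens: List[str] = field(default_factory=list)
    concepts: List[Tuple[str, str]] = field(default_factory=list)


# A component reads and updates a Document, consulting the background knowledge.
Component = Callable[[Document, BackgroundKnowledge], Document]


def tokenize(doc: Document, kb: BackgroundKnowledge) -> Document:
    # Lowercased alphanumeric word tokens; punctuation is dropped.
    doc.tokens = re.findall(r"[a-z0-9]+", doc.text.lower())
    return doc


def match_concepts(doc: Document, kb: BackgroundKnowledge) -> Document:
    # Greedy longest-match lookup of token n-grams against the lexicon.
    i = 0
    while i < len(doc.tokens):
        for n in range(min(kb.max_phrase_len, len(doc.tokens) - i), 0, -1):
            phrase = " ".join(doc.tokens[i:i + n])
            if phrase in kb.lexicon:
                doc.concepts.append((phrase, kb.lexicon[phrase]))
                i += n
                break
        else:
            i += 1  # no phrase starts at this token; move on
    return doc


class Pipeline:
    """Framework: runs components in order over one shared Document."""

    def __init__(self, kb: BackgroundKnowledge, components: List[Component]):
        self.kb = kb
        self.components = components

    def __call__(self, text: str) -> Document:
        doc = Document(text)
        for component in self.components:
            doc = component(doc, self.kb)
        return doc


if __name__ == "__main__":
    kb = BackgroundKnowledge({"chest pain": "SYMPTOM", "fever": "SYMPTOM"})
    nlp = Pipeline(kb, [tokenize, match_concepts])
    print(nlp("Patient reports chest pain and fever.").concepts)
```

The separation is the point of the sketch: replacing the lexicon changes what the system knows without touching the framework, while adding or reordering components changes how text is processed without touching the knowledge. Real systems rely on far richer knowledge resources than a flat dictionary.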