Medical language is at the heart of the electronic health record (EHR), with up to 70 percent of the information in that record held in its natural-language, free-text portion. In moving from paper medical records to EHRs, we have opened up opportunities for the reuse of this clinical information through automated search and analysis. Natural language, however, is challenging for computational methods. This paper examines the tension between the nuanced, qualitative nature of medical language and the logical, structured nature of computation, as well as the way in which the two have interacted through the medium of the EHR. The paper also examines the potential for the computational analysis of natural language to overcome this tension.
Introduction
The past few decades have seen a shift away from paper-based medical records towards computerized electronic health records (EHRs). Whereas paper-based records had their roots in a largely textual representation, the digital nature of computers lends itself more readily to the structuring and organization of data. The shift to the EHR has therefore been accompanied by pressure on clinicians to record patient information in a structured way, by choosing options such as diagnoses, medications, and symptoms from lists and by completing onscreen forms. Structured information is computationally tractable, unlike the natural language of the textual portion of the record. Structured information, it is argued, can be reused to support research, audit, and the clinical process [1].