Towards comprehensive syntactic and semantic annotations of the clinical narrative

Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William F.; Warner, Colin; Hwang, Jena D.; Choi, Jinho D.; Dligach, Dmitriy; Nielsen, Rodney D.; Martin, James H.; Ward, Wayne H.; Palmer, Martha; Savova, Guergana

doi:10.1136/amiajnl-2012-001317

Cited by 109 publications

(108 citation statements)

References 18 publications

Supporting

Mentioning

104

Contrasting

“…We computed our IAA values requiring an exact match between annotations, which is generally lower than a partial match. For example, Albright et al (2013) achieved an F1 measure of 0.697 in exact match, but of 0.750 in partial match. Overall, our Ogren et al (2008) for English (from 75.7 to 81.4% in entity annotation, exact match) and Oronoz et al (2015) for Spanish (from 88.63 to 90.53% in term annotation).…”

Section: Inter-annotator Agreement (IAA), mentioning

confidence: 99%

“…Notable research initiatives, in collaboration with health institutions, have annotated clinical texts: the Mayo Clinic corpus (Ogren et al 2008), the Clinical E-Science Framework (CLEF) (Roberts et al 2009), the THYME (Temporal Histories of Your Medical Events) project (Styler et al 2014), the SHARP Template Annotations (Savova et al 2012), the MiPACQ (Multi-source Integrated Platform for Answering Clinical Questions) (Albright et al 2013), the IxA-Med-GS (Oronoz et al 2015) or the Harvey corpus (Savkov et al 2016). Research challenges have also fuelled the annotation of resources or enrichment of available texts.…”

Section: Introduction, mentioning

confidence: 99%

A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annOtated Text corpus (MERLOT)

Campillos

Deléger

Grouin

et al. 2017

Lang Resources & Evaluation

Quality annotated resources are essential for Natural Language Processing. The objective of this work is to present a corpus of clinical narratives in French annotated for linguistic, semantic and structural information, aimed at clinical information extraction. Six annotators contributed to the corpus annotation, using a comprehensive annotation scheme covering 21 entities, 11 attributes and 37 relations. All annotators trained on a small, common portion of the corpus before proceeding independently. An automatic tool was used to produce entity and attribute pre-annotations. About a tenth of the corpus was doubly annotated and annotation differences were resolved in consensus meetings. To ensure annotation consistency throughout the corpus, we devised harmonization tools to automatically identify annotation differences to be addressed to improve the overall corpus quality. The annotation project spanned 24 months and resulted in a corpus comprising 500 documents (148,476 tokens) annotated with 44,740 entities and 26,478 relations. The average inter-annotator agreement is 0.793 F-measure for entities and 0.789 for relations. The performance of the pre-annotation tool for entities reached 0.814 F-measure when sufficient training data was available. The performance of our entity pre-annotation tool shows the value of the corpus to build and evaluate information extraction methods. In addition, we introduced harmonization methods that further improved the quality of annotations in the corpus.

“…This problem is exacerbated in the biomedical domain, where suitably qualified annotators can be both hard to find and prohibitively expensive [48,49].…”

Section: Discussion, mentioning

confidence: 99%

Natural Language Processing in Biomedicine: A Unified System Architecture Overview

Doan

Conway

Phương

et al. 2014

Methods in Molecular Biology

In modern electronic medical records (EMR) much of the clinically important data - signs and symptoms, symptom severity, disease status, etc. - are not provided in structured data fields, but rather are encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of "unlocking" this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. A general architecture in an NLP system consists of two main components: background knowledge that includes biomedical knowledge resources and a framework that integrates NLP tools to process text. Systems differ in both components, which we will review briefly. Additionally, challenges facing current research efforts in biomedical NLP include the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge.

Introduction: In modern electronic medical records (EMR) most of the clinically important data - signs and symptoms, symptom severity, disease status, etc. - is not provided in structured data fields, but is rather encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of "unlocking" this important data source, converting unstructured text to structured, actionable data for use in applications for clinical decision support, quality assurance, and public health surveillance. There are currently many NLP systems that have been successfully applied to biomedical text. It is not our goal to review all of them in this chapter, but rather to provide an overview of how the field evolved from producing monolithic software built on platforms that were available at the time they were developed to contemporary component-based systems built on top of general frameworks. More importantly, the performance of these systems is tightly associated with their "ingredients" (i.e., modules that are used to form its background knowledge), and how these modules are combined on top of the general framework. We highlight certain systems based on their landmark status as well as on the diversity of components and frameworks they are based on. [7]. The review in this chapter differs from previous work in that it emphasizes the historical development of landmark clinical NLP systems, and presents each system in light of a unified system architecture. We consider that each NLP system in biomedicine contains two main components: biomedical background knowledge and a framework that integrates NLP tools. In the rest of this paper, we will first outline our model architecture for NLP systems in biomedicine, before going on to review and summarize representative NLP systems, starting with an early NLP system, LSP-MLP, and closing our discussion with the presentation of a more recent system, cTAKES. Finally, we will discuss...

“…However, this ongoing work on temporal evaluation is based on language data collected from the news. In the clinical domain, (Styler IV et al, Undated; Palmer and Pustejovsky, 2012; Albright et al, 2013) describe the THYME annotation project. The scope and language of temporality related to the cell cycle is different from that of both TempEval and the clinical domain, and supports (and demands) different types of reasoning, specifically related to cyclical time.…”

Section: Motivation, mentioning

confidence: 99%

Temporal Expression Recognition for Cell Cycle Phase Concepts in Biomedical Literature

Hailu

Panteleyeva

Cohen

2014

Proceedings of BioNLP 2014

In this paper, we present a system for recognizing temporal expressions related to cell cycle phase (CCP) concepts in biomedical literature. We identified 11 classes of cell cycle related temporal expressions, for which we made extensions to TIMEX3, arranging them in an ontology derived from the Gene Ontology. We annotated 310 abstracts from PubMed. Annotation guidelines were developed, consistent with existing time-related annotation guidelines for TimeML. Two annotators participated in the annotation. We achieved an inter-annotator agreement of 0.79 for an exact span match and 0.82 for relaxed constraints. Our approach is a hybrid of machine learning to recognize temporal expressions and a rule-based approach to map them to the ontology. We trained a named entity recognizer using Conditional Random Fields (CRF) models. An off-the-shelf implementation of the linear chain CRF model was used. We obtained an F-score of 0.77 for temporal expression recognition. We achieved 0.79 macro-averaged F-score and 0.78 micro-averaged F-score for mapping to the ontology.
