Improved characterisation of clinical text through ontology-based vocabulary expansion

Karwath, Williams, et al. 2021 (Preprint; self-citation)

Ontology-based phenotype profiles have been used for the differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. In particular, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied to differential diagnosis of common diseases, or to generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. The results reveal a potentially powerful approach that can be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.
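The patient-disease comparison described above can be sketched as follows. This is a minimal illustration only. It assumes a toy ontology (the hypothetical terms in `PARENTS`) and uses Jaccard similarity over ancestor-closed term sets as a simple stand-in measure. The abstract does not state which similarity measure or ontologies the authors used.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: each class maps to its direct superclasses.
PARENTS: Dict[str, Set[str]] = {
    "chest pain": {"pain"},
    "dyspnoea": {"respiratory sign"},
    "cough": {"respiratory sign"},
    "fever": {"sign"},
    "pain": {"sign"},
    "respiratory sign": {"sign"},
    "sign": set(),
}

def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its superclasses (transitive closure)."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, set()))
    return seen

def expand(profile: Set[str]) -> Set[str]:
    """Close a phenotype profile under the subclass hierarchy."""
    out: Set[str] = set()
    for term in profile:
        out |= ancestors(term)
    return out

def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two ancestor-closed profiles (0.0 if both are empty)."""
    ea, eb = expand(a), expand(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0

def rank_diagnoses(patient: Set[str], diseases: Dict[str, Set[str]]) -> List[str]:
    """Rank candidate diseases by patient-disease similarity, most similar first."""
    return sorted(diseases, key=lambda d: similarity(patient, diseases[d]), reverse=True)
```

For a patient profile `{"cough", "fever"}` and candidate profiles `{"pneumonia": {"cough", "fever", "dyspnoea"}, "angina": {"chest pain"}}`, pneumonia ranks first (similarity 0.8 against 1/6). Patient-patient similarity uses the same `similarity` function, applied between two patients' profiles instead of a patient and a disease.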

Section: Results (mentioning; confidence: 99%)

Towards Similarity-based Differential Diagnostics For Common Diseases

Karwath, Williams, et al. 2021 (Preprint; self-citation)

“…Both the construction of the ontology, and thereby analysis performance could be affected by investigating the use of other Komenti features, particularly those for negation detection and synonym expansion [25, 26]. In the former case, the evaluation of negation could prevent incorrect or explicitly negated facts from being used in the produced knowledgebase, while the latter case has been shown to improve overall characterisation of text.…”

Section: Discussion (mentioning; confidence: 99%)

Exploring Binary Relations for Ontology Extension and Improved Adaptation to Clinical Text

Hoehndorf, Karwath, et al. 2020 (Preprint; self-citation)

Background: The controlled domain vocabularies provided by ontologies make them an indispensable tool for text mining. Ontologies also include semantic features in the form of taxonomy and axioms, which make annotated entities in text corpora useful for semantic analysis. Extending those semantic features may improve performance for characterisation and analytic tasks. Ontology learning techniques have previously been explored for novel ontology construction from text, though most recent approaches have focused on literature, with applications in information retrieval or human interaction tasks. We hypothesise that extension of existing ontologies using information mined from clinical narrative text may help to adapt those ontologies such that they better characterise those texts, and lead to improved classification performance.

Results: We develop and present a framework for identifying new classes in text corpora, which can be integrated into existing ontology hierarchies. To do this, we employ the Stanford Open Information Extraction algorithm and integrate its implementation into the Komenti semantic text mining framework. To identify whether our approach leads to better characterisation of text, we present a case study, using the method to learn an adaptation to the Disease Ontology using text associated with a sample of 1,000 patient visits from the MIMIC-III critical care database. We use the adapted ontology to annotate and classify shared first diagnosis on patient visits with semantic similarity, revealing an improved performance over use of the base Disease Ontology on the set of visits the ontology was constructed from. Moreover, we show that the adapted ontology also improved performance for the same task over two additional unseen samples of 1,000 and 2,500 patient visits.

Conclusions: We report a promising new method for ontology learning and extension from text. We demonstrate that we can successfully use the method to adapt an existing ontology to a textual dataset, improving its ability to characterise the dataset, and leading to improved analytic performance, even on unseen portions of the dataset.
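The abstract does not describe how extracted phrases are placed into the existing hierarchy. One simple way to do this is head-matching, sketched below as an assumption. It is not necessarily the authors' method. The sketch skips the Open Information Extraction step and takes candidate noun phrases as already given. The class identifiers are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set, Tuple

def propose_subclasses(
    labels: Dict[str, str],      # hypothetical class id -> label in the base ontology
    phrases: Iterable[str],      # candidate noun phrases mined from text (assumed given)
) -> List[Tuple[str, str]]:
    """Propose (new_label, parent_id) pairs by head-matching phrases to class labels.

    A phrase is proposed as a subclass of an existing class when it ends with that
    class's label and adds at least one modifier word in front of it, e.g.
    'acute kidney failure' under 'kidney failure'. The longest matching label wins.
    """
    by_label = {lbl.lower(): cid for cid, lbl in labels.items()}
    proposals: List[Tuple[str, str]] = []
    seen: Set[str] = set()
    for phrase in phrases:
        words = phrase.lower().split()
        norm = " ".join(words)
        if norm in by_label or norm in seen:
            continue  # already an existing class, or already proposed
        # Try suffixes from longest to shortest, never the whole phrase itself.
        for i in range(1, len(words)):
            suffix = " ".join(words[i:])
            if suffix in by_label:
                proposals.append((norm, by_label[suffix]))
                seen.add(norm)
                break
    return proposals
```

With labels `{"class_kf": "kidney failure", "class_pn": "pneumonia"}`, the phrases "acute kidney failure" and "aspiration pneumonia" would be proposed under `class_kf` and `class_pn`. A phrase with no matching head, such as "chest pain", yields nothing. Head-matching is cheap and conservative, but it misses new classes whose parent is not a lexical suffix.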

“…Komenti also includes a novel vocabulary expansion algorithm, adding additional labels and synonyms for terms that can be matched in text, by linking equivalent classes between ontologies using lexical and semantic queries. This has provisionally been shown to vastly increase the scale of vocabulary available in several ontologies, the amount of information returned in information retrieval tasks, and to improve the performance of semantic analysis of clinical text [8].…”
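The lexical half of the vocabulary expansion described in this snippet can be sketched as follows. This is a minimal illustration, assuming classes are matched as equivalent whenever any of their normalised labels coincide. The semantic queries mentioned in the snippet are not modelled. The function and data names are hypothetical and are not Komenti's API.

```python
import re
from typing import Dict, Set

def normalise(label: str) -> str:
    """Lowercase and collapse punctuation and whitespace for lexical comparison."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()

def expand_vocabulary(
    target: Dict[str, Set[str]],   # class id -> labels in the ontology being expanded
    source: Dict[str, Set[str]],   # class id -> labels in another ontology
) -> Dict[str, Set[str]]:
    """Add the labels of lexically equivalent source classes to each target class."""
    # Index source classes by each of their normalised labels.
    index: Dict[str, Set[str]] = {}
    for sid, labels in source.items():
        for lbl in labels:
            index.setdefault(normalise(lbl), set()).add(sid)
    expanded: Dict[str, Set[str]] = {}
    for tid, labels in target.items():
        new = set(labels)
        for lbl in labels:
            # Any source class sharing a normalised label is treated as equivalent.
            for sid in index.get(normalise(lbl), set()):
                new |= source[sid]
        expanded[tid] = new
    return expanded
```

For example, a target class labelled only "myocardial infarction" would gain "heart attack" and "MI" from a source class labelled `{"Myocardial Infarction", "heart attack", "MI"}`. Those extra labels are what increase the matchable vocabulary.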

Section: Approach (mentioning; confidence: 99%)

“…Since Komenti outputs annotations in a simple tabular format, analysis software can easily make use of the produced information. In a previous experiment, labels derived by Komenti from clinical text were used in semantic similarity analyses [8]. Komenti also provides several features for internal analyses of its annotations.…”
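Because the annotations are tabular, turning them into per-document term sets for similarity analysis is straightforward. The sketch below assumes a tab-separated table with hypothetical `document` and `class` columns. The actual column layout of Komenti's output is not given in this snippet.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set

def profiles_from_tsv(text: str) -> Dict[str, Set[str]]:
    """Group annotated class identifiers by document from a tab-separated table.

    Assumes hypothetical columns named 'document' and 'class'; other columns are ignored.
    """
    profiles: Dict[str, Set[str]] = defaultdict(set)
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        profiles[row["document"]].add(row["class"])
    return dict(profiles)
```

Each resulting set can be passed directly to a set-based semantic similarity measure, one profile per patient visit.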

Section: Approach (mentioning; confidence: 99%)

“…No informed consent was required as this was a service improvement project, and the documents were not de-identified as we intend to follow up individuals lost to discharge. The hospital identification number linked documents belonging to the same patient and associated data to the registry [8]. Data remained within the Trust and only included information related to HCM.…”

Section: Ethics and Funding (mentioning; confidence: 99%)


Komenti: A semantic text mining framework

Bradlow, Hoehndorf, et al. 2020 (Preprint; self-citation)

Komenti is a reasoner-enabled semantic query and information extraction tool. It is the only text mining tool that enables querying inferred knowledge from biomedical ontologies. It also contains multiple novel components for vocabulary construction and context disambiguation, which can improve the power of text mining and ontology-based analysis tasks, with a view towards making full use of the semantic provision of biomedical ontologies in the text extraction and characterisation space. Here, we describe Komenti, its features, and a use case in which we automate a clinical audit process. The use case classifies the medications of patients with hypertrophic cardiomyopathy from text records. It achieves high precision and identifies a subcohort of candidate patients who have atrial fibrillation but were not anticoagulated, and who are therefore at higher risk of stroke.
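The final filtering step of such an audit can be sketched as a set query over per-patient annotations. This is an illustration under stated assumptions, not Komenti's implementation. The term sets below are hand-written and hypothetical, whereas an ontology-driven audit would obtain them by querying the ontology, for example all subclasses of an anticoagulant class. Negated mentions are not handled.

```python
from typing import Dict, Set

# Hypothetical, hand-written term sets standing in for ontology-derived vocabularies.
AF_TERMS = {"atrial fibrillation", "paroxysmal atrial fibrillation", "persistent atrial fibrillation"}
ANTICOAGULANTS = {"warfarin", "apixaban", "rivaroxaban", "dabigatran", "edoxaban"}

def unanticoagulated_af(annotations: Dict[str, Set[str]]) -> Set[str]:
    """Return ids of patients annotated with AF but with no anticoagulant annotation."""
    flagged: Set[str] = set()
    for patient, terms in annotations.items():
        lowered = {t.lower() for t in terms}
        has_af = bool(lowered & AF_TERMS)
        anticoagulated = bool(lowered & ANTICOAGULANTS)
        if has_af and not anticoagulated:
            flagged.add(patient)
    return flagged
```

Given `{"p1": {"Atrial fibrillation", "bisoprolol"}, "p2": {"atrial fibrillation", "Apixaban"}, "p3": {"warfarin"}}`, only `p1` is flagged. Patients flagged this way are candidates for review, not confirmed cases.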