Towards similarity-based differential diagnostics for common diseases

Russell, Makepeace, et al. 2021

Preprint

Self Cite

Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it can enable and enhance 'patient-like me' analyses, automated coding, differential diagnosis, and outcome prediction by leveraging the wealth of background knowledge provided by biomedical ontologies. While a large body of work explores the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, less work has explored the comparison of patient phenotype profiles for clinical tasks. Moreover, there has been no experimental exploration of optimal parameters or methods in this area. In this work, we develop a reproducible platform for benchmarking experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking admissions by shared primary diagnosis, using uncurated phenotype profiles derived from the text narrative associated with admissions in MIMIC-III. In doing so, we measure and interpret the performance of a large number of semantic similarity measures for this task, and provide a basis for further research on related tasks in the area.
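
To make the evaluation task concrete, here is a minimal Python sketch of ranking admissions by phenotype similarity and checking whether highly ranked admissions share the query's primary diagnosis. Everything in it is an assumption for illustration: the toy ontology, the admissions `adm1` to `adm3` and their diagnosis labels are hypothetical. The simple simUI measure (Jaccard overlap of ancestor-expanded profiles) stands in for the many measures benchmarked. Reciprocal rank is used as the score, because the abstract does not say which metric the platform reports.

```python
from typing import Dict, Set

# Hypothetical toy ontology: term -> set of direct parent terms.
PARENTS: Dict[str, Set[str]] = {
    "HP:A": set(),      # root
    "HP:B": {"HP:A"},
    "HP:C": {"HP:A"},
    "HP:D": {"HP:B"},
    "HP:E": {"HP:B"},
    "HP:F": {"HP:C"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand(profile: Set[str]) -> Set[str]:
    """Propagate a phenotype profile up the ontology."""
    expanded: Set[str] = set()
    for term in profile:
        expanded |= ancestors(term)
    return expanded


def sim_ui(p: Set[str], q: Set[str]) -> float:
    """simUI: Jaccard overlap of the ancestor-expanded profiles."""
    a, b = expand(p), expand(q)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reciprocal_rank(query: str, profiles: Dict[str, Set[str]],
                    diagnosis: Dict[str, str]) -> float:
    """Rank all other admissions by similarity to `query` and return
    1/rank of the first one sharing its primary diagnosis (0.0 if none)."""
    others = [adm for adm in profiles if adm != query]
    ranked = sorted(others, key=lambda adm: sim_ui(profiles[query], profiles[adm]),
                    reverse=True)
    for rank, adm in enumerate(ranked, start=1):
        if diagnosis[adm] == diagnosis[query]:
            return 1.0 / rank
    return 0.0


# Hypothetical toy admissions and primary-diagnosis labels.
PROFILES = {"adm1": {"HP:D"}, "adm2": {"HP:E"}, "adm3": {"HP:F"}}
DIAGNOSIS = {"adm1": "dx1", "adm2": "dx1", "adm3": "dx2"}

print(reciprocal_rank("adm1", PROFILES, DIAGNOSIS))  # 1.0
print(reciprocal_rank("adm3", PROFILES, DIAGNOSIS))  # 0.0
```

Averaging `reciprocal_rank` over every admission used as a query would give a mean reciprocal rank for one similarity measure. Swapping `sim_ui` for other measures is then the kind of comparison a benchmarking platform automates.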

Section: Discussion (mentioning; confidence: 89%)

Section: Discussion (mentioning; confidence: 99%)

Section: Semantic Comparison of Phenotype Profiles and Evaluation (mentioning; confidence: 99%)

Evaluating Semantic Similarity Methods for Comparison of Text-derived Phenotype Profiles

Russell, Makepeace, et al. 2021

Preprint

Self Cite

“…With respect to clustering, what remains to be done, is to determine a method that enables the re-integration of these scores, clusters, and groupings, into a single representation that minimises loss of information. Such an approach could lead to powerful insights into multi- and co-morbidity, as well as to improve semantic similarity based classification using text-derived phenotypes, such as that described by our previous work [11]. One method of approaching this specific to clustering problems could be the consideration of Multi-View Clustering, which could consider each measure of facet-wise similarity as a different view of the same patient admission [54], which could then be further analysed to determine an optimal set of clusters.…”

Section: Discussion (mentioning; confidence: 99%)

Multi-faceted Semantic Clustering With Text-derived Phenotypes

Williams, Karwath, et al. 2021

Preprint

Self Cite

Identification of ontology concepts in clinical narrative text enables the creation of phenotype profiles that can be associated with clinical entities, such as patients or drugs. Constructing patient phenotype profiles using formal ontologies enables their analysis via semantic similarity, in turn enabling the use of background knowledge in clustering or classification analyses. However, traditional semantic similarity approaches collapse complex relationships between patient phenotypes into a single similarity score for each pair of patients. Moreover, single scores may be based only on matching terms with the greatest information content (IC), ignoring other dimensions of patient similarity. This process necessarily leads to a loss of information in the resulting representation of patient similarity, and is especially apparent when using very large text-derived and highly multi-morbid phenotype profiles. It also makes finding a biological explanation for similarity very difficult: the 'black box' problem. In this article, we explore the generation of multiple semantic similarity scores for patients based on different facets of their phenotypic manifestation, which we define through different sub-graphs in the Human Phenotype Ontology. We further present a new methodology for deriving sets of qualitative class descriptions for groups of entities described by ontology terms. Leveraging this strategy to obtain meaningful explanations for our semantic clusters, alongside other evaluation techniques, we show that semantic clustering with ontology-derived facets enables the representation, and thus the identification, of clinically relevant phenotype relationships not easily recoverable using overall clustering alone. In this way, we demonstrate the potential of faceted semantic clustering for gaining a deeper and more nuanced understanding of text-derived patient phenotypes.
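
The abstract contrasts a single IC-based score with facet-wise scores. The following is a minimal Python sketch of that contrast, under these assumptions:

- Information content is estimated from annotation frequency in a hypothetical toy corpus.
- Term similarity is Resnik's measure, the IC of the most informative common ancestor.
- Profile similarity is the best-match average (BMA).
- A facet is approximated by restricting both profiles to terms under one sub-graph root.

The ontology, corpus and facet roots are invented for illustration. They are not the article's actual HPO facets or similarity measures.

```python
import math
from typing import Dict, Iterable, List, Set

# Hypothetical toy ontology: term -> set of direct parent terms.
PARENTS: Dict[str, Set[str]] = {
    "HP:A": set(),      # root
    "HP:B": {"HP:A"},   # root of facet 1
    "HP:C": {"HP:A"},   # root of facet 2
    "HP:D": {"HP:B"},
    "HP:E": {"HP:B"},
    "HP:F": {"HP:C"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(corpus: List[Set[str]]) -> Dict[str, float]:
    """IC(t) = log(N / n_t), where n_t counts profiles annotated with t
    or any of its descendants."""
    counts = {t: 0 for t in PARENTS}
    for profile in corpus:
        expanded: Set[str] = set()
        for term in profile:
            expanded |= ancestors(term)
        for term in expanded:
            counts[term] += 1
    n = len(corpus)
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


def resnik(t1: str, t2: str, ic: Dict[str, float]) -> float:
    """IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


def bma(p: Set[str], q: Set[str], ic: Dict[str, float]) -> float:
    """Best-match average: match each term to its best partner in the other
    profile, average per direction, then average the two directions."""
    if not p or not q:
        return 0.0
    p_to_q = sum(max(resnik(a, b, ic) for b in q) for a in p) / len(p)
    q_to_p = sum(max(resnik(b, a, ic) for a in p) for b in q) / len(q)
    return (p_to_q + q_to_p) / 2


def faceted_scores(p: Set[str], q: Set[str], roots: Iterable[str],
                   ic: Dict[str, float]) -> Dict[str, float]:
    """One BMA score per facet, keeping only terms under each facet root."""
    return {r: bma({t for t in p if r in ancestors(t)},
                   {t for t in q if r in ancestors(t)}, ic)
            for r in roots}


# Hypothetical toy corpus used to estimate IC.
CORPUS = [{"HP:D", "HP:F"}, {"HP:E", "HP:F"}, {"HP:D"}, {"HP:F"}]
IC = information_content(CORPUS)

P, Q = {"HP:D", "HP:F"}, {"HP:D", "HP:E"}
print(round(bma(P, Q, IC), 3))  # 0.418
print({r: round(s, 3) for r, s in faceted_scores(P, Q, ["HP:B", "HP:C"], IC).items()})
# {'HP:B': 0.592, 'HP:C': 0.0}
```

The overall BMA score blends matches from both sub-graphs into one number. The facet-wise scores show that these two toy profiles agree only within the `HP:B` sub-graph, which is the kind of information a single score hides.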

“…Ontology-based analysis has been leveraged across many tasks including prediction of protein interaction and rare disease variants [1]. In the clinical space, similar analysis methods have been applied across a wide range of applications including diagnosis of rare and common diseases [2,3], as well as the identification of subtypes of diseases, such as autism [4]. In addition, the synthesis of ontology-based methods and machine learning is increasingly common [5].…”

Section: Introduction (mentioning; confidence: 99%)

Klarigi: Characteristic Explanations for Semantic Data

Williams, Karwath, et al. 2021

Preprint

Self Cite

Semantic annotation facilitates the use of background knowledge in analysis. This includes approaches that sort entities into groups or clusters, or assign labels or outcomes, for which semantic explanations are typically difficult to derive. We introduce Klarigi, a tool that creates semantic explanations for groups of entities described by ontology terms, implemented in a manner that balances multiple scoring heuristics. We demonstrate Klarigi by using it to identify characteristic terms for text-derived phenotypes of emergency admissions for two frequently conflated diagnoses, pulmonary embolism and pneumonia. Klarigi provides a universal method by which entity groups or labels can be explained semantically, and thus contributes to the improved explainability of analysis methods.
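
As a rough illustration of what balancing multiple scoring heuristics can mean, this Python sketch scores candidate terms for a group using two hypothetical heuristics:

- Inclusion: the share of group members annotated with the term.
- Exclusion: the share of non-members not annotated with it.

This is not Klarigi's implementation or its exact scoring. The threshold, the product used to combine the two scores, and the toy profiles (assumed already propagated to ancestor terms) are all assumptions.

```python
from typing import List, Set, Tuple


def characteristic_terms(group: List[Set[str]], others: List[Set[str]],
                         min_inclusion: float = 0.5) -> List[Tuple[str, float, float]]:
    """Score candidate terms for `group` against `others`.

    inclusion: share of group members annotated with the term.
    exclusion: share of non-members NOT annotated with the term.
    Terms with inclusion below `min_inclusion` are dropped; the rest are
    ranked by inclusion * exclusion (an arbitrary way to balance the two).
    Profiles are assumed to be already propagated to ancestor terms.
    """
    candidates = set().union(*group)
    scored = []
    for term in candidates:
        inclusion = sum(term in p for p in group) / len(group)
        exclusion = sum(term not in p for p in others) / len(others) if others else 1.0
        if inclusion >= min_inclusion:
            scored.append((term, inclusion, exclusion))
    # Highest combined score first; ties broken by term name for determinism.
    scored.sort(key=lambda s: (-s[1] * s[2], s[0]))
    return scored


# Hypothetical toy profiles for a group and for the remaining entities.
GROUP = [{"cough", "fever"}, {"cough", "fever", "dyspnea"}, {"cough"}]
OTHERS = [{"dyspnea", "chest pain"}, {"dyspnea"}]

print(characteristic_terms(GROUP, OTHERS))
# [('cough', 1.0, 1.0), ('fever', 0.6666666666666666, 1.0)]
```

Multiplying the two scores rewards terms that are both common in the group and rare outside it. Weighted sums or separate thresholds per heuristic would be equally plausible ways to balance them.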