Abstract: We propose a new cluster-based semantic similarity/distance measure for the biomedical domain within the framework of UMLS. The proposed measure is based mainly on the cross-modified path length feature between the concept nodes and on two new features: (1) the common specificity of two concept nodes, and (2) the local granularity of the clusters. For comparison purposes, we also applied five existing general-English ontology-based similarity measures to the biomedical domain within UMLS. The proposed measure was evaluated against human experts' ratings and compared with the existing techniques using two ontologies (MeSH and SNOMED-CT) in UMLS. The experimental results confirmed the efficiency of the proposed method and showed that our similarity measure gives the best overall correlation with human ratings. We further show that, for all of the tested measures, using the MeSH ontology produces better correlations with human experts' scores than using SNOMED-CT.
Semantic similarity techniques aim to determine how similar two concepts, or terms, are according to a given ontology. This paper proposes a method for measuring semantic similarity/distance between terms. The measure combines the strengths and compensates for the weaknesses of existing measures that use an ontology as their primary source. In addition to the path length feature, the proposed measure uses a new feature, common specificity (CSpec). The CSpec feature is derived from (1) the information content of concepts, and (2) the information content of the ontology given a corpus. We evaluated the proposed measure on a benchmark test set of term pairs scored for similarity by human experts. The experimental results demonstrated that our similarity measure is effective and outperforms the existing measures: it gives the best correlation with human scores (0.874) on the benchmark test set.
Finding the similarity between biomedical terms and concepts is a very important task for biomedical information extraction and knowledge discovery. We propose and investigate the feasibility of using MEDLINE as a standard corpus and the MeSH ontology for measuring semantic similarity between concepts in the biomedical domain within the UMLS framework. We adapted information-based semantic similarity measures from general English and applied them to the biomedical domain to measure the similarity between biomedical terms. The experimental results show that, using MEDLINE and the MeSH ontology, the information-based similarity measures perform very well and produce high correlations with human ratings. The similarity measure of Jiang & Conrath achieved 82% correlation with human similarity scores, and the average correlation of three measures with human scores approaches 78%. These results confirm that MEDLINE is an effective information source for measuring semantic similarity between biomedical terms and concepts.
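As an illustration of the information-based measures mentioned above, the following is a minimal sketch of the Jiang & Conrath distance in its commonly cited form, dist(c1, c2) = IC(c1) + IC(c2) - 2·IC(lcs(c1, c2)), where IC(c) = -log p(c) and lcs is the most specific common ancestor. The taxonomy, the term counts, and all function names are hypothetical and chosen only for illustration; they are not taken from MeSH or MEDLINE, and the sketch is not the implementation used in the paper.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from MeSH.
PARENT = {
    "disease": None,
    "infection": "disease",
    "neoplasm": "disease",
    "pneumonia": "infection",
    "sepsis": "infection",
}

# Hypothetical corpus counts for each term on its own; not MEDLINE data.
OWN_COUNT = {
    "disease": 10,
    "infection": 20,
    "neoplasm": 30,
    "pneumonia": 25,
    "sepsis": 15,
}


def ancestors(concept):
    """Return the concept followed by its ancestors, up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def cumulative_counts():
    """A concept's count also includes the counts of all its descendants."""
    total = {c: 0 for c in PARENT}
    for concept, n in OWN_COUNT.items():
        for a in ancestors(concept):
            total[a] += n
    return total


TOTAL = cumulative_counts()
CORPUS_SIZE = TOTAL["disease"]  # the root accumulates every occurrence


def ic(concept):
    """Information content: IC(c) = -log p(c)."""
    return -math.log(TOTAL[concept] / CORPUS_SIZE)


def lcs(c1, c2):
    """Most specific common ancestor of c1 and c2 in the toy taxonomy."""
    anc2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in anc2:
            return a
    raise ValueError("concepts share no common ancestor")


def jc_distance(c1, c2):
    """Jiang & Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs)."""
    return ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))


if __name__ == "__main__":
    print(jc_distance("pneumonia", "sepsis"))
    print(jc_distance("pneumonia", "neoplasm"))
```

In this toy setup the two siblings under "infection" come out closer to each other than a pair whose only common ancestor is the root, which is the behaviour an information-based measure is expected to show.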
This paper presents a cross-ontology approach, an extension of the Cluster-Based approach, for measuring semantic distance between concepts within a single ontology or between concepts dispersed across multiple ontologies in a unified framework in the biomedical domain. The approach was evaluated in the biomedical domain within the UMLS framework with two biomedical ontologies (MeSH and SNOMED-CT). We used two datasets of biomedical terms scored for similarity by human experts. The experimental results (with ~0.81 correlation with human scores) confirmed that the proposed approach is effective and has great potential for measuring semantic distance using multiple ontologies in a unified framework.