Information Content(IC) is an important dimensionof assessing the semantic similarity between two terms or word senses in word knowledge. The conventional method of obtaining IC of word senses is to combine knowledge of their hierarchical structure from an ontology like WordNet with actual usage in text as derived from a large corpus. In this paper, a new model of IC is presented, which relies on hierarchical structure alone. The model considers not only the hyponyms of each word sense but also its depth in the structure. The IC value is easier to calculate based on our model, and when used as the basis of a similarity approach it yields judgments that correlate more closely with human assessments than others, which using IC value obtained only considering the hyponyms and IC value got by employing corpus analysis.
This paper presents a new model of measuring semantic similarity in the taxonomy of WordNet. The model takes the path length between two concepts and IC value of each concept as its metric, furthermore, the weight of two metrics can be adapted artificially. In order to evaluate our model, traditional and widely used datasets are used. Firstly, coefficients of correlation between human ratings of similarity and six computational models are calculated, the result shows our new model outperforms their homologues. Then, the distribution graphs of similarity value of 65 word pairs are discussed our model having no faulted zone more centralized than other five methods. So our model can make up the insufficient of other methods which only using one metric(path length or IC value) in their model.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.