Thesauri and other types of controlled vocabularies are increasingly re-engineered into ontologies described in the Web Ontology Language (OWL), particularly in the life sciences. This has led some to perceive thesauri as ontologies once they are expressed in OWL syntax, while others have emphasized that a vocabulary must be re-engineered before it can be used as an ontology. This confusion is rooted in differing perceptions of what ontologies are and how they differ from other types of vocabularies. In this article, we rigorously examine the structural differences and similarities between thesauri and meaning-defining ontologies described in OWL. Specifically, we conduct (a) a conceptual comparison of thesauri and ontologies, and (b) a comparison of a specific thesaurus and a specific ontology in the same subject field. Our results show that thesauri and ontologies need to be treated as two orthogonal kinds of models with superficially similar structures. An ontology is not a good thesaurus, nor is a thesaurus a good ontology. A thesaurus requires significant structural and other content changes to become an ontology, and vice versa.
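The superficial similarity between a thesaurus hierarchy and an ontology's is-a hierarchy can be illustrated with a minimal Python sketch that reads every broader-term (BT) link of a small thesaurus fragment as an is-a (rdfs:subClassOf) axiom. The terms and the labelling of each link as generic or partitive are hypothetical. Thesaurus standards such as ANSI/NISO Z39.19 allow BT to be refined into generic, partitive, and instance links, but many thesauri use plain BT. The sketch computes the entailed "every X is a Y" statements and reports those that hold only because a partitive link was misread as is-a.

```python
from collections import defaultdict

# Hypothetical thesaurus fragment: (narrower term, broader term, kind of BT link).
# Terms and link kinds are illustrative assumptions, not taken from a real thesaurus.
BT_LINKS = [
    ("urea", "nitrogen fertilizers", "generic"),
    ("nitrogen fertilizers", "fertilizers", "generic"),
    ("nitrogen", "nitrogen fertilizers", "partitive"),  # a component, not a kind
    ("leaves", "plants", "partitive"),                  # a part, not a kind
]


def is_a_closure(links, kinds=None):
    """Map each narrower term to every term it would be a subclass of,
    reading BT links (optionally only those of the given kinds) as is-a."""
    parents = defaultdict(set)
    for narrower, broader, kind in links:
        if kinds is None or kind in kinds:
            parents[narrower].add(broader)
    closure = {}
    for term in list(parents):
        seen, stack = set(), list(parents[term])
        while stack:  # depth-first walk up the hierarchy
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(parents.get(parent, ()))
        closure[term] = seen
    return closure


def false_entailments(links):
    """Is-a statements entailed by the naive reading but not by generic links alone."""
    naive = is_a_closure(links)                     # every BT read as subClassOf
    sound = is_a_closure(links, kinds={"generic"})  # only genuine kind-of links
    return sorted(
        (term, ancestor)
        for term, ancestors in naive.items()
        for ancestor in ancestors
        if ancestor not in sound.get(term, set())
    )


if __name__ == "__main__":
    for term, ancestor in false_entailments(BT_LINKS):
        print(f"naive reading entails: every '{term}' is a '{ancestor}'")
```

Generic links survive the conversion (urea remains a nitrogen fertilizer and a fertilizer), whereas the partitive links would assert, for example, that nitrogen is a kind of fertilizer. Telling the two apart requires knowing what each link means, which is a content decision that a purely syntactic conversion skips.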
The re-engineering of vocabularies into ontologies can save considerable time in ontology development. Current methods that guide the re-engineering of thesauri into ontologies often convert vocabularies only syntactically and ignore the problems that stem from interpreting vocabularies as statements of truth (ontologies). Current re-engineering methods also do not use the semantic capabilities of formal languages such as OWL to detect logical mistakes and improve vocabularies. In this paper, we introduce a content-focused method for building domain-specific ontologies based on a thesaurus, a popular type of vocabulary. The method yields a semantically adequate ontology that not only contains a semantically rich description of the entities to be modeled but also enables non-trivial consistency checks and classifications based on automated reasoning, and that can be integrated with other ontologies following the same development principles. The central steps of our method are the identification of membership conditions, the alignment to a top-level ontology and formal relations, and consistency checking and inference with a reasoner. We explain the motivation and sub-activities for each of these steps and illustrate their application through a case study in the domain of agricultural fertilizers based on the AGROVOC Thesaurus. Above all, our method shows that simple syntactic conversions are insufficient to derive an ontology from a thesaurus. Instead, considerable structural changes are required to derive an ontology that corresponds to the reality it represents. Our method relies on manual development effort and is particularly useful where a highly reliable is-a hierarchy is crucial.
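The value of combining top-level alignment with a consistency check can be shown with a toy example. The Python sketch below is not an OWL reasoner: it only computes the transitive closure of is-a axioms and flags classes that end up under two disjoint top-level categories, which a description-logic reasoner would report as unsatisfiable. The class names, the hypothetical BT-derived axiom placing "fertilizer application" under "fertilizers", and the two top-level categories (named after BFO's material entity and process, which BFO treats as disjoint) are assumptions for illustration, not content from the paper's case study.

```python
from collections import defaultdict

# Hypothetical is-a axioms produced by naively converting BT links (illustrative only).
SUBCLASS_OF = [
    ("urea", "nitrogen fertilizers"),
    ("nitrogen fertilizers", "fertilizers"),
    ("fertilizer application", "fertilizers"),  # an activity filed under a material
]

# Alignment step: attach domain classes to top-level categories (assumed mapping).
ALIGNED_TO = {
    "fertilizers": "material entity",
    "fertilizer application": "process",
}

# Top-level disjointness axiom: nothing is both a material entity and a process.
DISJOINT = [("material entity", "process")]


def superclasses(cls, parents):
    """All direct and indirect superclasses of cls."""
    seen, stack = set(), [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(subclass_of, aligned_to, disjoint):
    """Classes that inherit two disjoint categories and so can have no instances."""
    parents = defaultdict(set)
    for sub, sup in list(subclass_of) + list(aligned_to.items()):
        parents[sub].add(sup)
    classes = {c for pair in subclass_of for c in pair} | set(aligned_to)
    conflicts = {}
    for cls in sorted(classes):
        ups = superclasses(cls, parents)
        for a, b in disjoint:
            if a in ups and b in ups:
                conflicts[cls] = (a, b)
    return conflicts


if __name__ == "__main__":
    print(unsatisfiable_classes(SUBCLASS_OF, ALIGNED_TO, DISJOINT))
    # Repair sketch: drop the BT-derived is-a axiom (a real ontology would
    # relate the two classes through a suitable non-is-a relation instead).
    repaired = [axiom for axiom in SUBCLASS_OF if axiom[0] != "fertilizer application"]
    print(unsatisfiable_classes(repaired, ALIGNED_TO, DISJOINT))
```

Without the alignment and the disjointness axiom, the misplaced class would go unnoticed. In an actual OWL ontology, a reasoner such as HermiT or Pellet would detect the same conflict once a DisjointClasses axiom is stated, along with far more than this closure check can find.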