Reusing ontologies and their terms is a principle and best practice that most ontology development methodologies strongly encourage. Reuse promises to support semantic interoperability and to reduce engineering costs. In this paper, we present a descriptive study of the current extent of term reuse and overlap among biomedical ontologies. We use the corpus of biomedical ontologies stored in the BioPortal repository, and analyze different types of reuse and overlap constructs. While we find an approximate term overlap of 25-31%, term reuse is less than 9%, with most ontologies reusing fewer than 5% of their terms from a small set of popular ontologies. Clustering analysis shows that the terms reused by a common set of ontologies have >90% semantic similarity, suggesting that ontology developers tend to reuse terms that are sibling or parent-child nodes. We validate this finding by analyzing the logs generated by a Protégé plugin that enables developers to reuse terms from BioPortal. We find that most reuse constructs were 2-level subtrees at the higher levels of the class hierarchy. We developed a Web application that visualizes reuse dependencies and overlap among ontologies, and that proposes similar terms from BioPortal for a term of interest. We also identified a set of error patterns indicating that ontology developers did intend to reuse terms from other ontologies, but used different and sometimes incorrect representations. Our results suggest the need for semi-automated tools that augment term reuse in the ontology engineering process through personalized recommendations.
Keywords: Descriptive Study; Ontologies; Biomedical Domain; Term Reuse; Term Overlap; Composite Mappings; Visualization
Reuse in biomedical ontologies
The biomedical research community has been one of the earliest adopters of ontologies to tackle the challenges of efficient knowledge organization, optimized information retrieval and effective annotation of datasets. Researchers have used ontologies for various purposes such as knowledge management, semantic search, data annotation, data integration, exchange, decision support and reasoning [1,2]. For example, i) the National Cancer Institute Thesaurus (NCIT) has been used as a reference terminology for cancer data [3], ii) the Gene Ontology (GO) has been ubiquitously used for enrichment analysis on gene sets obtained from microarray experiments [4], and iii) the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) has been used for the electronic exchange of clinical health information [5].
Over the years, ontology development has become a reuse-centric process [6,7]. All methodologies strongly encourage reuse while building new ontologies, be it at the level of an ontology, or at the level of individual terms [8,9]. In the literature, we may find two areas that benefit from reuse: i) ontology engineering, in which experts can reuse already existing ontology struct...