The increasing use of ontologies requires their quality assurance. Ontology quality assurance consists of a set of activities for analyzing an ontology, identifying its strengths and weaknesses, and proposing improvement actions. Human readability is a quality aspect that improves the use and reuse of ontologies. Human-readable content refers to the natural language content consumed by humans and by the growing number of embedding methods applied to ontologies. The ontology community has proposed best practices related to human readability, but there is no standardized framework for its assessment. We aim to provide a framework for analyzing human readability, based on quantitative metrics, to support ontology developers' decisions. We present the HURON framework, which consists of the specification of five quantitative metrics related to the human readability of ontology content, and a software tool that implements them. The metrics take into account the number of names, descriptions, or synonyms, and also assess the application of systematic naming conventions and the 'lexically suggest, lexically define' principle. Target values are provided for each metric to help interpret the results. HURON can also be used to assess compliance with best practices. We have applied HURON to a representative set of biomedical ontologies, the OBO Foundry repository. The results showed that, in general, the OBO Foundry ontologies comply with the expected number of descriptions and names in their classes, and that their lexical and semantically formalized contents are aligned. However, most of the ontologies did not follow a systematic naming convention. In general, the ontologies of this repository adhere to some of the best practices, although areas for improvement were identified. A number of recommendations are made for ontology developers and users.
INDEX TERMS knowledge engineering, ontologies, quality assurance, readability metrics, semantic web
I. INTRODUCTION
Ontologies play a key role in knowledge engineering by providing a common conceptualization of a domain. Ontologies have been successfully applied in different domains, but especially in biology and biomedicine, with different purposes [1][2][3][4][5].
At the time of writing, repositories like BioPortal [6] had more than 1,000 ontologies and both the Open Biological and Biomedical Ontology (OBO) Foundry [7] and the Ontology Lookup Service (OLS) [8] had more than 250 ontologies. The number of ontologies in these repositories is continuously increasing, which demonstrates their relevance and impact.
In contrast to other artifacts used in data management systems, such as relational databases, which are developed specifically for particular applications, ontologies should be created in a standardized way to facilitate their reuse. The sharing and reuse orientation of ontologies has also made them fundamental for obtaining Findable, Accessible, Interoperable, and Reusable (FAIR) datasets [9]. As a consequence, assuring the quality of ontologies has become a major need.