The success of artificial intelligence (AI) applications is heavily dependent on the quality of the data they rely on. Data curation, which deals with cleaning, organising and managing data, has therefore become a significant research area. Increasingly, semantic data structures such as ontologies and knowledge graphs empower the new generation of AI systems. In this paper, we focus on ontologies as a special type of data. Ontologies are conceptual data structures representing a domain of interest and are often used as a backbone for knowledge-based intelligent systems or as an additional input for machine learning algorithms. Low-quality ontologies, containing incorrectly represented information or controversial concepts modelled from a single viewpoint, can lead to invalid application outputs and biased systems. We therefore focus on the curation of ontologies as a crucial factor for ensuring trust in the AI systems they enable. While some ontology quality aspects can be evaluated automatically, others require human-in-the-loop evaluation. Yet, despite the importance of the field, several ontology quality aspects have not yet been addressed, and there is a lack of guidelines for the optimal design of human computation tasks to perform such evaluations. In this paper, we advance the state of the art by making two novel contributions. First, we propose a human computation (HC)-based approach for the verification of ontology restrictions, an ontology evaluation aspect that has not yet been addressed with HC techniques. Second, by performing two controlled experiments with a junior expert crowd, we empirically derive task design guidelines for achieving high-quality evaluation results related to i) the formalism for representing ontology axioms and ii) crowd qualification testing. We find that the representation format of the ontology does not significantly influence the campaign results; nevertheless, contributors expressed a preference for working with a graphical ontology representation. Additionally, we show that an objective qualification test is better suited to assessing contributors’ prior knowledge than a subjective self-assessment, and that prior modelling knowledge of the contributors had a positive effect on their judgements. We make all artefacts designed and used in the experimental campaign publicly available.