Data annotation interfaces predominantly leverage ground truth labels to guide annotators toward accurate responses. With the growing adoption of Artificial Intelligence (AI) in domain-specific professional tasks, it has become increasingly important to help beginning annotators identify how their early-stage knowledge can lead to inaccurate answers, which in turn, helps to ensure quality annotations at scale. To investigate this issue, we conducted a formative study involving eight individuals from the field of disaster management, each possessing varying levels of expertise. The goal was to understand the prevalent factors contributing to disagreements among annotators when classifying Twitter messages related to disasters and to analyze their respective responses. Our analysis identified two primary causes of disagreement between expert and beginner annotators: 1) a lack of contextual knowledge or uncertainty about the situation, and 2) the absence of visual or supplementary cues. Based on these findings, we designed a Context interface, which generates aids that help beginners identify potential mistakes and provide the hidden context of the presented tweet. The summative study compares Context design with two widely used designs in data annotation UI, Highlight and Reasoningbased interfaces. We found significant differences between these designs in terms of attitudinal and behavioral data. We conclude with implications for designing future interfaces aiming at closing the knowledge gap among annotators.
CCS CONCEPTS• Human-centered computing → Empirical studies in interaction design; User interface design.