“…We offer our work as a valuable foundation for improving VQA services by empowering system designers and users to prevent, interpret, or resolve answer differences. Specifically, a solution that anticipates why a visual question will lead to different answers (summarized in Figure 1) could (1) help users modify their visual question so that it yields a single, unambiguous answer; e.g., retake the image when it is low quality or does not show the answer, versus rephrase the question when it is ambiguous or invalid; (2) increase users' awareness of what reasons, if any, trigger answer differences when they are given a single answer; or (3) reveal how to automatically aggregate different answers [2,19,24,26,43] when multiple answers are collected.…”