“…Recognition with humans in the loop. Among the works most similar to ours are approaches that combine computer vision with human-in-the-loop collaboration for tasks such as fine-grained image classification [6,59,12,60], image segmentation [26], attribute-based classification [32,40,3], image clustering [34], image annotation [54,55,47], human interaction [31], and object annotation in videos [58]. Methods such as [6,59,12,60] jointly model human and computer uncertainty and characterize the trade-off between human time and annotation accuracy, but incorporate only a single type of human response.…”