Automatically assigning tasks to people is challenging because human performance can vary across tasks for many reasons. This challenge is further compounded in real-life settings in which no oracle exists to assess the quality of the human decisions and task assignments that are made. Instead, we find ourselves in a "closed" decision-making loop in which the same fallible human decisions we rely on in practice must also be used to guide task allocation. How can imperfect and potentially biased human decisions train an accurate allocation model? Our key insight is to exploit weak prior information on human-task similarity to bootstrap model training. We show that the use of such a weak prior can improve task allocation accuracy, even when human decision-makers are fallible and biased. We present both theoretical analysis and empirical evaluation on synthetic data and a social media toxicity detection task. Results demonstrate the efficacy of our approach.
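A minimal, hypothetical sketch of the idea summarized in this abstract: allocation scores are seeded from a weak human-task similarity prior and then refined from fallible peer feedback rather than oracle labels. The prior, the update rule, the simulated feedback signal, and all variable names below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_tasks = 5, 200

# Hypothetical weak prior: rough similarity between each worker and each task,
# e.g. derived from worker profiles and task metadata (values in [0, 1]).
prior = rng.uniform(0.2, 0.9, size=(n_workers, n_tasks))

# Allocation scores are bootstrapped from the prior and refined online.
scores = prior.copy()
lr = 0.1

for t in range(n_tasks):
    assigned = int(np.argmax(scores[:, t]))  # allocate task t to the top-scoring worker
    # Closed loop: there is no oracle label, so feedback comes from another
    # fallible worker's decision (simulated here as agreement that grows with the prior).
    agree = rng.random() < 0.5 + 0.4 * prior[assigned, t]
    scores[assigned, t] += lr * (float(agree) - scores[assigned, t])
```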
Studies have shown that the people depicted in image search results tend to belong to majority groups with respect to socially salient attributes such as gender or race. This skew goes beyond that which already exists in the world; i.e., the search results for images of people are more imbalanced than the ground truth would suggest. For example, Kay et al. showed that although 28% of CEOs in the U.S. are women, only 10% of the top 100 results for "CEO" in Google Image Search are women. Similar observations abound across search terms and across socially salient attributes. Most existing approaches to correct for this kind of bias assume that the images of people include labels denoting the relevant socially salient attributes. These labels are explicitly used to change the dataset, adjust the training of the algorithm, or guide the execution of the method. However, such labels are often unknown. Further, using machine learning techniques to infer these labels may often not be possible within acceptable accuracy ranges and may not be desirable due to the additional biases this process could incur. As observed in prior work, alternative approaches consider the diversity of image features, which often does not translate into visible diversity among the people depicted. We develop a novel approach that takes as input a visibly diverse control set of images of people and uses this set as part of a procedure to select a set of images of people in response to a query. The goal is to have a resulting set that is more visibly diverse in a manner that emulates the diversity depicted in the control set. It accomplishes this by evaluating the similarity of the images selected by a black-box algorithm with the images in the diversity control set, and incorporating this "diversity score" into the final selection process. Importantly, this approach does not require images to be labelled at any point; effectively, it gives a way to implicitly diversify the set of images selected. We provide two variants of our approach: the first is a modification of the well-known MMR (Maximal Marginal Relevance) algorithm to incorporate the diversity scores, and the second is a more efficient variant that does not consider within-list redundancy. We evaluate these approaches empirically on two image datasets: 1) a new dataset we collect, containing the top 100 Google Image Search results for 96 occupations, for which we evaluate gender and skin-tone diversity with respect to occupations, and 2) the well-known CelebA dataset containing images of celebrities, for which we can evaluate gender diversity with respect to facial features such as "smiling" or "glasses". Both of our approaches produce image sets that significantly improve the visible diversity of the results (i.e., include a larger fraction of anti-stereotypical images) with respect to current Google Image Search results and other state-of-the-art algorithms for diverse image summarization. Further, they seem to come at a minimal cost to accuracy. These empirical results demonstrate the effectiveness of simple label-independent interventions to diversify image search.
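A rough sketch of how the first (MMR-style) variant described above could look, under our own assumptions: cosine similarity on image embeddings, a mean-similarity diversity score against the control set, and the trade-off weights lam and mu are illustrative choices not specified in the abstract, and the function names are hypothetical.

```python
import numpy as np

def diversity_score(candidate_feat, control_feats):
    # Mean cosine similarity between one candidate image embedding and the
    # embeddings of the visibly diverse control set (assumed scoring rule).
    sims = control_feats @ candidate_feat / (
        np.linalg.norm(control_feats, axis=1) * np.linalg.norm(candidate_feat) + 1e-9
    )
    return float(sims.mean())

def mmr_with_diversity(candidate_feats, relevance, control_feats, k, lam=0.5, mu=0.3):
    # Greedy MMR-style selection: trade off query relevance, within-list
    # redundancy, and similarity to the diversity control set.
    selected, remaining = [], list(range(len(candidate_feats)))
    norms = np.linalg.norm(candidate_feats, axis=1) + 1e-9
    while remaining and len(selected) < k:
        best_i, best_val = None, -np.inf
        for i in remaining:
            if selected:
                sims = candidate_feats[selected] @ candidate_feats[i] / (norms[selected] * norms[i])
                redundancy = float(sims.max())  # max similarity to images already chosen
            else:
                redundancy = 0.0
            val = (lam * relevance[i]
                   - (1 - lam) * redundancy
                   + mu * diversity_score(candidate_feats[i], control_feats))
            if val > best_val:
                best_i, best_val = i, val
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

# Hypothetical usage with random embeddings standing in for real image features.
feats = np.random.rand(200, 128)   # candidate image embeddings from a black-box retriever
rel = np.random.rand(200)          # relevance scores for the query
control = np.random.rand(20, 128)  # embeddings of the diversity control set
top10 = mmr_with_diversity(feats, rel, control, k=10)
```

Under this sketch, the abstract's second, more efficient variant would simply drop the within-list redundancy term from the objective.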
Machine learning models are often deployed in concert with humans in the pipeline, with the model having an option to defer to a domain expert in cases where it has low confidence in its inference. Our goal is to design mechanisms for ensuring accuracy and fairness in such prediction systems that combine machine learning model inferences and domain expert predictions. Prior work on "deferral systems" in classification settings has focused on pipelines with a single expert and aimed to accommodate the inaccuracies and biases of this expert to simultaneously learn an inference model and a deferral system. Our work extends this framework to settings where multiple experts are available, with each expert having their own domain of expertise and biases. We propose a framework that simultaneously learns a classifier and a deferral system, with the deferral system choosing to defer to one or more human experts on inputs where the classifier has low confidence. We test our framework on a synthetic dataset and a content moderation dataset with biased synthetic experts, and show that it significantly improves the accuracy and fairness of the final predictions compared to the baselines. We also collect crowdsourced labels for the content moderation task to construct a real-world dataset for the evaluation of hybrid machine-human frameworks and show that our proposed learning framework outperforms baselines on this real-world dataset as well.
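A simplified sketch of the inference-time deferral logic described above, under our own assumptions: a fixed confidence threshold and a reliability-weighted vote over experts stand in for the jointly learned deferral system, and all names and signatures below are hypothetical.

```python
import numpy as np

def predict_with_deferral(clf_probs, expert_preds, expert_weights, threshold=0.8):
    """Route each input either to the classifier or to a weighted vote of experts.

    clf_probs:      (n, n_classes) classifier class probabilities
    expert_preds:   (n, n_experts) integer label predicted by each expert per input
    expert_weights: (n, n_experts) learned per-input reliability of each expert
    """
    n, n_classes = clf_probs.shape
    labels = np.empty(n, dtype=int)
    deferred = np.zeros(n, dtype=bool)
    for i in range(n):
        if clf_probs[i].max() >= threshold:
            labels[i] = clf_probs[i].argmax()  # classifier is confident: keep its prediction
        else:
            deferred[i] = True                 # low confidence: defer to the experts
            votes = np.zeros(n_classes)
            for e in range(expert_preds.shape[1]):
                votes[expert_preds[i, e]] += expert_weights[i, e]
            labels[i] = votes.argmax()
    return labels, deferred
```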