2022 ACM Conference on Fairness, Accountability, and Transparency
DOI: 10.1145/3531146.3533199

Don’t Throw it Away! The Utility of Unlabeled Data in Fair Decision Making

Abstract: Decision-making algorithms, in practice, are often trained on data that exhibits a variety of biases. Decision-makers often aim to take decisions based on some ground-truth target that is assumed or expected to be unbiased, i.e., equally distributed across socially salient groups. In many practical settings, the ground truth cannot be directly observed, and instead we have to rely on a biased proxy measure of the ground truth, i.e., biased labels, in the data. In addition, data is often selectively labeled, i…
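
The setting the abstract describes can be made concrete with a small synthetic sketch: a ground-truth target that is equally distributed across two groups, and a biased proxy label that under-reports positives for one group. Everything below (the group coding, the flip probability) is an illustrative assumption, not taken from the paper.

```python
# Synthetic sketch of label bias: ground truth y* is equally distributed
# across groups, but the observed proxy label y is not.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
a = rng.integers(0, 2, size=n)         # socially salient group membership
y_star = rng.binomial(1, 0.5, size=n)  # ground truth: same rate in both groups

# Biased proxy: true positives in group a=1 are recorded as 0 with
# probability 0.3, so observed label rates diverge even though y* does not.
flip = (a == 1) & (y_star == 1) & (rng.random(n) < 0.3)
y_proxy = np.where(flip, 0, y_star)

for g in (0, 1):
    print(f"group {g}: P(y*=1) = {y_star[a == g].mean():.3f}, "
          f"P(y=1) = {y_proxy[a == g].mean():.3f}")
```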

Cited by 8 publications (7 citation statements: 0 supporting, 7 mentioning, 0 contrasting). References 15 publications. Citation statements below are ordered by relevance.

“…In this section, we examine common interventions, including data augmentation, reward modification, optimization formulation, and system integration. For predictive models, one of the simplest interventions is to collect or generate more data (data augmentation), under the assumption that, as more samples are acquired, the training set will better approximate the true distribution and the model will learn a more accurate, representative function [176,191,204,227,241,252]. This practice works well if data deficiency is the only reason for poor performance.…”
Section: Mitigations
confidence: 99%
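
The premise in this excerpt, that accuracy improves as the training set grows toward the true distribution, is easy to see in a toy experiment. The sketch below uses scikit-learn with a synthetic dataset and a logistic model; both are illustrative choices, not those of the cited works.

```python
# Toy demonstration of the data-augmentation premise: with more i.i.d.
# samples, the learned model approximates the true function more closely.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

for n in (100, 1_000, 10_000):
    clf = LogisticRegression(max_iter=1_000).fit(X_pool[:n], y_pool[:n])
    print(f"n={n:>6}: test accuracy = {clf.score(X_test, y_test):.3f}")
```

As the excerpt notes, this only helps when data deficiency (rather than, say, label bias) is the bottleneck; adding more samples drawn from a biased labeling process reproduces the bias at scale.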
“…However, their method requires non-trivial parametric assumptions on the feature distribution. Rateike et al. [31] develop an online process that first learns an unbiased representation of the data and then trains an online classifier over the learned representation space. This approach and the above papers, however, do not employ any constraint on false positives, which can lead to low utility in certain iterations when sufficient information for learning is unavailable (see Section 5 for empirical comparison against these methods).…”
Section: Related Work
confidence: 99%
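
The two-stage pattern attributed to Rateike et al. [31] (first learn a representation that carries less bias, then run an online classifier over it) can be sketched schematically. The debiasing step below, projecting out the least-squares direction that predicts the group attribute, is a crude stand-in for the paper's actual representation learner, and all data and parameters are synthetic.

```python
# Schematic two-stage pipeline: (1) build a representation with reduced
# group information, (2) train an online classifier on it via partial_fit.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n, d = 5_000, 10
a = rng.integers(0, 2, size=n)                  # group attribute
X = rng.normal(size=(n, d)) + 0.8 * a[:, None]  # features correlated with a
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0.4).astype(int)

# Stage 1: least-squares direction w with X @ w ~ a, projected out of X.
w, *_ = np.linalg.lstsq(X, a - a.mean(), rcond=None)
w /= np.linalg.norm(w)
Z = X - np.outer(X @ w, w)  # stand-in for the learned representation space

# Stage 2: online classifier trained over the representation, batch by batch.
clf = SGDClassifier(loss="log_loss", random_state=0)
for start in range(0, n, 500):
    batch = slice(start, start + 500)
    clf.partial_fit(Z[batch], y[batch], classes=[0, 1])
print("final accuracy:", round(clf.score(Z, y), 3))
```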
“…Baselines. We compare our approach to the following baselines: (a) KILBERTUS ET AL [22], which uses stochastic classifiers to assign a non-zero exploration probability to every sample; (b) YANG ET AL [38], which employs a bandit-type approach, first estimating likelihoods with a logistic model and then adjusting classifier thresholds to incorporate gathered information; (c) RATEIKE ET AL [31], which learns an unbiased representation of the data over which an online decision-making model is trained; (d) OPT-OFFLINE, i.e., the ideal (unattainable in the partial-feedback setting) classifier trained using i.i.d.…”
Section: Adult Income Dataset
confidence: 99%
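
The contrast between baselines (a) and (d) hinges on exploration: a deterministic threshold rule never observes outcomes for rejected individuals, while a stochastic policy keeps every individual's acceptance probability bounded away from zero. The sketch below caricatures that idea; the epsilon floor and the max() rule are illustrative assumptions, not the actual policy of Kilbertus et al. [22].

```python
# Deterministic vs. stochastic decision rules under partial feedback:
# labels are only observed for accepted individuals, so only the
# stochastic rule keeps collecting information about low-scored ones.
import numpy as np

rng = np.random.default_rng(0)

def deterministic_decision(score, threshold=0.5):
    # Rejects everyone below the threshold; their labels are never observed.
    return bool(score >= threshold)

def stochastic_decision(score, epsilon=0.05):
    # Accepts with probability at least epsilon, so every individual
    # retains a non-zero chance of contributing an observed label.
    return bool(rng.random() < max(epsilon, score))

scores = rng.random(8)
print("scores:       ", np.round(scores, 2))
print("deterministic:", [deterministic_decision(s) for s in scores])
print("stochastic:   ", [stochastic_decision(s) for s in scores])
```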
“…Classification models that can handle missing data have been studied in the previous literature with the goal of minimizing costs or increasing performance [44,2], obtaining uncertainty estimates [20], or fulfilling classical fairness notions [45,19,40,11]. In a related line of work, classification with noisy [12] or missing labels [22,35] has been investigated, where the missingness is often a result of selection bias. The setting considered in this work differs in that we are not concerned with fulfilling a fairness notion with respect to sensitive information, but between subjects with and without optional information.…”
Section: Related Work
confidence: 99%
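
The selection-bias mechanism this excerpt refers to, where labels are missing precisely because a past policy rejected the individual, is easy to reproduce. In the illustrative sketch below, a model fit only on the labeled (accepted) subset recovers noticeably different coefficients than one fit on the full population; the acceptance rule and the data are assumptions for demonstration only.

```python
# Selective labels: outcomes are observed only for accepted individuals,
# so a model fit on the labeled subset inherits the policy's selection bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.7, size=n) > 0).astype(int)

accepted = X[:, 0] > 0  # past policy: label observed only when accepted
full = LogisticRegression(max_iter=1_000).fit(X, y)
observed = LogisticRegression(max_iter=1_000).fit(X[accepted], y[accepted])

print("coefficients, all data:      ", np.round(full.coef_[0], 2))
print("coefficients, labeled subset:", np.round(observed.coef_[0], 2))
```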