2022
DOI: 10.1609/hcomp.v10i1.21994

When More Data Lead Us Astray: Active Data Acquisition in the Presence of Label Bias

Abstract: An increased awareness concerning risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. A vast majority of the proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently and at the intersection of these two categories, methods that propose active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typi…
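The technique the abstract names, active data acquisition under a fairness-oriented acquisition rule, can be sketched in a few lines. The sketch below is a hypothetical illustration, not the paper's algorithm: the uncertainty score, the group-balance bonus, and the synthetic data are all assumptions made for exposition.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal sketch of fairness-aware active data acquisition.
# Illustrative only; not the method of the cited paper.
rng = np.random.default_rng(0)

# Synthetic pool: features X, a binary group attribute g, hidden labels y.
n = 1000
X = rng.normal(size=(n, 5))
g = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.5 * g + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Seed the labeled set with both classes so the classifier can be fit.
labeled = list(np.where(y == 0)[0][:10]) + list(np.where(y == 1)[0][:10])
pool = sorted(set(range(n)) - set(labeled))

for _ in range(10):  # acquisition rounds
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])[:, 1]
    uncertainty = 1.0 - 2.0 * np.abs(proba - 0.5)  # peaks at the boundary

    # Fairness-aware twist: give a bonus to candidates from whichever
    # group is currently under-represented in the labeled set.
    counts = np.bincount(g[labeled], minlength=2)
    minority = int(np.argmin(counts))
    bonus = (g[pool] == minority).astype(float)
    pick = pool[int(np.argmax(uncertainty + bonus))]

    labeled.append(pick)
    pool.remove(pick)

print("labels acquired per group:", np.bincount(g[labeled], minlength=2))
```

Note that the sketch takes every queried label at face value; the paper's title points at exactly the failure mode this ignores, namely that acquired labels may themselves carry bias.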

Cited by 5 publications (3 citation statements); references 47 publications.
“…This framework overall aims to capture the major considerations in operationalizing fairness that are quantifiable and enable benchmarking to some extent, as we believe that this helps practitioners decide how to make trade-offs between the pillars. We remark that while these are the classical categorizations in ML pipelines, there are still applications that use group data in ways that fall outside of these categories (for instance at the data collection step [10,39], we also propose one such method in the next section at the feature selection step). These methods should be considered, but our work focuses on bringing more structure and order to the majority of fairness intervention work in the highlighted categories [28].…”
Section: Model Performance
Mentioning (confidence: 99%)
“…Active learning-based data collection approaches assume that labels can be queried for each sample at a fixed cost [7,25,1,34]. Unlike these approaches, our framework uses constraints on false positives and assumes non-constant labeling costs that depend on the outcome and the classifier.…”
Section: Related Work
Mentioning (confidence: 99%)
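The contrast this statement draws, fixed labeling costs versus costs that depend on the outcome and the classifier's decision, can be made concrete with a toy cost model. The sketch below is purely illustrative; the specific cost values and the false-positive rule are assumptions, not either paper's formulation.

```python
# Toy contrast between the two labeling-cost assumptions quoted above.
# Hypothetical cost values, chosen only to illustrate the distinction.

def fixed_cost(prediction: int, outcome: int) -> float:
    """Classical active learning assumption: every label query costs the same."""
    return 1.0

def outcome_dependent_cost(prediction: int, outcome: int) -> float:
    """Cost depends on the classifier's decision and the realized outcome:
    here a false positive (predict 1, observe 0) is the expensive case."""
    if prediction == 1 and outcome == 0:
        return 5.0  # false positive: costly to have acted on
    if prediction == 1 and outcome == 1:
        return 1.0  # true positive: routine verification cost
    return 0.5      # negative predictions: cheap, no action was taken

queries = [(1, 0), (1, 1), (0, 0), (0, 1)]  # (prediction, outcome) pairs
print(sum(fixed_cost(p, o) for p, o in queries))              # 4.0
print(sum(outcome_dependent_cost(p, o) for p, o in queries))  # 7.0
```

Under the fixed-cost assumption the acquisition budget is spent uniformly, whereas an outcome- and classifier-dependent cost makes the budget itself a function of what the model predicts, which is the distinction the citing authors use to separate their framework from active learning approaches.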
“…Other methods of data collection, e.g. using human annotators [25] or augmenting using third-party data [15] are often infeasible as they can only provide proxies for the true outcomes, which themselves can encode social biases [35].…”
Section: Related Work
Mentioning (confidence: 99%)