Adaptive Sampling Strategies to Construct Equitable Training Datasets

Cai, William; Encarnacion, Ro; Chern, Bobbie; Corbett-Davies, Sam; Bogen, Miranda; Bergman, Stevie; Goel, Sharad

doi:10.1145/3531146.3533203

Cited by 14 publications

(7 citation statements)

References 34 publications


“…This framework overall aims to capture the major considerations in operationalizing fairness that are quantifiable and enable benchmarking to some extent, as we believe that this helps practitioners decide how to make trade-offs between the pillars. We remark that while these are the classical categorizations in ML pipelines, there are still applications that use group data in ways that fall outside of these categories (for instance at the data collection step [10,39], we also propose one such method in the next section at the feature selection step). These methods should be considered, but our work focuses on bringing more structure and order to the majority of fairness intervention work in the highlighted categories [28].…”

Section: Model Performance (mentioning)

confidence: 99%

An Operational Perspective to Fairness Interventions: Where and How to Intervene

Hsu, Chen, Han, et al. 2023

Preprint


As AI-based decision systems proliferate, their successful operationalization requires balancing multiple desiderata: predictive performance, disparity across groups, safeguarding sensitive group attributes (e.g., race), and engineering cost. We present a holistic framework for evaluating and contextualizing fairness interventions with respect to the above desiderata. The two key points of practical consideration are where (pre-, in-, post-processing) and how (in what way the sensitive group data is used) the intervention is introduced. We demonstrate our framework using a thorough benchmarking study on predictive parity; we study close to 400 methodological variations across two major model types (XGBoost vs. Neural Net) and ten datasets. Methodological insights derived from our empirical study inform the practical design of ML workflow with fairness as a central concern. We find predictive parity is difficult to achieve without using group data, and despite requiring group data during model training (but not inference), distributionally robust methods provide significant Pareto improvement. Moreover, a plain XGBoost model often Pareto-dominates neural networks with fairness interventions, highlighting the importance of model inductive bias.

Footnote 1: The authors of the survey call these methods "bias mitigation" methods and semantics vary across institutions. For this work, we will refer to all algorithms with the end goal of inducing some measure of fairness to be "fairness interventions" or just "interventions."

“…The lessons are complex, since the effects of including additional samples from a particular group on the model’s performance in that group depend on a large number of factors. Promising approaches for adaptively deciding which groups to sample from have been proposed [48, 49], attempting to automatically detect harder groups during dataset construction and then sampling preferentially from those. Such approaches will prove challenging to implement in medical practice, however.…”

Section: The Path Forward: Leveling Up (mentioning)

confidence: 99%

“…[13] and Cai et al. [48] suggest analyzing the trajectory of performance improvements in different groups as more samples are added, to identify groups that benefit the most from additional samples. Similarly, if some groups benefit from group balancing, this may indicate the presence of estimator bias due to insufficient model expressivity.…”

Section: The Path Forward: Leveling Up (mentioning)

confidence: 99%

The path toward equal performance in medical machine learning

et al. 2023
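
The citation statement above describes comparing per-group performance trajectories as samples are added. The following is a minimal sketch of that idea, not the method of Cai et al. or of the citing paper: it assumes each group already has a learning curve of `(n_samples, error)` points, estimates the error reduction per added sample from the last two points, and greedily picks the group with the largest estimate. The names `marginal_gain`, `pick_group_to_sample`, and the toy curves are hypothetical.

```python
def marginal_gain(curve):
    """Error reduction per added sample between the last two curve points.

    `curve` is a list of (n_samples, error) pairs ordered by n_samples.
    """
    (n_prev, err_prev), (n_last, err_last) = curve[-2], curve[-1]
    return (err_prev - err_last) / (n_last - n_prev)


def pick_group_to_sample(curves):
    """Return the group whose error is currently falling fastest per sample."""
    return max(curves, key=lambda group: marginal_gain(curves[group]))


# Hypothetical learning curves: validation error measured at 100 and 200 samples.
curves = {
    "group_a": [(100, 0.30), (200, 0.29)],  # nearly flat: little benefit left
    "group_b": [(100, 0.40), (200, 0.32)],  # still improving quickly
}
next_group = pick_group_to_sample(curves)
print(next_group)
```

A two-point slope is a noisy estimate; in practice one would likely smooth over more points or fit a parametric curve before comparing groups.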


“…Sampling is widely considered for dealing with the concerns of class imbalance and scalable analysis in machine learning [15]. Sampling strategies may have significant impacts on the performance given the fact that not all samples are equally important [23, 24]. Previous studies in [15, 25, 26, 27, 28] considered utilizing sampling strategies (including stratified sampling) for mitigating the impact of the imbalance between malicious traffic (minority) vs. normal traffic (majority) in network intrusion/anomaly detection.…”

Section: Related Work (mentioning)

confidence: 99%

Leveraging History to Predict Infrequent Abnormal Transfers in Distributed Workflows

Shao, Sim, et al. 2023

Sensors


Scientific computing heavily relies on data shared by the community, especially in distributed data-intensive applications. This research focuses on predicting slow connections that create bottlenecks in distributed workflows. In this study, we analyze network traffic logs collected between January 2021 and August 2022 at the National Energy Research Scientific Computing Center (NERSC). Based on the observed patterns, we define a set of features primarily based on history for identifying low-performing data transfers. Typically, there are far fewer slow connections on well-maintained networks, which creates difficulty in learning to identify these abnormally slow connections from the normal ones. We devise several stratified sampling techniques to address the class-imbalance challenge and study how they affect the machine learning approaches. Our tests show that a relatively simple technique that undersamples the normal cases to balance the number of samples in two classes (normal and slow) is very effective for model training. This model predicts slow connections with an F1 score of 0.926.
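
The abstract above reports that undersampling the normal class to match the slow class worked well. As a rough illustration only, and not the authors' implementation, random undersampling can be sketched with the standard library; the function name `undersample_to_balance` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def undersample_to_balance(samples, labels, seed=0):
    """Randomly drop samples from larger classes until all classes
    have as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        # rng.sample draws n_min distinct items without replacement.
        balanced.extend((x, y) for x in rng.sample(xs, n_min))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 5 "slow" transfers and 95 "normal" ones.
samples = list(range(100))
labels = ["slow" if i < 5 else "normal" for i in samples]
balanced = undersample_to_balance(samples, labels)
counts = Counter(y for _, y in balanced)
print(counts["slow"], counts["normal"])
```

Undersampling discards majority-class data, so it is usually applied to the training split only, with evaluation done on the original class distribution.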
