Do the machine learning models on a crowd sourced platform exhibit bias? an empirical study on model fairness

Biswas, Sumon; Rajan, Hridesh

doi:10.1145/3368089.3409704

Cited by 68 publications

(71 citation statements)

References 30 publications

“…Overall, the most biased stages -TT7(LE), TT8(CT), TT4(CT), TT1(MV), GC8(SS), are improving performance. This stage-specific tradeoff is aligned with the overall performance-fairness tradeoff discussed in prior work [10,17,26], which can be compared quantitatively by the work of Hort et al [36]. Third, we found that some stages decrease the performance, either accuracy or f1 score.…”

Section: Fairness-performance Tradeoff (supporting)

confidence: 84%

“…This comparison provides the necessary data to compute the four fairness metrics. Similar to [10,26], for each stage in a pipeline, we run this experiment ten times, and then report the mean and standard deviation of the metrics, to avoid inconsistency of the randomness in the ML classifiers. Finally, we followed the ML best practices so that noise is not introduced evaluating the fairness of preprocessing stages.…”

Section: Experiments Design (mentioning)

confidence: 99%

Fair preprocessing: towards understanding compositional fairness of data transformers in machine learning pipeline

Biswas

Rajan

2021

Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Softw

Self Cite

In recent years, many incidents have been reported where machine learning models exhibited discrimination among people based on race, sex, age, etc. Research has been conducted to measure and mitigate unfairness in machine learning models. For a machine learning task, it is a common practice to build a pipeline that includes an ordered set of data preprocessing stages followed by a classifier. However, most of the research on fairness has considered a single classifier based prediction task. What are the fairness impacts of the preprocessing stages in machine learning pipeline? Furthermore, studies showed that often the root cause of unfairness is ingrained in the data itself, rather than the model. But no research has been conducted to measure the unfairness caused by a specific transformation made in the data preprocessing stage. In this paper, we introduced the causal method of fairness to reason about the fairness impact of data preprocessing stages in ML pipeline. We leveraged existing metrics to define the fairness measures of the stages. Then we conducted a detailed fairness evaluation of the preprocessing stages in 37 pipelines collected from three different sources. Our results show that certain data transformers are causing the model to exhibit unfairness. We identified a number of fairness patterns in several categories of data transformers. Finally, we showed how the local fairness of a preprocessing stage composes in the global fairness of the pipeline. We used the fairness composition to choose appropriate downstream transformer that mitigates unfairness in the machine learning pipeline.

CCS CONCEPTS: • Software and its engineering → Software creation and management; • Computing methodologies → Machine learning.

“…Harrison et al [29] studied the perceived fairness of humans in regards to ML models. Biswas and Hridesh [8] studied the fairness of ML models on crowd-sourced platforms. Finkelstein et al [25] explored fairness in requirement analysis, and showed different needs among customers.…”

Section: Software Engineering For ML Fairness (mentioning)

confidence: 99%

Fairea: a model behaviour mutation approach to benchmarking bias mitigation methods

Hort

Zhang

Sarro

et al. 2021

Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Softw

The increasingly wide uptake of Machine Learning (ML) has raised the significance of the problem of tackling bias (i.e., unfairness), making it a primary software engineering concern. In this paper, we introduce Fairea, a model behaviour mutation approach to benchmarking ML bias mitigation methods. We also report on a large-scale empirical study to test the effectiveness of 12 widely-studied bias mitigation methods. Our results reveal that, surprisingly, bias mitigation methods have a poor effectiveness in 49% of the cases. In particular, 15% of the mitigation cases have worse fairness-accuracy trade-offs than the baseline established by Fairea; 34% of the cases have a decrease in accuracy and an increase in bias. Fairea has been made publicly available for software engineers and researchers to evaluate their bias mitigation methods.

CCS CONCEPTS: • Software and its engineering → Software creation and management; Extra-functional properties.

“…It would also be interesting to go beyond accuracy bugs to detect and localize more non-functional bugs, e.g. fairness bugs [77].…”

Section: II Conclusions and Future Work (mentioning)

confidence: 99%

DeepLocalize: Fault Localization for Deep Neural Networks

Wardat

Rajan

2021

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Self Cite

Deep neural networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques don't support localizing DNN bugs because of the lack of understanding of model behaviors. The entire DNN model appears as a black box. To address these problems, we propose an approach and a tool that automatically determines whether the model is buggy or not, and identifies the root causes for DNN errors. Our key insight is that historic trends in values propagated between layers can be analyzed to identify faults, and also localize faults. To that end, we first enable dynamic analysis of deep learning applications: by converting it into an imperative representation and alternatively using a callback mechanism. Both mechanisms allow us to insert probes that enable dynamic analysis over the traces produced by the DNN while it is being trained on the training data. We then conduct dynamic analysis over the traces to identify the faulty layer or hyperparameter that causes the error. We propose an algorithm for identifying root causes by capturing any numerical error and monitoring the model during training and finding the relevance of every layer/parameter on the DNN outcome. We have collected a benchmark containing 40 buggy models and patches that contain real errors in deep learning applications from Stack Overflow and GitHub. Our benchmark can be used to evaluate automated debugging tools and repair techniques. We have evaluated our approach using this DNN bug-and-patch benchmark, and the results showed that our approach is much more effective than the existing debugging approach used in the state-of-the-practice Keras library. For 34/40 cases, our approach was able to detect faults whereas the best debugging approach provided by Keras detected 32/40 faults. Our approach was able to localize 21/40 bugs whereas Keras did not localize any faults.
