Proceedings of the 44th International Conference on Software Engineering 2022
DOI: 10.1145/3510003.3510091
Training data debugging for the fairness of machine learning software

Cited by 27 publications (13 citation statements)
References 30 publications
“…Therefore, data determine the decision logic of ML software to a large extent [17], and data bias is considered a main root cause of ML software bias [48]. Data testing aims to detect different types of data bias, including checking whether the labels of training data are biased (label bias) [35], whether the distribution of training data implies an unexpected correlation between the sensitive attribute and the outcome label (selection bias) [49], and whether the features of training data contain bias (feature bias) [50].…”
Section: Fairness Testing Components
Citation type: mentioning
confidence: 99%
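The excerpt above distinguishes label, selection, and feature bias. Below is a minimal sketch, not the cited paper's method, of the simplest of these checks: flagging possible selection bias by comparing favorable-label rates across groups defined by a sensitive attribute. The column names (sex, income) and the pandas-based setup are illustrative assumptions.

```python
# Sketch of a selection-bias check: measure the association between a
# sensitive attribute and the outcome label in the training data.
# Column names are hypothetical, not from the cited paper.
import pandas as pd

def positive_label_rates(df: pd.DataFrame,
                         sensitive: str = "sex",
                         label: str = "income") -> pd.Series:
    """Favorable-outcome (label = 1) rate per demographic group."""
    return df.groupby(sensitive)[label].mean()

def selection_bias_gap(df: pd.DataFrame,
                       sensitive: str = "sex",
                       label: str = "income") -> float:
    """Max difference in favorable rates across groups; a large gap
    suggests an unexpected correlation between the sensitive attribute
    and the label (selection bias)."""
    rates = positive_label_rates(df, sensitive, label)
    return float(rates.max() - rates.min())

if __name__ == "__main__":
    # Toy data: the label must be 0/1 so that .mean() acts as a rate.
    df = pd.DataFrame({"sex": ["F", "F", "M", "M", "M"],
                       "income": [0, 1, 1, 1, 0]})
    print(selection_bias_gap(df))  # 0.1667: rates are 0.5 (F) vs 0.667 (M)
```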
“…For example, for demographic parity, researchers calculate the favorable rate among different demographic groups and detect fairness violations by comparing these rates. If the rate difference, called Statistical Parity Difference (SPD) in the software fairness literature [35], [38], [48], [50], [118], exceeds a threshold, the software under test is identified as containing fairness bugs.…”
Section: Statistical Measurements As Test Oracles
Citation type: mentioning
confidence: 99%
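A minimal sketch of the SPD-style test oracle the excerpt describes: compute the favorable-prediction rate per group and flag a fairness bug when the absolute difference exceeds a threshold. The group labels, the 0.1 threshold, and the function names are illustrative assumptions, not values taken from the cited papers.

```python
# Statistical Parity Difference (SPD) as a test oracle (sketch).
# Threshold and group encodings are hypothetical assumptions.
import numpy as np

def spd(y_pred: np.ndarray, group: np.ndarray,
        unprivileged, privileged) -> float:
    """P(y_pred = 1 | unprivileged) - P(y_pred = 1 | privileged)."""
    rate_u = y_pred[group == unprivileged].mean()
    rate_p = y_pred[group == privileged].mean()
    return float(rate_u - rate_p)

def violates_parity(y_pred: np.ndarray, group: np.ndarray,
                    unprivileged, privileged,
                    threshold: float = 0.1) -> bool:
    """Oracle: |SPD| beyond the threshold indicates a fairness bug."""
    return abs(spd(y_pred, group, unprivileged, privileged)) > threshold

if __name__ == "__main__":
    # Binary predictions for six individuals in two demographic groups.
    y_pred = np.array([1, 1, 0, 1, 0, 0])
    group = np.array(["A", "A", "A", "B", "B", "B"])
    print(spd(y_pred, group, "A", "B"))              # 0.333... (2/3 - 1/3)
    print(violates_parity(y_pred, group, "A", "B"))  # True at threshold 0.1
```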