2022
DOI: 10.3390/app12094619

A Methodology for Controlling Bias and Fairness in Synthetic Data Generation

Abstract: The development of algorithms, based on machine learning techniques, supporting (or even replacing) human judgment must take into account concepts such as data bias and fairness. Though scientific literature proposes numerous techniques to detect and evaluate these problems, less attention has been dedicated to methods generating intentionally biased datasets, which could be used by data scientists to develop and validate unbiased and fair decision-making algorithms. To this end, this paper presents a novel me…

Cited by 11 publications (3 citation statements)
References 17 publications
“…In addition, considering the growing concerns around data privacy and fairness, it is imperative to thoroughly explore the ethical implications of synthetic data generation techniques. The development of algorithms based on machine learning techniques must take into account concepts such as data bias and fairness [44]. While the scientific literature proposes numerous techniques to detect and evaluate these problems in real datasets, less attention has been dedicated to methods generating intentionally biased datasets, which could be used by data scientists to develop and validate unbiased and fair decision-making algorithms [45].…”
Section: Discussion (mentioning)
confidence: 99%
“…The use of balanced synthetic datasets created by GANs to augment classification training has demonstrated the benefits for reducing disparate impact due to minoritized subgroup imbalance [112][113][114]. [115] models bias using a probabilistic network exploiting structural equation modeling as the preprocessing to generate a fairness-aware synthetic dataset. Authors in [116] leverage GAN as the pre-processing for fair data generation that ensures the generated data is discrimination free while maintaining high data utility.…”
Section: Fairness (mentioning)
confidence: 99%
“…Biased synthetic data, when it contains demographic biases, can exacerbate downstream equity concerns. Careful study, design, and testing are needed to determine if synthetic data are helpful in mitigating bias for each individual task and does not introduce new biases.…”
Section: Bias Category I: Data Collection (mentioning)
confidence: 99%