REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets

Wang, Angelina; Narayanan, Arvind; Russakovsky, Olga

doi:10.1007/978-3-030-58580-8_43

Cited by 69 publications

(67 citation statements)

References 44 publications

“…For example, if all the images labelled as 'CEO' are of white men, the model will not associate women or men of colour with that particular term. Therefore, having diversity in training datasets is imperative to countering and mitigating implicit biases in machine learning models [3,13]. Building on this work we approach the issue of bias in facial recognition datasets from the perspective of geography and devise measures that evaluate and accordingly increase diversity.…”

Section: Related Work (mentioning)

confidence: 99%

“…As a result, they have a high representation of attributes associated with Western societies such as faces with lighter skin tone and western clothing, leaving the datasets heavily biased with an underrepresentation of non-western regions such as Africa or West Asia. Wang et al [13] studied the geographical distribution of images in OpenImages and ImageNet and found them to be Europe and North America centric, with the USA being highly over-represented and Africa being severely under-represented. When these datasets are used to train deep learning models such biases can be propagated within the learning models and amplified within AI systems [3].…”

Section: Biases In Visual Datasets (mentioning)

confidence: 99%

“…Many of the biases (such as measurement or behavioral bias) cannot be used to evaluate visual datasets due to its nature. Celis et al [3] and Wang et al [13] analysed a few causes of bias in visual datasets arising due to historical and cultural reasons. Understanding the range of biases that may be embedded in a dataset is key to identifying potential sources of bias and mitigating them.…”

Section: Biases In Visual Datasets (mentioning)

confidence: 99%

“…Biases in computer vision often originate within data used to train deep neural networks. Many of the datasets used for training have been shown to exhibit a 'western centric' bias [5,13]. These biases can be learned and propagate throughout the machine learning pipeline, leading to the creation of biased computer vision models [3].…”

Section: Introduction (mentioning)

confidence: 99%

“…Auditing of social bias in visual datasets for faces has relied primarily on two main parameters: race (focusing on skin tone), and gender [1,3,5,13]. We approach this issue from a different perspective - geography.…”

Section: Introduction (mentioning)

confidence: 99%

Dataset Diversity

Mandal, Leavy, Little (2021). Proceedings of the 1st International Workshop on Trustworthy AI for Multimedia Computing

Many popular visual datasets used to train deep neural networks for computer vision applications, especially for facial analytics, are created by retrieving images from the internet, often through search engines. However, because search engines localise and personalise their results, and because of the way they index images, the retrieved images overrepresent the demographics of the region from which the queries were issued. As most visual datasets are created in western countries, they tend to have a western-centric bias, and deep neural networks trained on them tend to inherit it. Researchers studying bias in visual datasets have focused on its racial aspect. We approach this from a geographical perspective. In this paper, we 1) study how linguistic variations in search queries and geographical variations in the querying region affect the social and cultural aspects of retrieved images, focusing on facial analytics, 2) explore how geographical bias in image search and retrieval can cause racial, cultural and stereotypical bias in visual datasets and 3) propose methods to mitigate such biases. CCS Concepts: Computing methodologies → Image and video acquisition.

Conditional Adversarial Debiasing: Towards Learning Unbiased Classifiers from Biased Data

Reimers, Bodesheim, Runge, et al. (2021). Lecture Notes in Computer Science

ODIN: Pluggable Meta-annotations and Metrics for the Diagnosis of Classification and Localization

Torres, Milani, Fraternali (2022). Lecture Notes in Computer Science

Machine Learning (ML) tasks, especially Computer Vision (CV) ones, have greatly progressed after the introduction of Deep Neural Networks. Analyzing the performance of deep models is an open issue, addressed with techniques that inspect the response of inner network layers to given inputs. A complementary approach relies on ad-hoc metadata added to the input and used to factor the performance into indicators sensitive to specific facets of the data. We present ODIN, an open source diagnosis framework for generic ML classification tasks and for CV object detection and instance segmentation tasks, that lets developers add meta-annotations to their data sets, compute performance metrics split by meta-annotation values, and visualize diagnosis reports. ODIN is agnostic to the training platform and input formats and can be extended with application- and domain-specific meta-annotations and metrics with almost no coding. It integrates a rapid annotation tool for classification and object detection data sets. In this paper, we exemplify ODIN through CV tasks, but the tool can be used for generic ML classification.
