A Community Detection Approach to Cleaning Extremely Large Face Database

Jin, Chi; Jin, Ruochun; Chen, Kai; Dou, Yong

doi:10.1155/2018/4512473

Cited by 18 publications

(7 citation statements)

References 20 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The noisy label is a common issue in face recognition. Several approaches are proposed for label cleaning [30]- [33], aiming at generating a clean dataset from noisy annotated labels. For example, a comprehensive study of noisy data is summarized and data cleaning approaches are investigated in [33].…”

Section: B Handling Noisy Datamentioning

confidence: 99%

“…For example, a comprehensive study of noisy data is summarized and data cleaning approaches are investigated in [33]. A graph-based cleaning method that employs the community detection algorithm and deep CNN models to delete mislabeled images is proposed in [30]. Besides that, identifying and removing the wrong labeled face images is formulated as a quadratic programming problem in [31].…”

Section: B Handling Noisy Datamentioning

confidence: 99%

See 1 more Smart Citation

DAIL: Dataset-Aware and Invariant Learning for Face Recognition

Wang

Chen²,

Liu³

et al. 2021

Preprint

View full text Add to dashboard Cite

To achieve good performance in face recognition, a large scale training dataset is usually required. A simple yet effective way to improve the recognition performance is to use a dataset as large as possible by combining multiple datasets in the training. However, it is problematic and troublesome to naively combine different datasets due to two major issues. First, the same person can possibly appear in different datasets, leading to an identity overlapping issue between different datasets. Naively treating the same person as different classes in different datasets during training will affect back-propagation and generate nonrepresentative embeddings. On the other hand, manually cleaning labels may take formidable human efforts, especially when there are millions of images and thousands of identities. Second, different datasets are collected in different situations and thus will lead to different domain distributions. Naively combining datasets will make it difficult to learn domain invariant embeddings across different datasets. In this paper, we propose DAIL: Dataset-Aware and Invariant Learning to resolve the above-mentioned issues. To solve the first issue of identity overlapping, we propose a dataset-aware loss for multi-dataset training by reducing the penalty when the same person appears in multiple datasets. This can be readily achieved with a modified softmax loss with a dataset-aware term. To solve the second issue, domain adaptation with gradient reversal layers is employed for dataset invariant learning. The proposed approach not only achieves the stateof-the-art results on several commonly used face recognition validation sets, including LFW, CFP-FP, and AgeDB-30, but also shows great benefit for practical use.

show abstract

Section: B Handling Noisy Datamentioning

confidence: 99%

Section: B Handling Noisy Datamentioning

confidence: 99%

DAIL: Dataset-Aware and Invariant Learning for Face Recognition

Wang

Chen²,

Liu³

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…As the original version of this dataset has been shown to exhibit considerable inter-class noise, efforts have been made to automatically clean the dataset [21]. In the case of this version, after face detection and alignment, cleaning was performed by a semi-automatic refinement strategy.…”

Section: Training Datamentioning

confidence: 99%

Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification

Dulhanty

Wong

2020

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society

View full text Add to dashboard Cite

Modern face recognition systems leverage datasets containing images of hundreds of thousands of specific individuals' faces to train deep convolutional neural networks to learn an embedding space that maps an arbitrary individual's face to a vector representation of their identity. The performance of a face recognition system in face verification (1:1) and face identification (1:N) tasks is directly related to the ability of an embedding space to discriminate between identities. Recently, there has been significant public scrutiny into the source and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace, as many people are uncomfortable with their face being used to train dualuse technologies that can enable mass surveillance. However, the impact of an individual's inclusion in training data on a derived system's ability to recognize them has not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model's training data and an accuracy of 75.73% for those not present. This modest difference in accuracy demonstrates that face recognition systems using deep learning work better for individuals they are trained on, which has serious privacy implications when one considers all major open source face recognition training datasets do not obtain informed consent from individuals during their collection. CCS CONCEPTS• Security and privacy → Social aspects of security and privacy; • Computing methodologies → Visual content-based indexing and retrieval; • Computer systems organization → Neural networks; • Social and professional topics → Surveillance.

show abstract

“…The MS-Celeb-1M dataset, similar to other large-scale face datasets, includes a significant percentage of images mislabeled as a consequence of the automatic process used for collection. Thus, we filter the subset from [16] using the clean list of [28], and erasing all the faces detected in images with multiple detections. After this process, from which we obtained 102,870 images, we selected images belonging to 1,900 different subjects, a total set of 100K images.…”

Section: Face Recognition In the Wildmentioning

confidence: 99%

Facial Expressions as a Vulnerability in Face Recognition

Peña

Serna

Morales

et al. 2020

Preprint

View full text Add to dashboard Cite

This work explores facial expression bias as a security vulnerability of face recognition systems. Face recognition technology has experienced great advances during the last decades. However, despite the great performance achieved by state of the art face recognition systems, the algorithms are still sensitive to a large range of covariates. This work presents a comprehensive analysis of how facial expression bias impacts the performance of face recognition technologies. Our study analyzes: i) facial expression biases in the most popular face recognition databases; and ii) the impact of facial expression in face recognition performances. Our experimental framework includes four face detectors, three face recognition models, and four different databases. Our results demonstrate a huge facial expression bias in the most widely used databases, as well as a related impact of face expression in the performance of state-of-the-art algorithms. This work opens the door to new research lines focused on mitigating the observed vulnerability.

show abstract

A Community Detection Approach to Cleaning Extremely Large Face Database

Cited by 18 publications

References 20 publications

DAIL: Dataset-Aware and Invariant Learning for Face Recognition

DAIL: Dataset-Aware and Invariant Learning for Face Recognition

Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification

Facial Expressions as a Vulnerability in Face Recognition

Contact Info

Product

Resources

About