2018
DOI: 10.1155/2018/4512473

A Community Detection Approach to Cleaning Extremely Large Face Database

Abstract: Though it has been easier to build large face datasets by collecting images from the Internet in this Big Data era, the time-consuming manual annotation process prevents researchers from constructing larger ones, which makes the automatic cleaning of noisy labels highly desirable. However, identifying mislabeled faces by machine is quite challenging because the diversity of a person's face images that are captured wildly at all ages is extraordinarily rich. In view of this, we propose a graph-based cleaning me…

Cited by 18 publications
(7 citation statements)
References 20 publications
“…The noisy label is a common issue in face recognition. Several approaches are proposed for label cleaning [30]- [33], aiming at generating a clean dataset from noisy annotated labels. For example, a comprehensive study of noisy data is summarized and data cleaning approaches are investigated in [33].…”
Section: B Handling Noisy Data (mentioning)
confidence: 99%
“…For example, a comprehensive study of noisy data is summarized and data cleaning approaches are investigated in [33]. A graph-based cleaning method that employs the community detection algorithm and deep CNN models to delete mislabeled images is proposed in [30]. Besides that, identifying and removing the wrong labeled face images is formulated as a quadratic programming problem in [31].…”
Section: B Handling Noisy Data (mentioning)
confidence: 99%
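
The graph-based cleaning idea mentioned in this statement can be illustrated with a minimal sketch. It is not the cited paper's algorithm: it assumes face embeddings have already been computed by some CNN, links two images of the same labeled identity when their cosine similarity reaches a hypothetical `threshold`, and uses plain connected components as a simplified stand-in for a community detection algorithm. Images outside the largest group are flagged as possibly mislabeled. The function names and the threshold value are illustrative assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def largest_group(embeddings, threshold=0.5):
    """Return sorted indices of the largest connected group of similar images.

    Assumption: connected components stand in for community detection.
    """
    n = len(embeddings)
    # Build the similarity graph: an edge joins two sufficiently similar images.
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)

    seen, best = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(group) > len(best):
            best = group
    return sorted(best)


def suspected_mislabeled(embeddings, threshold=0.5):
    """Indices of images that fall outside the largest group."""
    keep = set(largest_group(embeddings, threshold))
    return [i for i in range(len(embeddings)) if i not in keep]
```

For instance, with three near-identical toy vectors and one orthogonal vector under the same label, `suspected_mislabeled` returns only the index of the orthogonal one. A real pipeline would likely need a proper community detection method and a tuned threshold, since connected components merge groups through a single strong edge.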
“…As the original version of this dataset has been shown to exhibit considerable inter-class noise, efforts have been made to automatically clean the dataset [21]. In the case of this version, after face detection and alignment, cleaning was performed by a semi-automatic refinement strategy.…”
Section: Training Data (mentioning)
confidence: 99%
“…The MS-Celeb-1M dataset, similar to other large-scale face datasets, includes a significant percentage of images mislabeled as a consequence of the automatic process used for collection. Thus, we filter the subset from [16] using the clean list of [28], and erasing all the faces detected in images with multiple detections. After this process, from which we obtained 102,870 images, we selected images belonging to 1,900 different subjects, a total set of 100K images.…”
Section: Face Recognition In the Wild (mentioning)
confidence: 99%
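
The filtering step described in this statement can be sketched as follows, under stated assumptions. The record layout (one `(image_id, subject_id)` pair per detected face), the `clean_list` set of image identifiers, and the function name are all hypothetical. The sketch reads "erasing all the faces detected in images with multiple detections" as dropping every face from any image that has more than one detection, which is one plausible interpretation rather than the citing authors' confirmed procedure.

```python
from collections import Counter


def filter_faces(detections, clean_list):
    """Keep faces whose image is on the clean list and has exactly one detection.

    detections: list of (image_id, subject_id) pairs, one per detected face.
    clean_list: set of image_id values considered correctly labeled.
    """
    # Count how many faces were detected in each image.
    faces_per_image = Counter(image_id for image_id, _ in detections)
    return [
        (image_id, subject_id)
        for image_id, subject_id in detections
        if image_id in clean_list and faces_per_image[image_id] == 1
    ]
```

Given detections for `a.jpg` (one face), `b.jpg` (two faces), and `c.jpg` (one face, absent from the clean list), only the `a.jpg` record survives.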