2021
DOI: 10.1613/jair.1.12125

Confident Learning: Estimating Uncertainty in Dataset Labels

Abstract: Learning exists in the context of data, yet notions of confidence typically focus on model predictions, not label quality. Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence. Whereas numerous studies have developed these principles independently, here, we combine t…

Cited by 400 publications (220 citation statements)
References 29 publications
“…Nonetheless, the level of label noise which results from using our models is modest and is in fact below known error rates present in commonly-used computer vision datasets (e.g. ImageNet, which is estimated to have label noise as high as 10% [ 39 , 40 ]); as such, minimal impact on downstream computer vision performance can be expected.…”
Section: Discussion
confidence: 95%
“…Confident learning [32] is related to outlier detection but with a different definition of outliers. The main idea in confident learning is to automatically identify samples with incorrect or noisy labels in ML data sets.…”
Section: Related Work
confidence: 99%
“…The main idea in confident learning is to automatically identify samples with incorrect or noisy labels in ML data sets. A model-agnostic confident learning approach, estimating the joint distribution between noisy and corrected labels, was implemented in Northcutt et al [32]. The identification of noisy labels depends on the out-of-sample predicted probabilities of ML models.…”
Section: Related Work
confidence: 99%
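The mechanism described in the excerpt above — flagging noisy labels from out-of-sample predicted probabilities using per-class probabilistic thresholds — can be sketched in a few lines. This is an illustrative sketch of the counting-with-thresholds principle, not the paper's reference implementation; the function name `find_label_issues` and the tie-breaking rule (pick the most probable class among those exceeding their thresholds) are assumptions made here for illustration.

```python
import numpy as np

def find_label_issues(labels, pred_probs):
    """Flag likely label errors, confident-learning style (sketch).

    labels:     (n,) int array of given (possibly noisy) labels.
    pred_probs: (n, k) out-of-sample predicted probabilities,
                e.g. obtained via cross-validation.
    Returns a boolean mask marking suspected label errors.
    Assumes every class appears at least once in `labels`.
    """
    n, k = pred_probs.shape
    # Per-class threshold t_j: mean predicted probability of class j
    # over the examples whose given label is j.
    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(k)]
    )
    # An example is "confidently counted" into class j if p(j|x) >= t_j;
    # among qualifying classes, take the most probable one.
    above = pred_probs >= thresholds                  # (n, k) bool
    masked = np.where(above, pred_probs, -np.inf)
    confident_class = masked.argmax(axis=1)
    has_confident = above.any(axis=1)
    # Suspected error: confidently counted into a class that differs
    # from the given label.
    return has_confident & (confident_class != labels)
```

For example, an item labeled 0 whose out-of-sample probability strongly favors class 1 is flagged, while items whose predictions agree with their given labels are not.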
“…Moreover, it has been reported in [ 17 ] that it is effective to use the attention that introduces the concept of uncertainty using a Bayesian framework for disease risk prediction. Moreover, the literature [ 18 ] has reported the effectiveness of learning using data with "noisy labels", which are labels with uncertain reliability, while considering the confidence in the labels. Therefore, in the ABN-based estimation, it can be more effective to reduce the influence of highlighted regions that are irrelevant to the actual distress regions.…”
Section: Introduction
confidence: 99%