Confident Learning: Estimating Uncertainty in Dataset Labels

Northcutt, Curtis G.; Jiang, Lu; Chuang, Isaac L.

doi:10.1613/jair.1.12125

Cited by 400 publications

(220 citation statements)

References 29 publications

Supporting

Mentioning

210

Contrasting

Unclassified

Order By: Relevance

“…Nonetheless, the level of label noise which results from using our models is modest and is in fact below known error rates present in commonly-used computer vision datasets (e.g. ImageNet, which is estimated to have label noise as high as 10% [ 39 , 40 ]); as such, minimal impact on downstream computer vision performance can be expected.…”

Section: Discussionmentioning

confidence: 95%

Deep learning to automate the labelling of head MRI datasets for computer vision applications

et al. 2021

View full text Add to dashboard Cite

Objectives The purpose of this study was to build a deep learning model to derive labels from neuroradiology reports and assign these to the corresponding examinations, overcoming a bottleneck to computer vision model development. Methods Reference-standard labels were generated by a team of neuroradiologists for model training and evaluation. Three thousand examinations were labelled for the presence or absence of any abnormality by manually scrutinising the corresponding radiology reports (‘reference-standard report labels’); a subset of these examinations (n = 250) were assigned ‘reference-standard image labels’ by interrogating the actual images. Separately, 2000 reports were labelled for the presence or absence of 7 specialised categories of abnormality (acute stroke, mass, atrophy, vascular abnormality, small vessel disease, white matter inflammation, encephalomalacia), with a subset of these examinations (n = 700) also assigned reference-standard image labels. A deep learning model was trained using labelled reports and validated in two ways: comparing predicted labels to (i) reference-standard report labels and (ii) reference-standard image labels. The area under the receiver operating characteristic curve (AUC-ROC) was used to quantify model performance. Accuracy, sensitivity, specificity, and F1 score were also calculated. Results Accurate classification (AUC-ROC > 0.95) was achieved for all categories when tested against reference-standard report labels. A drop in performance (ΔAUC-ROC > 0.02) was seen for three categories (atrophy, encephalomalacia, vascular) when tested against reference-standard image labels, highlighting discrepancies in the original reports. Once trained, the model assigned labels to 121,556 examinations in under 30 min. Conclusions Our model accurately classifies head MRI examinations, enabling automated dataset labelling for downstream computer vision applications. Key Points • Deep learning is poised to revolutionise image recognition tasks in radiology; however, a barrier to clinical adoption is the difficulty of obtaining large labelled datasets for model training. • We demonstrate a deep learning model which can derive labels from neuroradiology reports and assign these to the corresponding examinations at scale, facilitating the development of downstream computer vision models. • We rigorously tested our model by comparing labels predicted on the basis of neuroradiology reports with two sets of reference-standard labels: (1) labels derived by manually scrutinising each radiology report and (2) labels derived by interrogating the actual images.

show abstract

Section: Discussionmentioning

confidence: 95%

Deep learning to automate the labelling of head MRI datasets for computer vision applications

et al. 2021

View full text Add to dashboard Cite

show abstract

“…Confident learning [32] is related to outlier detection but with a different definition of outliers. The main idea in confident learning is to automatically identify samples with incorrect or noisy labels in ML data sets.…”

Section: Related Workmentioning

confidence: 99%

“…The main idea in confident learning is to automatically identify samples with incorrect or noisy labels in ML data sets. A model-agnostic confident learning approach, estimating the joint distribution between noisy and corrected labels, was implemented in Northcutt et al [32]. The identification of noisy labels depends on the out-of-sample predicted probabilities of ML models.…”

Section: Related Workmentioning

confidence: 99%

Data analysis with Shapley values for automatic subject selection in Alzheimer’s disease data sets using interpretable machine learning

Bloch

Friedrich

2021

Alz Res Therapy

View full text Add to dashboard Cite

Background For the recruitment and monitoring of subjects for therapy studies, it is important to predict whether mild cognitive impaired (MCI) subjects will prospectively develop Alzheimer’s disease (AD). Machine learning (ML) is suitable to improve early AD prediction. The etiology of AD is heterogeneous, which leads to high variability in disease patterns. Further variability originates from multicentric study designs, varying acquisition protocols, and errors in the preprocessing of magnetic resonance imaging (MRI) scans. The high variability makes the differentiation between signal and noise difficult and may lead to overfitting. This article examines whether an automatic and fair data valuation method based on Shapley values can identify the most informative subjects to improve ML classification. Methods An ML workflow was developed and trained for a subset of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. The validation was executed for an independent ADNI test set and for the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric MRI feature extraction, feature selection, sample selection using Data Shapley, random forest (RF), and eXtreme Gradient Boosting (XGBoost) for model training as well as Kernel SHapley Additive exPlanations (SHAP) values for model interpretation. Results The RF models, which excluded 134 of the 467 training subjects based on their RF Data Shapley values, outperformed the base models that reached a mean accuracy of 62.64% by 5.76% (3.61 percentage points) for the independent ADNI test set. The XGBoost base models reached a mean accuracy of 60.00% for the AIBL data set. The exclusion of those 133 subjects with the smallest RF Data Shapley values could improve the classification accuracy by 2.98% (1.79 percentage points). The cutoff values were calculated using an independent validation set. Conclusion The Data Shapley method was able to improve the mean accuracies for the test sets. The most informative subjects were associated with the number of ApolipoproteinE ε4 (ApoE ε4) alleles, cognitive test results, and volumetric MRI measurements.

show abstract

“…Moreover, it has been reported in [ 17 ] that it is effective to use the attention that introduces the concept of uncertainty using a Bayesian framework for disease risk prediction. Moreover, the literature [ 18 ] has reported the effectiveness of learning using data with “noisy labels”, which are labels with uncertain reliability, while considering the confidence in the labels. Therefore, in the ABN-based estimation, it can be more effective to reduce the influence of highlighted regions that are irrelevant to the actual distress regions.…”

Section: Introductionmentioning

confidence: 99%

Deterioration Level Estimation Based on Convolutional Neural Network Using Confidence-Aware Attention Mechanism for Infrastructure Inspection

Ogawa

Maeda

Ogawa

et al. 2022

Sensors

View full text Add to dashboard Cite

This paper presents deterioration level estimation based on convolutional neural networks using a confidence-aware attention mechanism for infrastructure inspection. Spatial attention mechanisms try to highlight the important regions in feature maps for estimation by using an attention map. The attention mechanism using an effective attention map can improve feature maps. However, the conventional attention mechanisms have a problem as they fail to highlight important regions for estimation when an ineffective attention map is mistakenly used. To solve the above problem, this paper introduces the confidence-aware attention mechanism that reduces the effect of ineffective attention maps by considering the confidence corresponding to the attention map. The confidence is calculated from the entropy of the estimated class probabilities when generating the attention map. Because the proposed method can effectively utilize the attention map by considering the confidence, it can focus more on the important regions in the final estimation. This is the most significant contribution of this paper. The experimental results using images from actual infrastructure inspections confirm the performance improvement of the proposed method in estimating the deterioration level.

show abstract

Confident Learning: Estimating Uncertainty in Dataset Labels

Cited by 400 publications

References 29 publications

Deep learning to automate the labelling of head MRI datasets for computer vision applications

Deep learning to automate the labelling of head MRI datasets for computer vision applications

Data analysis with Shapley values for automatic subject selection in Alzheimer’s disease data sets using interpretable machine learning

Deterioration Level Estimation Based on Convolutional Neural Network Using Confidence-Aware Attention Mechanism for Infrastructure Inspection

Contact Info

Product

Resources

About