Quantifying Uncertainty in Neural Network Ensembles using U-Statistics

Schupbach, Jordan; Sheppard, John W.; Forrester, Tyler

doi:10.1109/ijcnn48605.2020.9206810

Cited by 6 publications

(3 citation statements)

References 31 publications


“…Ensemble techniques combine the output of multiple models to improve predictive performance. The probability distribution of the point predictions of individual models can be used for uncertainty estimation (40)(41)(42)(43). This technique has been extended for out-of-distribution detection (44,45).…”

Section: Non-bayesian Methods (mentioning)

confidence: 99%

Failure Detection in Deep Neural Networks for Medical Imaging

Ahmed, Dera, Hassan et al. 2022

Front. Med. Technol.


Deep neural networks (DNNs) have started to find their role in the modern healthcare system. DNNs are being developed for diagnosis, prognosis, treatment planning, and outcome prediction for various diseases. With the increasing number of applications of DNNs in modern healthcare, their trustworthiness and reliability are becoming increasingly important. An essential aspect of trustworthiness is detecting the performance degradation and failure of deployed DNNs in medical settings. The softmax output values produced by DNNs are not a calibrated measure of model confidence. Softmax probability numbers are generally higher than the actual model confidence. The model confidence-accuracy gap further increases for wrong predictions and noisy inputs. We employ recently proposed Bayesian deep neural networks (BDNNs) to learn uncertainty in the model parameters. These models simultaneously output the predictions and a measure of confidence in the predictions. By testing these models under various noisy conditions, we show that the (learned) predictive confidence is well calibrated. We use these reliable confidence values for monitoring performance degradation and failure detection in DNNs. We propose two different failure detection methods. In the first method, we define a fixed threshold value based on the behavior of the predictive confidence with changing signal-to-noise ratio (SNR) of the test dataset. The second method learns the threshold value with a neural network. The proposed failure detection mechanisms seamlessly abstain from making decisions when the confidence of the BDNN is below the defined threshold and hold the decision for manual review. Resultantly, the accuracy of the models improves on the unseen test samples. We tested our proposed approach on three medical imaging datasets: PathMNIST, DermaMNIST, and OrganAMNIST, under different levels and types of noise. An increase in the noise of the test images increases the number of abstained samples. 
BDNNs are inherently robust and show more than 10% accuracy improvement with the proposed failure detection methods. The increased number of abstained samples or an abrupt increase in the predictive variance indicates model performance degradation or possible failure. Our work has the potential to improve the trustworthiness of DNNs and enhance user confidence in the model predictions.
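The fixed-threshold abstention described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold value of 0.8, and the toy confidences are all assumptions chosen for illustration, and the confidence values are presumed to come from some calibrated model.

```python
def predict_or_abstain(predictions, confidences, threshold=0.8):
    """Keep a prediction only if its confidence reaches the threshold.

    Abstained samples are marked with None (held for manual review).
    The threshold of 0.8 is a hypothetical value, not one from the paper.
    """
    return [p if c >= threshold else None
            for p, c in zip(predictions, confidences)]


def accuracy_on_retained(decisions, labels):
    """Accuracy over the samples that were not abstained on."""
    kept = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    if not kept:
        return None  # every sample was abstained on
    return sum(d == y for d, y in kept) / len(kept)


# Toy data (made up): two confident correct predictions, two unsure wrong ones.
predictions = [0, 1, 1, 0]
confidences = [0.95, 0.55, 0.90, 0.60]
labels = [0, 0, 1, 1]

decisions = predict_or_abstain(predictions, confidences)
retained_accuracy = accuracy_on_retained(decisions, labels)
abstained = sum(d is None for d in decisions)
```

In this toy case the two low-confidence predictions are withheld, so accuracy on the retained samples is higher than accuracy over all samples, which mirrors the effect the abstract reports. A rising count of abstained samples would be the degradation signal.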


“…Inconsistent prediction confidence is related to model calibration [10], [11], out-of-distribution detection [12]-[22], and uncertainty estimation [23]-[26]. Model calibration in neural networks tries to calibrate their predictive probabilities so that they match their accuracy.…”

Section: Related Work (mentioning)

confidence: 99%

Towards Consistent Predictive Confidence through Fitted Ensembles

Kardan, Sharma, Stanley 2021

Preprint


Deep neural networks are behind many of the recent successes in machine learning applications. However, these models can produce overconfident decisions when encountering out-of-distribution (OOD) examples or making a wrong prediction. This inconsistent predictive confidence limits the integration of independently trained learning models into a larger system. This paper introduces a separable concept learning framework to realistically measure the performance of classifiers in the presence of OOD examples. In this setup, several instances of a classifier are trained on different parts of a partition of the set of classes. Later, the performance of the combination of these models is evaluated on a separate test set. Unlike current OOD detection techniques, this framework does not require auxiliary OOD datasets and does not separate classification from detection performance. Furthermore, we present a new strong baseline for more consistent predictive confidence in deep models, called fitted ensembles, where overconfident predictions are rectified by transformed versions of the original classification task. Fitted ensembles can naturally detect OOD examples without requiring auxiliary data by observing contradicting predictions among their components. Experiments on MNIST, SVHN, CIFAR-10/100, and ImageNet show that fitted ensembles significantly outperform conventional ensembles on OOD examples and can be scaled.


“…The BNNs can be considered as an ensemble of an infinite number of neural networks [18]. Since the ensembles can provide one with an uncertainty metric [19], such as entropy [20] or variance of predictions across all of its nets, there is a perspective to use this type of algorithm for discovering OOD data. During BCI classifier application, including practical real-time scenarios, the BCI can be made to refrain from issuing a command when OOD is detected.…”

Section: Introduction (mentioning)

confidence: 99%

Bayesian Opportunities for Brain–Computer Interfaces: Enhancement of the Existing Classification Algorithms and Out-of-Domain Detection

Chetkin, Shishkin, Kozyrskiy 2023

Algorithms


Bayesian neural networks (BNNs) are effective tools for a variety of tasks that allow for the estimation of the uncertainty of the model. As BNNs use prior constraints on parameters, they are better regularized and less prone to overfitting, which is a serious issue for brain–computer interfaces (BCIs), where typically only small training datasets are available. Here, we tested, on the BCI Competition IV 2a motor imagery dataset, if the performance of the widely used, effective neural network classifiers EEGNet and Shallow ConvNet can be improved by turning them into BNNs. Accuracy indeed was higher, at least for a BNN based on Shallow ConvNet with two of three tested prior distributions. We also assessed if BNN-based uncertainty estimation could be used as a tool for out-of-domain (OOD) data detection. The OOD detection worked well only in certain participants; however, we expect that further development of the method may make it work sufficiently well for practical applications.
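The citation statement for this entry mentions entropy and the variance of predictions across ensemble members as uncertainty metrics. The following is a minimal sketch of both, assuming each member outputs a class-probability vector for the same input. The function names and toy probabilities are hypothetical, and this is not the U-statistics method of the cited paper nor the BNN procedure of the citing one.

```python
import math
from statistics import pvariance


def mean_probs(member_probs):
    """Average the class probabilities over all ensemble members."""
    n = len(member_probs)
    return [sum(column) / n for column in zip(*member_probs)]


def predictive_entropy(member_probs):
    """Entropy (in nats) of the ensemble-averaged class distribution."""
    return -sum(p * math.log(p) for p in mean_probs(member_probs) if p > 0)


def max_class_variance(member_probs):
    """Largest per-class population variance across the members."""
    return max(pvariance(column) for column in zip(*member_probs))


# Toy inputs (made up): three members, two classes.
agreeing = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagreeing = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

entropy_agree = predictive_entropy(agreeing)
entropy_disagree = predictive_entropy(disagreeing)
variance_agree = max_class_variance(agreeing)
variance_disagree = max_class_variance(disagreeing)
```

Both quantities are larger for the disagreeing members than for the agreeing ones. A system of the kind described could compare either value against a chosen cutoff and refrain from issuing a decision when it is exceeded; how that cutoff is set is left open here.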
