Sunil Thulasidasan scite author profile

Mixup [28] is a recently proposed method for training deep neural networks in which additional samples are generated during training by convexly combining random pairs of images and their associated labels. While simple to implement, it has been shown to be a surprisingly effective method of data augmentation for image classification; DNNs trained with mixup show noticeable gains in classification performance on a number of image classification benchmarks. In this work, we discuss a hitherto untouched aspect of mixup training: the calibration and predictive uncertainty of models trained with mixup. We find that DNNs trained with mixup are significantly better calibrated (i.e., the predicted softmax scores are much better indicators of the actual likelihood of a correct prediction) than DNNs trained in the regular fashion. We conduct experiments on a number of image classification architectures and datasets, including large-scale datasets like ImageNet, and find this to be the case. Additionally, we find that merely mixing features does not result in the same calibration benefit and that the label smoothing in mixup training plays a significant role in improving calibration. Finally, we also observe that mixup-trained DNNs are less prone to over-confident predictions on out-of-distribution and random-noise data. We conclude that the typical overconfidence seen in neural networks, even on in-distribution data, is likely a consequence of training with hard labels, suggesting that mixup training be employed for classification tasks where predictive uncertainty is a significant concern.

1 Introduction: Overconfidence and Uncertainty in Deep Learning

Machine learning algorithms are replacing, or are expected to increasingly replace, humans in decision-making pipelines. With the deployment of AI-based systems in high-risk fields such as medical diagnosis [18], autonomous vehicle control [16] and the legal sector [1], the major challenges of the upcoming era will thus concern the uncertainty and trustworthiness of classifiers. With deep neural networks having established supremacy in many pattern recognition tasks, it is the predictive uncertainty of these types of classifiers that will be of increasing importance. The DNN must not only be accurate, but also indicate when it is likely to get the wrong answer. This allows the decision-making to be routed to a human or to another more accurate, but possibly more expensive, classifier, on the assumption that the additional cost incurred is far outweighed by the consequences of a wrong prediction.

Preprint. Under review.
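The convex combination of sample pairs and labels described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mixup_pair` and `mixup_batch`, the list-based data representation, and the default `alpha=0.4` are hypothetical, and drawing the mixing weight from a Beta(alpha, alpha) distribution is an assumption that the text above does not state.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Convexly combine two samples and their one-hot labels with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def mixup_batch(inputs, labels, alpha=0.4, rng=None):
    """Mix every sample in a batch with a randomly chosen partner.

    Assumption: the mixing weight is drawn once per batch from
    Beta(alpha, alpha); alpha=0.4 is an illustrative value only.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    partners = list(range(len(inputs)))
    rng.shuffle(partners)  # random pairing of samples within the batch
    mixed = [
        mixup_pair(inputs[i], labels[i], inputs[j], labels[j], lam)
        for i, j in enumerate(partners)
    ]
    mixed_inputs = [m[0] for m in mixed]
    mixed_labels = [m[1] for m in mixed]
    return mixed_inputs, mixed_labels, lam
```

Because the labels are mixed with the same weight as the inputs, each training target becomes a soft distribution over classes rather than a hard one-hot vector, which is the label-smoothing effect the abstract credits for the calibration benefit.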

Strong Edge Coloring for Channel Assignment in Wireless Radio Networks

Barrett

Istrate

Kumar

et al.

On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks

Thulasidasan

Chennupati

Bilmes

et al. 2019

Preprint

Semantic Compression of TCP Traces

Istrate

Hansson

Thulasidasan

et al. 2006

All-Pairs Shortest Path algorithms for planar graph for GPU-accelerated clusters

Djidjev

Chapuis

Andonov

et al. 2015

Journal of Parallel and Distributed Computing

We present a new approach for solving the all-pairs shortest-path (APSP) problem for planar graphs that exploits the massive on-chip parallelism available in today's Graphics Processing Units (GPUs). We describe two new algorithms based on our approach. Both algorithms use the Floyd-Warshall method and have near-optimal complexity in terms of the total number of operations, while their matrix-based structure is regular enough to allow for efficient parallel implementation on GPUs. By applying a divide-and-conquer approach, we are able to make use of multi-node GPU clusters, resulting in more than an order of magnitude speedup over the fastest known Dijkstra-based GPU implementation and a twofold speedup over a parallel Dijkstra-based CPU implementation.
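For orientation, the textbook Floyd-Warshall method that the abstract above builds on can be sketched as follows. This is only the plain sequential algorithm on a dense weight matrix; it is not the authors' planar-graph, divide-and-conquer, or GPU variant, and the function name `floyd_warshall` and the matrix representation are illustrative assumptions.

```python
import math


def floyd_warshall(weights):
    """Return the all-pairs shortest-path distance matrix.

    `weights[i][j]` is the edge weight from i to j, `math.inf` if there is
    no edge, and 0 on the diagonal. Assumes the graph has no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):  # allow vertex k as an intermediate stop
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue  # no path i -> k, so k cannot improve row i
            for j in range(n):
                candidate = d_ik + dist[k][j]
                if candidate < dist[i][j]:
                    dist[i][j] = candidate
    return dist
```

The triple loop performs the same update on every matrix entry for each `k`, which is the kind of regular, matrix-based structure the abstract points to as well suited to parallel execution.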
