We applied a dynamic Bayesian network method that identifies joint patterns from multiple functional genomics experiments to ChIP-seq histone modification and transcription factor data, and DNaseI-seq and FAIRE-seq open chromatin readouts from the human cell line K562. In an unsupervised fashion, we identified patterns associated with transcription start sites, gene ends, enhancers, CTCF elements, and repressed regions. Software and genome browser tracks are at http://noble.gs.washington.edu/proj/segway/.
Mixup [28] is a recently proposed method for training deep neural networks in which additional samples are generated during training by convexly combining random pairs of images and their associated labels. While simple to implement, it has been shown to be a surprisingly effective method of data augmentation for image classification: DNNs trained with mixup show noticeable gains in classification performance on a number of image classification benchmarks. In this work, we discuss a hitherto untouched aspect of mixup training: the calibration and predictive uncertainty of models trained with mixup. We find that DNNs trained with mixup are significantly better calibrated, i.e., the predicted softmax scores are much better indicators of the actual likelihood of a correct prediction, than DNNs trained in the regular fashion. We conduct experiments on a number of image classification architectures and datasets, including large-scale datasets like ImageNet, and find this to be the case. Additionally, we find that merely mixing features does not result in the same calibration benefit, and that the label smoothing in mixup training plays a significant role in improving calibration. Finally, we also observe that mixup-trained DNNs are less prone to over-confident predictions on out-of-distribution and random-noise data. We conclude that the typical overconfidence seen in neural networks, even on in-distribution data, is likely a consequence of training with hard labels, suggesting that mixup training be employed for classification tasks where predictive uncertainty is a significant concern.

1 Introduction: Overconfidence and Uncertainty in Deep Learning

Machine learning algorithms are replacing, or are expected to increasingly replace, humans in decision-making pipelines.
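The convex-combination step the abstract describes can be sketched in a few lines. This is a generic illustration of mixup's data path, not the paper's implementation; the function name, the `alpha` value, and the array shapes are illustrative assumptions.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, seed=0):
    """Convexly combine a batch with a shuffled copy of itself (mixup sketch).

    x: (batch, ...) input array; y_onehot: (batch, classes) one-hot labels.
    alpha parameterizes the Beta distribution the mixing weight is drawn from.
    """
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))          # random pairing of examples
    x_mix = lam * x + (1 - lam) * x[perm]   # mixed inputs
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]  # soft labels
    return x_mix, y_mix
```

Because each mixed label is a convex combination of two one-hot vectors, the targets are "soft" rather than hard, which is the label-smoothing effect the paper connects to improved calibration.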
With the deployment of AI-based systems in high-risk fields such as medical diagnosis [18], autonomous vehicle control [16], and the legal sector [1], the major challenges of the coming era will be issues of uncertainty and trustworthiness of a classifier. With deep neural networks having established supremacy in many pattern recognition tasks, it is the predictive uncertainty of these classifiers that will be of increasing importance. The DNN must not only be accurate, but must also indicate when it is likely to be wrong. This allows the decision to be routed to a human, or to another more accurate but possibly more expensive classifier, under the assumption that the additional cost incurred is greatly surpassed by the consequences of a wrong prediction.

Preprint. Under review.
Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal peptide and/or one or more transmembrane segments, as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.
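The two decoding styles that the Philius decoder combines can be contrasted on a toy two-state HMM. All numbers below are invented for illustration (this is not the Philius DBN): Viterbi returns the single best state path while respecting transition constraints, whereas posterior decoding yields per-position state probabilities of the kind that underlie confidence metrics.

```python
import numpy as np

# Toy 2-state HMM with 2 observation symbols (hypothetical parameters).
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition probabilities
E = np.array([[0.7, 0.3],
              [0.1, 0.9]])   # emission probabilities
pi = np.array([0.5, 0.5])    # initial state distribution

def viterbi(obs):
    """Most likely joint state path (grammar-respecting decode)."""
    n, k = len(obs), len(pi)
    delta = np.zeros((n, k))
    back = np.zeros((n, k), dtype=int)
    delta[0] = pi * E[:, obs[0]]
    for t in range(1, n):
        scores = delta[t - 1][:, None] * T      # k x k path scores
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * E[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):               # trace back the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

def posteriors(obs):
    """Per-position state posteriors via forward-backward."""
    n, k = len(obs), len(pi)
    a = np.zeros((n, k))
    b = np.ones((n, k))
    a[0] = pi * E[:, obs[0]]
    for t in range(1, n):                       # forward pass
        a[t] = (a[t - 1] @ T) * E[:, obs[t]]
    for t in range(n - 2, -1, -1):              # backward pass
        b[t] = T @ (E[:, obs[t + 1]] * b[t + 1])
    g = a * b
    return g / g.sum(axis=1, keepdims=True)
```

A two-stage decoder in the spirit described above would use the posteriors as per-position evidence while constraining the final path to grammatical state sequences, rather than taking the position-wise argmax, which can produce paths the grammar forbids.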
Author contributions: M.M.H., W.S.N., and J.A.B. conceived the project; M.M.H., W.S.N., and Z.W. designed computational and biological experiments; M.M.H., J.A.B., O.J.B., and J.W. developed software used in this work; M.M.H., O.J.B., and J.W. performed computational experiments and analyzed data; M.M.
It has been previously shown that, when both acoustic and articulatory training data are available, it is possible to improve phonetic recognition accuracy by learning acoustic features from this multi-view data with canonical correlation analysis (CCA). In contrast with previous work based on linear or kernel CCA, we use the recently proposed deep CCA, where the functional form of the feature mapping is a deep neural network. We apply the approach on a speaker-independent phonetic recognition task using data from the University of Wisconsin X-ray Microbeam Database. Using a tandem-style recognizer on this task, deep CCA features improve over earlier multi-view approaches as well as over articulatory inversion and typical neural network-based tandem features. We also present a new stochastic training approach for deep CCA, which produces both faster training and better-performing features.

Index Terms: multi-view learning, neural networks, deep canonical correlation analysis, XRMB, articulatory measurements

INTRODUCTION

Modern speech recognizers often use deep neural networks (DNNs) trained to predict the posterior probabilities of phonetic states [1]. In the two most common approaches, either (1) the DNN outputs are scaled by the state priors and used as an observation model in a hidden Markov model (HMM)-based recognizer (the hybrid approach [2]), or (2) the outputs of some layer of the network (possibly a narrow "bottleneck" layer or the final layer) are post-processed and used as acoustic features in an HMM system with a Gaussian mixture model (GMM) observation distribution (the tandem approach [3]).
Working within the tandem approach, we investigate whether we can learn better DNN-based acoustic features via unsupervised learning using an external set of unlabeled multi-view data, in our case simultaneously recorded audio and articulatory measurements. The idea of feature learning using multi-view data has been explored previously using canonical correlation analysis (CCA) [4] and its nonlinear extension, kernel CCA (KCCA) [5,6]. Here we propose to use the recently developed deep CCA (DCCA) approach, which differs from linear/kernel CCA in that the feature mapping is implemented with a DNN rather than a linear/kernel function. Given the earlier successes of CCA/KCCA and the general success of DNNs for speech tasks, it is natural to ask whether multi-view feature learning can benefit from the more flexible functional form of a DNN. We investigate this question, using data from the University of Wisconsin X-ray Microbeam Database (XRMB) [7], on speaker-independent phonetic recognition in a setting where no articulatory data is available for the recognizer training speakers. We find that DCCA indeed improves over previous CCA-based features, as well
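As a point of reference, the linear CCA baseline that DCCA generalizes can be sketched with NumPy. The whitening-plus-SVD formulation below is one standard derivation of CCA, not the paper's code, and the toy data in the test are invented; DCCA replaces the identity feature maps here with deep networks trained to maximize the same correlation objective.

```python
import numpy as np

def linear_cca(X, Y, k=2, reg=1e-4):
    """Top-k canonical directions for two views (linear CCA sketch).

    X: (n, dx), Y: (n, dy) paired observations of the two views.
    Returns projection matrices (dx, k), (dy, k) and the top-k
    canonical correlations. `reg` regularizes the covariance estimates.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whiten each view via the inverse Cholesky factor, then SVD the
    # whitened cross-covariance; singular values are the correlations.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]
```

In the multi-view feature-learning setting sketched by the abstract, the acoustic view would be projected with the learned directions and the resulting features fed to the tandem recognizer; the articulatory view is needed only at feature-learning time.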