2022
DOI: 10.1609/aaai.v36i10.21364

Transformer Uncertainty Estimation with Hierarchical Stochastic Attention

Abstract: Transformers are state-of-the-art in a wide range of NLP tasks and have also been applied to many real-world products. Understanding the reliability and certainty of transformer models is crucial for building trustable machine learning applications, e.g., medical diagnosis. Although many recent transformer extensions have been proposed, the study of the uncertainty estimation of transformer models is under-explored. In this work, we propose a novel way to enable transformers to have the capability of uncertain…

Cited by 3 publications (2 citation statements) · References 17 publications
“…A model is perfectly calibrated if, for some data distribution D and for all input pairs (x, y) ∈ D, whenever the model predicts p_i = 0.8, 80% of such pairs have y_i as the ground-truth label. Work on calibration can be broadly categorized into post-hoc methods that calibrate models after training ([12], [13], [14]), regularization methods applied during training ([15], [16], [17], [18]), data augmentation methods ([19], [20]), and methods that alleviate miscalibration by injecting randomness with uncertainty estimation ([21], [22], [23]). However, even a perfectly-calibrated model can make aberrant predictions.…”
Section: Related Work
confidence: 99%
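To make the calibration criterion quoted above concrete, here is a minimal sketch (not taken from the indexed paper or its citing papers) that bins a model's confidences and compares each bin's average confidence with its empirical accuracy, i.e., the expected calibration error (ECE). The function name, bin count, and synthetic data below are illustrative assumptions.

# Minimal ECE sketch: a model is well calibrated if, in every confidence bin,
# the average predicted confidence matches the fraction of correct predictions.
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=10):
    """confidences: predicted max-class probabilities, shape (N,).
    correct: 1 if the prediction matched the ground-truth label, else 0."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        avg_conf = confidences[mask].mean()   # e.g. ~0.8 for the (0.7, 0.8] bin
        accuracy = correct[mask].mean()       # fraction with y_i as ground truth
        ece += mask.mean() * abs(avg_conf - accuracy)
    return ece

# Toy usage: labels agree with the prediction at a rate equal to the confidence,
# so the model is (nearly) perfectly calibrated and ECE is close to 0.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
correct = rng.uniform(size=conf.shape) < conf
print(f"ECE ~ {expected_calibration_error(conf, correct):.3f}")

A perfectly calibrated model in the sense of the quote (an 80%-confidence bin that is right 80% of the time) drives this quantity toward zero; as the quote warns, a low ECE alone does not rule out aberrant predictions.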
“…This line of work aims to alleviate model miscalibration by injecting randomness. The popular methods are (1) Bayesian neural networks (Blundell et al. 2015; Fortunato, Blundell, and Vinyals 2017), (2) ensembles (Lakshminarayanan, Pritzel, and Blundell 2017), (3) Monte Carlo (MC) dropout (Gal and Ghahramani 2016), and (4) Gumbel-softmax (Jang, Gu, and Poole 2017) based approaches (Wang, Lawrence, and Niepert 2021; Pei, Wang, and Szarvas 2022). The former three sub-categories have been discussed in recent surveys (Mena, Pujol, and Vitria 2021; Gawlikowski et al. 2021).…”
Section: Uncertainty Estimation
confidence: 99%
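As a companion to the methods enumerated in the statement above, the following is a minimal sketch of one of them, Monte Carlo (MC) dropout (Gal and Ghahramani 2016), rather than the hierarchical stochastic attention proposed in the indexed paper. The toy classifier, dropout rate, and number of stochastic forward passes are illustrative assumptions; the same pattern applies to a transformer by keeping its dropout layers active at inference time.

# Minimal MC-dropout sketch: repeat stochastic forward passes with dropout
# enabled and use the spread of the predictions as an uncertainty signal.
import torch
import torch.nn as nn

class SmallClassifier(nn.Module):
    def __init__(self, dim=32, num_classes=3, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, num_samples=20):
    """Average softmax outputs over several dropout-perturbed passes."""
    model.train()  # keeps dropout active; no weights are updated here
    probs = torch.stack(
        [torch.softmax(model(x), dim=-1) for _ in range(num_samples)]
    )
    mean_probs = probs.mean(dim=0)   # predictive distribution
    std_probs = probs.std(dim=0)     # per-class disagreement across passes
    return mean_probs, std_probs

model = SmallClassifier()
x = torch.randn(4, 32)               # a batch of 4 feature vectors
mean_probs, std_probs = mc_dropout_predict(model, x)
print(mean_probs.argmax(dim=-1), std_probs.max(dim=-1).values)

The per-class standard deviation across passes acts as a simple uncertainty estimate; the ensembles and Gumbel-softmax based approaches from the same list replace the repeated dropout passes with multiple trained models or with sampled discrete attention, respectively.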