2019
DOI: 10.48550/arxiv.1910.11385
Preprint

Calibration tests in multi-class classification: A unifying framework

Abstract: In safety-critical applications a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibration of the most confident predictions only is often not sufficient. We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error. We propose and evaluate empirically…
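To make the baseline notions mentioned in the abstract concrete, here is a minimal sketch of the standard binned expected calibration error (ECE) and maximum calibration error (MCE) computed only from the most confident prediction per sample, i.e., the setting the abstract calls often insufficient. The equal-width binning, function name, and parameters are illustrative assumptions, not the paper's proposed estimators.

```python
import numpy as np

def confidence_calibration_errors(probs, labels, n_bins=10):
    """Binned ECE and MCE of the top-1 (most confident) predictions.

    probs:  (n_samples, n_classes) array of predicted probabilities
    labels: (n_samples,) array of integer class labels
    """
    conf = probs.max(axis=1)                          # top-1 confidence
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap                # weight by bin frequency
            mce = max(mce, gap)                       # worst-case bin gap
    return ece, mce
```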

Cited by 4 publications (5 citation statements) | References 12 publications
“…As calibration fidelity measurements using reliability diagrams are originally designed for binary classifiers, there have been numerous proposals for their extensions to multi-class classifiers [37][38][39]. We consider two relatively simple methods; the first is a standard used in Guo et al [37], where only the predicted probability for the most confident prediction of each sample is used to plot the reliability diagram.…”
Section: Evaluating Uncertainty Quantification Methods
confidence: 99%
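As a rough illustration of the first method described in the citation above (the top-1 reliability diagram of Guo et al.), here is a minimal sketch assuming equal-width confidence bins and matplotlib; the details of the cited setup may differ.

```python
import numpy as np
import matplotlib.pyplot as plt

def top1_reliability_diagram(probs, labels, n_bins=10):
    """Plot per-bin accuracy against the most confident predicted probability."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers, accs = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            centers.append((lo + hi) / 2.0)
            accs.append(correct[in_bin].mean())
    plt.bar(centers, accs, width=1.0 / n_bins, edgecolor="black",
            alpha=0.7, label="empirical accuracy")
    plt.plot([0, 1], [0, 1], "k--", label="perfect calibration")
    plt.xlabel("top-1 confidence")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()
```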
“…Our calibration technique is independent of the binning scheme/bins. This is important because, as [53] and [26] have also highlighted, binning schemes lead to underestimated calibration errors. We name our loss function Multiclass Difference of Confidence and Accuracy (MDCA) and apply it for each mini-batch during training.…”
Section: Proposed Auxiliary Loss: MDCA
confidence: 99%
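To illustrate the binning-free auxiliary loss described in the citation above, here is a minimal sketch in the spirit of MDCA: per mini-batch and per class, compare the mean predicted probability with the empirical class frequency, then average over classes. The exact formulation and weighting in the cited work may differ; the function name and usage line are assumptions.

```python
import torch
import torch.nn.functional as F

def mdca_style_loss(logits, targets):
    """Binning-free auxiliary calibration loss on one mini-batch (MDCA-style sketch)."""
    probs = F.softmax(logits, dim=1)                      # (batch, n_classes)
    onehot = F.one_hot(targets, probs.shape[1]).float()   # (batch, n_classes)
    mean_conf = probs.mean(dim=0)                          # mean confidence per class
    mean_acc = onehot.mean(dim=0)                          # class frequency in batch
    return (mean_conf - mean_acc).abs().mean()

# Typically used as an auxiliary term next to cross-entropy, e.g. (hypothetical weighting):
# loss = F.cross_entropy(logits, targets) + beta * mdca_style_loss(logits, targets)
```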
“…the support of p), the necessary sample complexity grows linearly in the support of p. Of course, for trivial predictors that map all inputs x to the same prediction q_0 (i.e. p(x) = q_0, ∀x ∈ X), distribution calibration is easy to certify (Widmann et al., 2019), but such predictors have no practical use.…”
Section: Decision Calibration Generalizes Existing Notions Of Calibration
confidence: 99%
“…Weaker notions such as confidence (weak) calibration (Platt et al., 1999; Guo et al., 2017), class-wise calibration, or average calibration (Kuleshov et al., 2018) are more commonly used in practice. To unify these notions, Widmann et al. (2019) propose F-calibration but lack detailed guidance on which notions to use.…”
Section: Related Work
confidence: 99%
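To make the weaker notions contrasted in the citation above concrete next to the top-1 (confidence) sketch earlier, here is a minimal sketch of a binned class-wise calibration error, averaged over classes. The estimator, binning, and naming are illustrative assumptions rather than the formulation of any of the cited works.

```python
import numpy as np

def classwise_calibration_error(probs, labels, n_bins=10):
    """Average over classes of the binned calibration error of each class's
    predicted probability (class-wise calibration), as opposed to using only
    the top-1 confidence (confidence calibration)."""
    n_classes = probs.shape[1]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    errors = []
    for k in range(n_classes):
        p_k = probs[:, k]                        # predicted probability of class k
        y_k = (labels == k).astype(float)        # indicator that class k occurred
        err = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (p_k > lo) & (p_k <= hi)
            if in_bin.any():
                err += in_bin.mean() * abs(y_k[in_bin].mean() - p_k[in_bin].mean())
        errors.append(err)
    return float(np.mean(errors))
```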