2022
DOI: 10.1007/978-3-031-20868-3_43

More Than Accuracy: An Empirical Study of Consistency Between Performance and Interpretability

Cited by 3 publications (1 citation statement)
References 17 publications
“…Despite the impact of this quality dimension on model performance and actual reliability, still, relatively low attention has been paid to its comprehensive assessment, which is usually accomplished by means of a number of alternative metrics (chiefly among them, the Brier score [6] and the Expected Calibration Error (ECE) [16]), that, despite their popularity, present several shortcomings [19,21]. These mainly concern their interpretability [11,23] (in terms of nonlinear scales or measurand factors, as for the Brier score), consistency [19,21] (undermining comparisons and benchmarking) and comprehensiveness [3] (when they do not account for local calibration, that is for levels of calibration in the surroundings of relevant portions of the probability space or bins).…”
Section: Introduction
Citation type: mentioning
confidence: 99%