2020
DOI: 10.48550/arxiv.2002.09437
Preprint
Calibrating Deep Neural Networks using Focal Loss

Abstract: Miscalibration, a mismatch between a model's confidence and its correctness, of Deep Neural Networks (DNNs) makes their predictions hard to rely on. Ideally, we want networks to be accurate, calibrated and confident. We show that, as opposed to the standard cross-entropy loss, focal loss [Lin et al., 2017] allows us to learn models that are already very well calibrated. When combined with temperature scaling, whilst preserving accuracy, it yields state-of-the-art calibrated models. We provide a thorough analys…
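The focal loss referenced in the abstract reshapes cross-entropy so that examples the model already classifies confidently contribute less to the loss, which is what discourages the over-confident fits behind miscalibration. A minimal PyTorch sketch, assuming a multi-class setup with integer labels; the function name and the fixed gamma=3.0 are illustrative choices, not the authors' reference implementation:

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=3.0):
    # Focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t); gamma = 0 recovers
    # ordinary cross-entropy.
    # logits: (N, C) raw class scores, targets: (N,) integer class labels.
    ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample -log(p_t)
    p_t = torch.exp(-ce)                                     # probability of the true class
    return (((1.0 - p_t) ** gamma) * ce).mean()              # down-weight easy examples

Temperature scaling, mentioned alongside it, is a post-hoc step: a single scalar T is fitted on a validation set and the trained model's logits are divided by T before the softmax, which changes confidence but not accuracy.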

Cited by 18 publications (29 citation statements) | References 14 publications
“…As explained earlier, this problem occurs due to two main reasons: overfitting in the final softmax layer and subdued tail-class activations. The overfitting and the resulting miscalibration in softmax probabilities is a known phenomenon observed in many modern multi-class classification networks [12,20]. This has been attributed to prolonged training to minimize the negative log-likelihood loss on a network with a large capacity.…”
Section: Our Methods
confidence: 99%
“…To mitigate these problems, the overfit softmax layer of the model f_θ is replaced with a structurally similar newly-initialized layer (recalibration layer). The intuition behind replacing only the last layer is based on the idea that overfitting and miscalibration are mainly attributed to weight magnification, particularly in the last layer of the neural network [20]. The new layer is trained with early stopping and focal loss, which help solve the problems of overfitting and subdued activations, respectively.…”
Section: Our Methods
confidence: 99%
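A rough sketch of the recalibration step quoted above: the trained classifier head is swapped for a newly initialized layer of the same shape, and only that layer is then retrained with the focal loss sketched earlier plus early stopping. The attribute name model.fc and the choice to freeze the backbone during this phase are assumptions for illustration, not details confirmed by the citing paper:

import torch.nn as nn
import torchvision.models as models

def rebuild_classifier_head(model, num_classes):
    # Freeze the backbone so only the new head is updated (an assumption here),
    # then replace the possibly overfit final linear layer with a freshly
    # initialized "recalibration layer" of the same shape.
    for p in model.parameters():
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new params are trainable by default
    return model

# Example: swap the head of a ResNet-18, then retrain model.fc with focal_loss
# and stop early once validation loss or calibration stops improving.
model = rebuild_classifier_head(models.resnet18(num_classes=10), num_classes=10)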
“…However, this raises a new concern: to what extent are the network's predictions likely to be correct? As these deep networks try to reduce the negative log-likelihood loss, they overfit to the dataset, rendering their predictions over-confident and less trustworthy (Mukhoti et al., 2020). Here, the network is termed poorly calibrated.…”
Section: Introduction
confidence: 99%
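Poor calibration in this sense is usually quantified with the Expected Calibration Error (ECE): predictions are binned by confidence, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. A small NumPy sketch; the helper name and the 15 equal-width bins are common conventions rather than anything prescribed by the quoted paper:

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    # confidences: (N,) probability assigned to the predicted class.
    # correct:     (N,) 1 if that prediction was right, else 0.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight each bin by its share of samples
    return ece

# An over-confident model: 90% average confidence but only 60% accuracy
# yields an ECE of about 0.3.
print(expected_calibration_error([0.9] * 10, [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]))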