On Loss Functions for Deep Neural Networks in Classification (2017)
DOI: 10.4467/20838476si.16.004.6185

Abstract: Deep neural networks are currently among the most commonly used classifiers. Despite easily achieving very good performance, one of the best selling points of these models is their modular design: one can conveniently adapt their architecture to specific needs, change connectivity patterns, attach specialised layers, experiment with a large number of activation functions, normalisation schemes, and many others. While one can find an impressively wide spread of various configurations of almost every aspect…


Cited by 436 publications (264 citation statements). References 6 publications.
“…Finally, instead of $|p - q|$, we use $(p - q)^2$. This choice is inspired by the results of [33] and will be justified empirically in Section III. Figure 4(b) shows a plot of $\mathrm{Loss}_{DT}$ versus exact HD for the same prostate and brain MR data as in 4(a).…”
Section: B. Estimation of the Hausdorff Distance Based on Distance Transforms
confidence: 99%
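A minimal sketch of the weighting idea in the excerpt above, assuming a binary segmentation setting. The names (hd_surrogate_loss, p, q) are illustrative, and this one-sided version weights only by the ground-truth distance transform; the loss in the citing paper also involves the prediction's distance map.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def hd_surrogate_loss(p, q):
    """One-sided sketch of a distance-transform surrogate for the Hausdorff distance.

    p: predicted soft mask with values in [0, 1]; q: binary ground-truth mask.
    """
    # distance_transform_edt(m) assigns each nonzero voxel of m its Euclidean
    # distance to the nearest zero voxel, so summing the transforms of q and
    # 1 - q gives every voxel's unsigned distance to the ground-truth boundary.
    dist = distance_transform_edt(q) + distance_transform_edt(1 - q)
    # (p - q)^2 instead of |p - q| (the substitution discussed above),
    # weighted so that errors far from the true boundary cost more.
    return np.mean((p - q) ** 2 * dist ** 2)
```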
“…These studies have shown that the choice of the loss function can have a large impact on the performance of image segmentation methods. Recently, some studies have argued that the choice of a good loss function for training deep learning models has been unfairly neglected, and that research on this topic can lead to large improvements in the performance of these models [33].…”
Section: Introduction
confidence: 99%
“…A typical choice for regression problems is the Gaussian loss function $\ell(y_i, F_{W,b}(x_i)) = \|y_i - F_{W,b}(x_i)\|^2$; we then have a traditional least-squares problem with $\phi(W) = \lambda \|W\|^2$.…”
Section: Deep Learning
confidence: 99%
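The objective the excerpt describes can be written out directly. A short sketch, with illustrative names of my own (gaussian_loss, objective, weights, lam):

```python
import numpy as np

def gaussian_loss(y, y_pred):
    """Data term: l(y_i, F_{W,b}(x_i)) = ||y_i - F_{W,b}(x_i)||^2."""
    return np.sum((y - y_pred) ** 2)

def objective(y, y_pred, weights, lam=1e-3):
    """Least-squares data term plus the L2 penalty phi(W) = lambda * ||W||^2."""
    return gaussian_loss(y, y_pred) + lam * sum(np.sum(W ** 2) for W in weights)
```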
“…Dropout, the technique of removing input dimensions in $x$ randomly with probability $p$, can also be used to further reduce the chance of overfitting during the training process [36]. A typical choice for regression problems is the Gaussian loss function $\ell(y_i, F_{W,b}(x_i)) = \|y_i - F_{W,b}(x_i)\|^2$; we then have a traditional least-squares problem [37] with $\phi(W) = \lambda \|W\|^2$.…”
Section: Stochastic Gradient Descent
confidence: 99%
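As an aside on the dropout mechanism mentioned above, a self-contained sketch of the standard "inverted" variant (names mine, not taken from the citing paper):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Zero each dimension of x with probability p; rescale survivors by
    1 / (1 - p) so the expected activation is unchanged at test time."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p  # keep each entry with probability 1 - p
    return x * mask / (1.0 - p)
```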
“…Surrogate loss functions for the 0-1 loss have long been of interest to the machine learning and statistics communities [1][2][3][4][9][10][11][12][13][14]. Recently, there has been renewed interest in alternative classification losses [7,10,11,14,15] beyond the oft-used log-loss. At the inception of the field, convex losses were widely considered optimal [1,3,4,9].…”
Section: Related Work
confidence: 99%
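To make the contrast concrete, here are classical margin-based losses in a minimal form (labels y in {-1, +1}, real-valued score f, both as NumPy arrays). These are standard textbook examples, not necessarily the particular losses studied in the cited works:

```python
import numpy as np

def zero_one_loss(y, f):
    """The non-convex 0-1 loss that the surrogates stand in for."""
    return np.asarray(y * f <= 0, dtype=float)

def hinge_loss(y, f):
    """Convex surrogate popularised by SVMs."""
    return np.maximum(0.0, 1.0 - y * f)

def log_loss(y, f):
    """The oft-used logistic (log-) loss, log(1 + exp(-y f)),
    computed stably via logaddexp."""
    return np.logaddexp(0.0, -y * f)
```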