2021
DOI: 10.48550/arxiv.2112.03657
Preprint

Understanding Square Loss in Training Overparametrized Neural Network Classifiers

Abstract: Deep learning has achieved many breakthroughs in modern classification tasks. Numerous architectures have been proposed for different data structures, but when it comes to the loss function, the cross-entropy loss is the predominant choice. Recently, several alternative losses have seen revived interest for deep classifiers. In particular, empirical evidence seems to promote square loss, but a theoretical justification is still lacking. In this work, we contribute to the theoretical understanding of square loss…

Cited by 3 publications (4 citation statements)
References 26 publications
“…considers a classification problem where the inputs are Gaussian and the labels are generated according to a logistic link function, and derives a formula for the asymptotic prediction error of the max-margin classifier in a setting where the ratio of the dimension to the sample size converges to a fixed positive limit. Other works studying benign overfitting and classification include Liang and Recht [2021], McRae et al [2021], Poggio and Liao [2019], Thrampoulidis [2020], Hu et al [2021].…”
Section: Related Work
confidence: 99%
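To make the setting in the excerpt above concrete, here is a minimal simulation sketch, not taken from any of the cited works: Gaussian inputs, labels drawn through a logistic link, and a max-margin linear classifier fit with the dimension-to-sample-size ratio d/n held at a fixed positive value. The use of scikit-learn's SVC with a large C as a stand-in for the exact max-margin solution, and all parameter values, are assumptions made for illustration; the sketch reports a Monte Carlo test error rather than the asymptotic formula derived in the cited paper.

# Illustrative sketch only: Gaussian inputs, logistic-link labels, and an
# (approximate) max-margin linear classifier in a proportional regime d/n = 2.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 200, 400                                   # d/n = 2, a fixed positive ratio
beta = np.ones(d) / np.sqrt(d)                    # assumed ground-truth direction

def sample(m):
    X = rng.normal(size=(m, d))                   # Gaussian inputs
    p = 1.0 / (1.0 + np.exp(-X @ beta))           # logistic link
    y = np.where(rng.uniform(size=m) < p, 1, -1)  # noisy +/-1 labels
    return X, y

X, y = sample(n)
clf = SVC(kernel="linear", C=1e6).fit(X, y)       # large C ~ max-margin on separable data
X_test, y_test = sample(10000)
print("test prediction error:", np.mean(clf.predict(X_test) != y_test))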
“…Other theoretical works on the convergence rate of DNN classifiers are carried out under other assumptions, e.g., separability (Zhang, 2000; Hu et al, 2021), the teacher-student setting (Hu et al, 2020), smooth conditional probability (Audibert et al, 2007; Steinwart et al, 2007), etc. Classifiers that estimate the conditional probability are usually referred to as "plug-in" classifiers, and it is worth noting that this approach essentially reduces classification to regression.…”
Section: DNN in Classification
confidence: 99%
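The "plug-in" idea in the excerpt above, estimating the conditional class probability and thresholding it, connects directly to square-loss training: regressing 0/1 labels with square loss targets P(Y=1|X). Below is a minimal sketch under assumed choices (scikit-learn's MLPRegressor, which minimizes squared error, and a small synthetic Gaussian/logistic data set); it is illustrative only and not the construction analyzed in the paper under discussion.

# Minimal sketch of a plug-in classifier: regress 0/1 labels with square loss
# to estimate eta(x) = P(Y=1|X=x), then classify by thresholding at 1/2.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                   # Gaussian inputs (assumed)
eta = 1.0 / (1.0 + np.exp(-X[:, 0] - X[:, 1]))    # logistic link (assumed)
y = (rng.uniform(size=2000) < eta).astype(float)  # noisy 0/1 labels

# Square loss on 0/1 labels: the population minimizer is the conditional probability.
reg = MLPRegressor(hidden_layer_sizes=(256,), max_iter=2000).fit(X, y)

eta_hat = np.clip(reg.predict(X), 0.0, 1.0)       # estimated P(Y=1|X)
y_hat = (eta_hat >= 0.5).astype(float)            # plug-in decision rule
print("training misclassification rate:", np.mean(y_hat != y))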
“…It is interesting to explore whether minimax optimal DNN approaches with dimension-free properties can be established in the classification setting. Existing literature on this front is limited, and most works either treat classification as regression by estimating the conditional class probability instead of the decision boundary (Kohler and Langer, 2020; Bos and Schmidt-Hieber, 2021; Hu et al, 2021; Wang et al, 2022b,a; Wang and Shang, 2022) or settle for an upper bound on the misclassification risk (Kim et al, 2021; Steinwart et al, 2007; Hamm and Steinwart, 2020). Unlike regression problems, where one intends to estimate the unknown regression functions, the goal of classification is to recover the unknown decision boundary separating different classes.…”
Section: Introduction
confidence: 99%
“…Regarding results for classification, [19] show convergence rates considering the misclassification error in a noiseless setting. Consistency results which include condition (1.1) in the assumptions are given by [3, 13, 8]. In contrast to our approach, the previously mentioned articles attempt to estimate the regression function f_Q instead of directly estimating the set G_Q^*.…”
Section: Introduction
confidence: 99%