Understanding Square Loss in Training Overparametrized Neural Network Classifiers

Hu, Tianyang; Wang, Jun; Wang, Wenjia; Li, Zhenguo

doi:10.48550/arxiv.2112.03657

Cited by 3 publications

(4 citation statements)

References 26 publications

“…considers classification problem where the inputs are Gaussian, and the labels are generated according to a logistic link function, and derives a formula for the asymptotic prediction error of the max-margin classifier, in a setting where the ratio of the dimension and the sample size converges to some fixed positive limit. Other works studying benign overfitting and classification include Liang and Recht [2021], McRae et al [2021], Poggio and Liao [2019], Thrampoulidis [2020], Hu et al [2021].…”

Section: Related Work (mentioning)

confidence: 99%

The Implicit Bias of Benign Overfitting

Shamir

2022

Preprint

The phenomenon of benign overfitting, where a predictor perfectly fits noisy training data while attaining low expected loss, has received much attention in recent years, but still remains not fully understood beyond simple linear regression setups. In this paper, we show that for regression, benign overfitting is "biased" towards certain types of problems, in the sense that its existence on one learning problem excludes its existence on other learning problems. On the negative side, we use this to argue that one should not expect benign overfitting to occur in general, for several natural extensions of the plain linear regression problems studied so far. We then turn to classification problems, and show that the situation there is much more favorable. Specifically, we consider a model where an arbitrary input distribution of some fixed dimension k is concatenated with a high-dimensional distribution, and prove that the max-margin predictor (to which gradient-based methods are known to converge in direction) is asymptotically biased towards minimizing the expected squared hinge loss w.r.t. the k-dimensional distribution. This allows us to reduce the question of benign overfitting in classification to the simpler question of whether this loss is a good surrogate for the prediction error, and use it to show benign overfitting in some new settings.

“…Other theoretical works on convergence rate of DNN classifiers are carried out under other assumptions, e.g., separable (Zhang, 2000; Hu et al, 2021), teacher student setting (Hu et al, 2020), smooth conditional probability (Audibert et al, 2007; Steinwart et al, 2007;, etc. Classification by estimating the conditional probability is usually referred to as "plug-in" classifiers and it's worth noting that it essentially reduces classification to regression.…”

Section: DNN in Classification (mentioning)

confidence: 99%

“…It is interesting to explore whether minimax optimal DNN approaches with dimension-free properties can be established in the classification setting. Existing literature on this front is limited, and most works either treat classification as regression by estimating the conditional class probability instead of the decision boundary (Kohler and Langer, 2020; Bos and Schmidt-Hieber, 2021; Hu et al, 2021; Wang et al, 2022b,a; Wang and Shang, 2022) or settle for an upper bound on the misclassification risk (Kim et al, 2021; Steinwart et al, 2007; Hamm and Steinwart, 2020). Unlike regression problems where one intends to estimate the unknown regression functions, the goal of classification is to recover the unknown decision boundary separating different classes.…”

Section: Introduction (mentioning)

confidence: 99%

Minimax Optimal Deep Neural Network Classifiers Under Smooth Decision Boundary

Hu, Liu, Shang et al.

2022

Preprint

Self Cite

Deep learning has gained huge empirical successes in large-scale classification problems. In contrast, there is a lack of statistical understanding about deep learning methods, particularly in the minimax optimality perspective. For instance, in the classical smooth decision boundary setting, existing deep neural network (DNN) approaches are rate-suboptimal, and it remains elusive how to construct minimax optimal DNN classifiers. Moreover, it is interesting to explore whether DNN classifiers can circumvent the "curse of dimensionality" in handling high-dimensional data. The contributions of this paper are two-fold. First, based on a localized margin framework, we discover the source of suboptimality of existing DNN approaches. Motivated by this, we propose a new deep learning classifier using a divide-and-conquer technique: DNN classifiers are constructed on each local region and then aggregated to a global one. We further propose a localized version of the classical Tsybakov's noise condition, under which statistical optimality of our new classifier is established. Second, we show that DNN classifiers can adapt to low-dimensional data structures and circumvent the "curse of dimensionality" in the sense that the minimax rate only depends on the effective dimension, potentially much smaller than the actual data dimension. Numerical experiments are conducted on simulated data to corroborate our theoretical results.

“…Regarding results for classification, [19] show convergence rates considering the misclassification error in a noiseless setting. Consistency results which include condition (1.1) in the assumptions are given by [3,13,8]. In contrast to our approach, the previously mentioned articles attempt to estimate the regression function $f_Q$ instead of directly estimating the set $G^*_Q$.…”

Section: Introduction (mentioning)

confidence: 99%

Optimal Convergence Rates of Deep Neural Networks in a Classification Setting

Meyer

2022

Preprint

We establish optimal convergence rates up to a log-factor for a class of deep neural networks in a classification setting under a restraint sometimes referred to as the Tsybakov noise condition. We construct classifiers in a general setting where the boundary of the Bayes rule can be approximated well by neural networks. Corresponding rates of convergence are proven with respect to the misclassification error. It is then shown that these rates are optimal in the minimax sense if the boundary satisfies a smoothness condition. Non-optimal convergence rates already exist for this setting. Our main contribution lies in improving existing rates and showing optimality, which was an open problem. Furthermore, we show almost optimal rates under some additional restraints which circumvent the curse of dimensionality. For our analysis we require a condition which gives new insight on the restraint used. In a sense it acts as a requirement for the "correct noise exponent" for a class of functions.
