We propose an efficient algorithm for finding first-order Nash equilibria in smooth min-max problems of the form $\min_{x \in X} \max_{y \in Y} F(x, y)$, where the objective function is nonconvex with respect to $x$ and concave with respect to $y$, and the set $Y$ is convex, compact, and projection-friendly. The goal is to reach an $(\varepsilon_x, \varepsilon_y)$-first-order Nash equilibrium point, as measured by the norm of the corresponding (proximal) gradient component. The proposed approach is fairly simple: essentially, we perform approximate proximal point iterations on the primal function, with the inexact oracle provided by Nesterov's algorithm run on the regularized function $F(x_t, \cdot)$ with an $O(\varepsilon_y)$ regularization term, where $x_t$ is the current primal iterate. The resulting iteration complexity is $O(\varepsilon_x^{-2} \varepsilon_y^{-1/2})$.
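To make the algorithmic structure above concrete, here is a minimal, hypothetical Python sketch of the nested scheme: an outer loop of approximate proximal-point steps on $x$, with the inner $\varepsilon_y$-regularized concave maximization over a projection-friendly set $Y$ (a Euclidean ball here) solved inexactly by projected gradient ascent. The toy objective, step sizes, iteration counts, and the use of plain rather than accelerated (Nesterov) ascent are illustrative assumptions, not the paper's exact method or guarantees.

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Euclidean projection onto the ball {y : ||y|| <= radius} (our toy Y)."""
    nrm = np.linalg.norm(y)
    return y if nrm <= radius else y * (radius / nrm)

def inner_ascent(grad_y, y, eps_y, lr, steps):
    """Inexact oracle: projected gradient ascent on the regularized concave
    problem  max_{y in Y} F(x_t, y) - (eps_y / 2) * ||y||^2."""
    for _ in range(steps):
        y = project_ball(y + lr * (grad_y(y) - eps_y * y))
    return y

def prox_point_minmax(grad_x, grad_y_at, x0, y0, eps_x, eps_y,
                      prox_weight=1.0, lr_x=0.05, lr_y=0.1,
                      outer_iters=200, prox_steps=20, inner_iters=50):
    """Approximate proximal-point iterations on the primal variable x."""
    x, y = x0.astype(float), y0.astype(float)
    for _ in range(outer_iters):
        x_anchor = x.copy()
        # inexactly solve the proximal subproblem around the current anchor
        for _ in range(prox_steps):
            y = inner_ascent(lambda yy: grad_y_at(x, yy), y, eps_y, lr_y, inner_iters)
            g = grad_x(x, y) + prox_weight * (x - x_anchor)
            x = x - lr_x * g
        if np.linalg.norm(grad_x(x, y)) <= eps_x:  # crude stationarity check
            break
    return x, y

# toy nonconvex-concave instance: F(x, y) = <y, A x> + 0.5 * sin(||x||^2)
A = np.array([[1.0, -0.5], [0.3, 0.8]])
grad_x = lambda x, y: A.T @ y + np.cos(x @ x) * x   # d/dx F(x, y)
grad_y_at = lambda x, y: A @ x                      # d/dy F(x, y)
x_hat, y_hat = prox_point_minmax(grad_x, grad_y_at,
                                 np.ones(2), np.zeros(2),
                                 eps_x=1e-3, eps_y=1e-3)
```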
The classical asymptotic theory for parametric $M$-estimators guarantees that, in the limit of infinite sample size, the excess risk has a chi-square type distribution, even in the misspecified case. We demonstrate how self-concordance of the loss allows us to characterize the critical sample size sufficient to guarantee a chi-square type in-probability bound for the excess risk. Specifically, we consider two classes of losses: (i) self-concordant losses in the classical sense of Nesterov and Nemirovski, i.e., whose third derivative is uniformly bounded by the 3/2 power of the second derivative; (ii) pseudo self-concordant losses, for which the power is removed. These classes contain losses corresponding to several generalized linear models, including the logistic loss and pseudo-Huber losses. Our basic result under minimal assumptions bounds the critical sample size by $O(d \cdot d_{\mathrm{eff}})$, where $d$ is the parameter dimension and $d_{\mathrm{eff}}$ is the effective dimension that accounts for model misspecification. In contrast to existing results, we only impose local assumptions that concern the population risk minimizer $\theta^*$. Namely, we assume that the calibrated predictors, i.e., predictors scaled by the square root of the second derivative of the loss, are subgaussian at $\theta^*$. Besides, for type (ii) losses we require boundedness of a certain measure of curvature of the population risk at $\theta^*$. Our improved result bounds the critical sample size from above as $O(\max\{d_{\mathrm{eff}}, d \log d\})$ under slightly stronger assumptions. Namely, the local assumptions must hold in a neighborhood of $\theta^*$ given by the Dikin ellipsoid of the population risk. Interestingly, we find that, for logistic regression with Gaussian design, there is no actual restriction of the conditions: the subgaussian parameter and curvature measure remain near-constant over the Dikin ellipsoid. Finally, we extend some of these results to $\ell_1$-penalized estimators in high dimensions.
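For readers who prefer formulas, here is a hedged restatement of the two loss classes mentioned above; the constant $C$ and the exact normalization are generic and may differ from the paper's.

```latex
% Two loss classes (generic constant C; normalization may differ from the paper):
%   (i)  classical (Nesterov--Nemirovski) self-concordance,
%   (ii) pseudo self-concordance (the 3/2 power is removed).
\[
\text{(i)}\;\; |\ell'''(t)| \le C\,\ell''(t)^{3/2}
\qquad\qquad
\text{(ii)}\;\; |\ell'''(t)| \le C\,\ell''(t).
\]
% Example: the logistic loss \ell(t) = \log(1 + e^{-t}) satisfies
% |\ell'''(t)| = \ell''(t)\,\lvert \sigma(t) - \sigma(-t) \rvert \le \ell''(t),
% with \sigma the sigmoid, so it is pseudo self-concordant with C = 1.
```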
We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels. In order to go beyond the generic analysis leading to convergence rates of the excess risk as $O(1/\sqrt{n})$ from $n$ observations, we assume that the individual losses are self-concordant, that is, their third-order derivatives are bounded by their second-order derivatives. This setting includes least-squares, as well as all generalized linear models such as logistic and softmax regression. For this class of losses, we provide a bias-variance decomposition and show that the assumptions commonly made in least-squares regression, such as the source and capacity conditions, can be adapted to obtain fast non-asymptotic rates of convergence by improving the bias terms, the variance terms, or both.
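As a concrete instance of the setting described above (not taken from the paper), the following sketch fits a logistic-loss empirical risk penalized by a squared Euclidean norm using plain gradient descent; the synthetic data, regularization weight, and step size are illustrative assumptions.

```python
import numpy as np

def fit_regularized_logistic(X, y, lam=1e-2, lr=0.1, iters=2000):
    """Minimize (1/n) * sum_i log(1 + exp(-y_i <x_i, w>)) + (lam/2) * ||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margins = y * (X @ w)
        # gradient of the logistic empirical risk plus the squared-norm penalty
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n + lam * w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = np.sign(X @ w_true + 0.5 * rng.normal(size=200))
w_hat = fit_regularized_logistic(X, y)
```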
Let $\theta_0, \theta_1 \in \mathbb{R}^d$ be the population risk minimizers associated to some loss $\ell: \mathbb{R}^d \times \mathcal{Z} \to \mathbb{R}$ and two distributions $P_0, P_1$ on $\mathcal{Z}$. Our work is motivated by the following question: given i.i.d. samples from $P_0$ and $P_1$, what sample sizes are sufficient and necessary to distinguish between the two hypotheses $\theta^* = \theta_0$ and $\theta^* = \theta_1$ for a given $\theta^* \in \{\theta_0, \theta_1\}$? Making the first steps towards answering this question in full generality, we first consider the case of a well-specified linear model with squared loss. Here we provide matching upper and lower bounds on the sample complexity, showing it to be $\min\{1/\Delta^2, \sqrt{r}/\Delta\}$ up to a constant factor, where $\Delta$ is a measure of separation between $P_0$ and $P_1$, and $r$ is the rank of the design covariance matrix. This bound is dimension-independent, and rank-independent for large enough separation. We then extend this result in two directions: (i) to the general parametric setup in the asymptotic regime; (ii) to generalized linear models in the small-sample regime $n \lesssim r$ and under weak moment assumptions. In both cases, we derive sample complexity bounds of a similar form, even under misspecification. In fact, our testing procedures only access $\theta^*$ through a certain functional of the empirical risk. In addition, the number of observations that allows one to reach statistical confidence in our tests does not allow one to "resolve" the two models, that is, to recover $\theta_0, \theta_1$ up to $O(\Delta)$ prediction accuracy. These two properties allow our framework to be used in applied tasks where one would like to identify a prediction model, which can be proprietary, while guaranteeing that the model cannot actually be inferred by the agent performing the identification. (For this expository discussion, we define the sample complexity of a binary testing problem as the size of an i.i.d. sample for which there exists a test with testing errors of both types at most 0.05.)
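A generic illustration of the testing problem follows; this is not necessarily the paper's procedure (which accesses $\theta^*$ through a particular functional of the empirical risk), but the natural plug-in baseline of comparing the empirical risks of the two candidate minimizers and accepting the one with the smaller risk. The squared loss, synthetic data, and noise level are toy assumptions.

```python
import numpy as np

def compare_models(theta0, theta1, X, y):
    """Return 0 if theta0 has smaller empirical squared-loss risk, else 1."""
    r0 = np.mean((y - X @ theta0) ** 2)
    r1 = np.mean((y - X @ theta1) ** 2)
    return 0 if r0 <= r1 else 1

rng = np.random.default_rng(1)
d, n = 10, 50
theta0, theta1 = rng.normal(size=d), rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta0 + 0.1 * rng.normal(size=n)    # data generated under theta0
print(compare_models(theta0, theta1, X, y))  # expected to print 0
```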