We consider the problem of finding an approximate second-order stationary point of a constrained non-convex optimization problem. We first show that, unlike the unconstrained scenario, the vanilla projected gradient descent algorithm may converge to a strict saddle point even when there is only a single linear constraint. We then provide a hardness result by showing that checking $(\epsilon_g, \epsilon_H)$-second-order stationarity is NP-hard even in the presence of linear constraints. Despite our hardness result, we identify instances of the problem for which checking second-order stationarity can be done efficiently. For such instances, we propose a dynamic second-order Frank-Wolfe algorithm which converges to $(\epsilon_g, \epsilon_H)$-second-order stationary points in $\mathcal{O}(\max\{\epsilon_g^{-2}, \epsilon_H^{-3}\})$ iterations. The proposed algorithm can be used in general constrained non-convex optimization as long as the constrained quadratic subproblem can be solved efficiently.
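To make the first claim concrete, here is a minimal toy sketch (not a construction taken from the paper): projected gradient descent on a simple non-convex quadratic with a single linear constraint can converge to a point that still admits a feasible negative-curvature direction, i.e., a strict saddle of the constrained problem. The problem instance, step size, and initialization below are illustrative assumptions.

```python
# Toy illustration (not from the paper): projected gradient descent on
#   minimize  f(x, y) = x^2 - y^2   subject to  y >= 0.
# The point (0, 0) is a strict saddle of the constrained problem, since the
# feasible direction (0, 1) has negative curvature, yet PGD initialized on
# the boundary y = 0 converges to it.
import numpy as np

def grad(z):
    x, y = z
    return np.array([2.0 * x, -2.0 * y])

def project(z):
    # Projection onto the half-space {y >= 0} given by one linear constraint.
    return np.array([z[0], max(z[1], 0.0)])

z = np.array([1.0, 0.0])   # start on the boundary of the feasible set
eta = 0.1                  # step size (illustrative choice)
for _ in range(200):
    z = project(z - eta * grad(z))

print(z)  # approaches (0, 0): a strict saddle of the constrained problem
```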
With the increasing interest in a deeper understanding of the loss surface of many non-convex deep models, this paper presents a unifying framework to establish the local/global optima equivalence of the optimization problems arising from training such non-convex models. Using the local openness property of the underlying training models, we provide simple sufficient conditions under which any local optimum of the resulting optimization problem is globally optimal. We first completely characterize the local openness of the symmetric and non-symmetric matrix multiplication mapping in its range. Then we use our characterization to: 1) provide a simple proof for the classical result of Burer-Monteiro and extend it to non-continuous loss functions; 2) show that every local optimum of two-layer linear networks is globally optimal; unlike many existing results in the literature, our result requires no assumption on the target data matrix Y or the input data matrix X; 3) develop an almost complete characterization of the local/global optima equivalence of multi-layer linear neural networks, with various counterexamples showing the necessity of each of our assumptions; 4) show the global/local optima equivalence of non-linear deep models having a certain pyramidal structure. Unlike some existing works, our result requires no assumption on the differentiability of the activation functions and can go beyond "full-rank" cases.
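For reference, the settings in items 1) and 2) can be written in generic notation (the symbols below are illustrative, not the paper's exact statements):

```latex
% Item 1): Burer--Monteiro factorization. Since every PSD matrix of rank at
% most r factors as UU^T with U of size n x r, both problems share the same
% optimal value:
\[
  \min_{X \succeq 0,\ \operatorname{rank}(X) \le r} f(X)
  \;=\;
  \min_{U \in \mathbb{R}^{n \times r}} f\!\left(U U^{\top}\right).
\]
% Item 2): training a two-layer linear network with loss \ell, weights
% W_1, W_2, and data matrices X (inputs) and Y (targets):
\[
  \min_{W_1,\, W_2} \; \ell\!\left(W_2 W_1 X,\; Y\right),
\]
% for which the abstract asserts that every local optimum is global.
```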
With the increasing interest in applying the methodology of difference-of-convex (dc) optimization to diverse problems in engineering and statistics, this paper establishes the dc property of many functions in various areas of application not previously known to be of this class. Motivated by a quadratic programming based recourse function in two-stage stochastic programming, we show that the (optimal) value function of a copositive (thus not necessarily convex) quadratic program is dc on the domain of finiteness of the program when the matrix in the objective function's quadratic term and the constraint matrix are fixed. The proof of this result is based on a dc decomposition of a piecewise LC$^1$ function (i.e., a function whose pieces have Lipschitz gradients). Armed with these new results and known properties of dc functions in the literature, we show that many composite statistical functions in risk analysis, including the value-at-risk (VaR), conditional value-at-risk (CVaR), optimized certainty equivalent (OCE), and the expectation-based, VaR-based, and CVaR-based random deviation functionals, are all dc. Adding the known class of dc surrogate sparsity functions that are employed as approximations of the $\ell_0$ function in statistical learning, our work significantly expands the classes of dc functions and positions them for fruitful applications.
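As background for the dc decompositions mentioned above, the following standard identity (a textbook fact, not the paper's specific construction for value functions) shows why a function with an L-Lipschitz gradient is dc:

```latex
% Standard dc decomposition of a function f whose gradient is L-Lipschitz:
\[
  f(x) \;=\; \underbrace{\left( f(x) + \tfrac{L}{2}\|x\|^2 \right)}_{g(x)}
          \;-\; \underbrace{\tfrac{L}{2}\|x\|^2}_{h(x)}.
\]
% Both pieces are convex: h is a convex quadratic, and g is convex because
% its gradient \nabla g(x) = \nabla f(x) + Lx is monotone whenever \nabla f
% is L-Lipschitz.
```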
In this paper we propose GIFAIR-FL: an approach that imposes group and individual fairness on federated learning settings. By adding a regularization term, our algorithm penalizes the spread in the losses of client groups to drive the optimizer toward fair solutions. Theoretically, we show convergence in non-convex and strongly convex settings. Our convergence guarantees hold for both i.i.d. and non-i.i.d. data. To demonstrate the empirical performance of our algorithm, we apply our method to image classification and text prediction tasks. Compared to existing algorithms, our method shows improved fairness results while retaining superior or similar prediction accuracy.
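As a rough illustration of penalizing the spread of client-group losses, here is a hypothetical sketch; the function name, the pairwise absolute-difference penalty, and all parameters are illustrative assumptions, not GIFAIR-FL's published formulation.

```python
# Hypothetical sketch of a spread-penalized federated objective (the exact
# regularizer used by GIFAIR-FL may differ).
import numpy as np

def regularized_objective(group_losses, group_weights, lam):
    """group_losses: average training loss of each client group;
    group_weights: relative sizes of the groups; lam: fairness weight."""
    avg = float(np.dot(group_weights, group_losses))   # weighted FL loss
    spread = sum(abs(li - lj)                          # pairwise loss gaps
                 for i, li in enumerate(group_losses)
                 for lj in group_losses[i + 1:])
    return avg + lam * spread

# Example: three client groups with unequal average losses.
print(regularized_objective([0.8, 0.5, 0.3], [0.5, 0.3, 0.2], lam=0.1))
```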