Motivated by the training of Generative Adversarial Networks (GANs), we study methods for solving minimax problems with additional nonsmooth regularizers. We do so by employing monotone operator theory, in particular the Forward-Backward-Forward (FBF) method, which avoids the known issue of limit cycling by correcting each update via a second gradient evaluation. Furthermore, we propose a seemingly new scheme which recycles old gradients to mitigate the additional computational cost. In doing so we rediscover a known method, related to Optimistic Gradient Descent Ascent (OGDA). For both schemes we prove novel convergence rates for convex-concave minimax problems via a unifying approach. The derived error bounds are in terms of the gap function for the ergodic iterates. For the deterministic and the stochastic problem we show convergence rates of $$\mathcal{O}(1/k)$$ and $$\mathcal{O}(1/\sqrt{k})$$, respectively. We complement our theoretical results with empirical improvements in the training of Wasserstein GANs on the CIFAR10 dataset.
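As a rough illustration of the two schemes (the notation below is ours, assuming a standard Tseng-type forward-backward-forward update, and is not taken verbatim from the paper): writing z = (x, y), letting $$F(z) = (\nabla_x \Phi(x,y),\, -\nabla_y \Phi(x,y))$$ denote the monotone operator of the smooth coupling function $$\Phi$$ and collecting the nonsmooth regularizers into g, one FBF iteration with step size $$\lambda$$ reads
$$\bar{z}_k = \operatorname{prox}_{\lambda g}\big(z_k - \lambda F(z_k)\big), \qquad z_{k+1} = \bar{z}_k - \lambda\big(F(\bar{z}_k) - F(z_k)\big),$$
so each iteration requires two evaluations of F. One natural way to recycle gradients, consistent with the description above, is to replace the evaluation F(z_k) by the stored value $$F(\bar{z}_{k-1})$$ from the previous iteration, leaving a single fresh gradient evaluation per step; this is where the connection to OGDA appears.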
In this work we aim to solve a convex-concave saddle point problem whose coupling function is smooth in one variable and nonsmooth in the other, and not assumed to be linear in either. The problem is augmented by a nonsmooth regulariser in the smooth component. We propose and investigate a novel algorithm, named OGAProx, consisting of an optimistic gradient ascent step in the smooth variable coupled with a proximal step of the regulariser, alternated with a proximal step in the nonsmooth component of the coupling function. We consider the convex-concave, convex-strongly concave and strongly convex-strongly concave settings of the saddle point problem. For the iterates we obtain (weak) convergence, a convergence rate of order $$\mathcal{O}(\frac{1}{K})$$ and linear convergence of order $$\mathcal{O}(\theta^{K})$$ with $$\theta < 1$$, respectively. In terms of function values we obtain ergodic convergence rates of order $$\mathcal{O}(\frac{1}{K})$$, $$\mathcal{O}(\frac{1}{K^{2}})$$ and $$\mathcal{O}(\theta^{K})$$ with $$\theta < 1$$, respectively. We validate our theoretical considerations on a nonsmooth-linear saddle point problem, the training of multi-kernel support vector machines and a classification problem incorporating minimax group fairness.
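To make the structure of OGAProx more concrete, the following is a minimal sketch of how such an alternation could look; the symbols $$\Phi$$ (coupling function, possibly nonsmooth and convex in x, smooth and concave in y), g (regulariser acting on the smooth variable y), step sizes $$\tau, \sigma$$ and extrapolation parameter $$\theta$$ are our notation for illustration and not necessarily the paper's:
$$y_{k+1} = \operatorname{prox}_{\sigma g}\Big(y_k + \sigma\big[(1+\theta)\nabla_y \Phi(x_k, y_k) - \theta\, \nabla_y \Phi(x_{k-1}, y_{k-1})\big]\Big), \qquad x_{k+1} = \operatorname{prox}_{\tau \Phi(\cdot,\, y_{k+1})}(x_k),$$
that is, an optimistic (extrapolated) gradient ascent step in the smooth variable combined with the proximal map of the regulariser, alternated with a proximal step of the coupling function in its nonsmooth variable.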