In this paper, we propose a distributed Newton method for consensus optimization. Our approach outperforms state-of-the-art methods, including ADMM. The key idea is to exploit the sparsity of the dual Hessian and recast the computation of the Newton step as one of efficiently solving symmetric diagonally dominant (SDD) linear equations. We validate our algorithm both theoretically and empirically. On the theory side, we demonstrate that our algorithm exhibits superlinear convergence within a neighborhood of the optimal solution. Empirically, we show the superiority of this new method on a variety of machine learning problems. The proposed approach is scalable to very large problems and has low communication overhead.

Generally, there are two popular classes of algorithms for distributed optimization. The first is sub-gradient based, while the second relies on a decomposition-coordination procedure. Sub-gradient algorithms proceed by taking a gradient-related step followed by an averaging with neighbors at each iteration. Each step is relatively cheap to compute and can be implemented in a distributed fashion [1]. Though cheap to compute, the best known convergence rate of sub-gradient methods is relatively slow, given by $O(1/\sqrt{t})$ with $t$ being the total number of iterations [2, 18]. The second class of algorithms solves constrained problems by relying on dual methods. One of the well-known (state-of-the-art) methods from this class is the Alternating Direction Method of Multipliers (ADMM) [3]. ADMM decomposes the original problem into two subproblems which are then solved sequentially, leading to updates of the dual variables. In [2], the authors show that ADMM can be fully distributed over a network, leading to an improved convergence rate of the order $O(1/t)$. Apart from accuracy problems inherent to ADMM-based methods [4], significant rate improvements can be gained by adopting second-order (Newton) methods. Though a variety of techniques have been proposed [7, 6, 5], less progress has been made at alleviating ADMM's accuracy and convergence-rate issues. In a recent attempt [9, 10], the authors propose a distributed second-order method for general consensus that uses the approach in [8] to compute the Newton direction. As detailed in Section 6, this method suffers from two problems: first, it fails to outperform ADMM, and second, it faces storage and computational deficiencies for large data sets; thus ADMM retains its state-of-the-art status.

Contributions: In this paper, we address the above problems and propose a distributed Newton method for general consensus with the following characteristics: i) it approximates the exact Newton direction up to any arbitrary precision $\epsilon > 0$, ii) it exhibits super-linear convergence within a neighborhood of the optimal solution, similar to exact Newton, and iii) it outperforms ADMM and other methods in terms of iteration count, running time, and total message complexity on a set of benchmark datasets, including a real-world application to fMRI imaging. One can argue that our improvements arrive at increased commu...
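To make the "Newton step as an SDD solve" idea concrete, here is a minimal sketch that solves a system $H d = -g$ with plain conjugate gradients, where $H$ stands in for the sparse, symmetric diagonally dominant dual Hessian and $g$ for the dual gradient. The function name and the use of generic CG (rather than the paper's specialised SDD solver) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def approx_newton_direction(H, g, tol=1e-6, max_iter=200):
    """Approximately solve H d = -g by conjugate gradients.

    H : (n, n) symmetric (diagonally dominant) matrix standing in for the dual Hessian
    g : (n,)   dual gradient
    Returns an approximate Newton direction d.
    """
    d = np.zeros_like(g)
    r = -g - H @ d          # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hp = H @ p
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:   # stop once the residual is small enough
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d
```

In a distributed setting, the appeal of this recasting is that matrix-vector products with a sparse, network-structured $H$ only require communication between neighboring nodes.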
In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO's empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on the 108 machine learning hyper-parameter tuning tasks comprising the Bayesmark benchmark. Our findings indicate that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, that multi-objective acquisition ensembles with Pareto-front solutions improve the queried configurations, and that robust acquisition maximisers afford empirical advantages relative to their non-robust counterparts. We hope these findings may serve as guiding principles for practitioners of Bayesian optimisation.
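As a rough illustration of the non-linear input and output warping mentioned above, the snippet below applies a Kumaraswamy-CDF warp to inputs in [0, 1] and a simple Gaussianising transform to outputs before surrogate fitting. The function names, parameter values, and particular transforms are assumptions chosen for illustration; they are not HEBO's actual implementation.

```python
import numpy as np

def kumaraswamy_warp(x, a=1.5, b=0.8):
    """Illustrative non-linear input warping: the Kumaraswamy CDF stretches or
    compresses regions of [0, 1], one way to cope with non-stationarity.
    (a, b are hypothetical warp parameters; a solver would learn them.)"""
    x = np.clip(x, 1e-12, 1.0 - 1e-12)
    return 1.0 - (1.0 - x ** a) ** b

def gaussianise_outputs(y):
    """Illustrative output warping: standardise, then apply a signed log to damp
    heavy tails and heteroscedastic noise before fitting the surrogate model."""
    y = (y - np.mean(y)) / (np.std(y) + 1e-12)
    return np.sign(y) * np.log1p(np.abs(y))
```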