W. Bastiaan Kleijn scite author profile

W. Bastiaan Kleijn

4Publications

0Citation Statements Received

114Citation Statements Given

How they've been cited

How they cite others

114

Affiliations

Victoria University of Wellington

Publications

Order By: Most citations

Revisiting the Primal-Dual Method of Multipliers for Optimisation over Centralised Networks

Zhang¹,

Niwa²,

Kleijn³

2021

Preprint

View full text Add to dashboard Cite

The primal-dual method of multipliers (PDMM) was originally designed for solving a decomposable optimisation problem over a general network. In this paper, we revisit PDMM for optimisation over a centralized network. We first note that the recently proposed method FedSplit [1] implements PDMM for a centralized network. In [1], Inexact FedSplit (i.e., gradient based FedSplit) was also studied both empirically and theoretically. We identify the cause for the poor reported performance of Inexact FedSplit, which is due to the improper initialisation in the gradient operations at the client side. To fix the issue of Inexact FedSplit, we propose two versions of Inexact PDMM, which are referred to as gradient-based PDMM (GPDMM) and accelerated GPDMM (AGPDMM), respectively. AGPDMM accelerates GPDMM at the cost of transmitting two times the number of parameters from the server to each client per iteration compared to GPDMM. We provide a new convergence bound for GPDMM for a class of convex optimisation problems. Our new bounds are tighter than those derived for Inexact FedSplit. We also investigate the update expressions of AGPDMM and SCAFFOLD to find their similarities. It is found that when the number K of gradient steps at the client side per iteration is K = 1, both AGPDMM and SCAFFOLD reduce to vanilla gradient descent with proper parameter setup. Experimental results indicate that AGPDMM converges faster than SCAFFOLD when K > 1 while GPDMM converges slightly worse than SCAFFOLD.

show abstract

On eigenvalue shaping for two-channel decentralized active noise control systems

et al. 2023

View full text Add to dashboard Cite

A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range

Zhang¹,

Niwa²,

Kleijn³

2022

Preprint

View full text Add to dashboard Cite

Adam and AdaBelief compute and make use of elementwise adaptive stepsizes in training deep neural networks (DNNs) by tracking the exponential moving average (EMA) of the squared-gradient g 2 t and the squared prediction error (mt −gt) 2 , respectively, where mt is the first momentum at iteration t and can be viewed as a prediction of gt. In this work, we attempt to find out if layerwise gradient statistics can be expoited in Adam and AdaBelief to allow for more effective training of DNNs. We address the above research question in two steps. Firstly, we slightly modify Adam and AdaBelief by introducing layerwise adaptive stepsizes in their update procedures via either pre or post processing. Empirical study indicates that the slight modification produces comparable performance for training VGG and ResNet models over CIFAR10, suggesting that layer-wise gradient statistics plays an important role towards the success of Adam and AdaBelief for at least certain DNN tasks. In the second step, instead of manual setup of layer-wise stepsizes, we propose Aida, a new optimisation method, with the objective that the elementwise stepsizes within each layer have significantly small statistic variances. Motivated by the fact that (mt − gt) 2 in AdaBelief is conservative in comparison to g 2 t in Adam in terms of layerwise statistic averages and variances, Aida is designed by tracking a more conservative function of mt and gt than (mt − gt) 2 in AdaBelief via layerwise orthogonal vector projections. Experimental results show that Aida produces either competitive or better performance with respect to a number of existing methods including Adam and AdaBelief for a set of challenging DNN tasks.

show abstract

Rapidly Adapting Moment Estimation

Zhang¹,

Niwa²,

Kleijn³

2019

Preprint

View full text Add to dashboard Cite

Adaptive gradient methods such as Adam have been shown to be very effective for training deep neural networks (DNNs) by tracking the second moment of gradients to compute the individual learning rates. Differently from existing methods, we make use of the most recent first moment of gradients to compute the individual learning rates per iteration. The motivation behind it is that the dynamic variation of the first moment of gradients may provide useful information to obtain the learning rates. We refer to the new method as the rapidly adapting moment estimation (RAME). The theoretical convergence of deterministic RAME is studied by using an analysis similar to the one used in [1] for Adam. Experimental results for training a number of DNNs show promising performance of RAME w.r.t. the convergence speed and generalization performance compared to the stochastic heavy-ball (SHB) method, Adam, and RMSprop.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.