2023
DOI: 10.1109/TCNS.2022.3232519

GTAdam: Gradient Tracking With Adaptive Momentum for Distributed Online Optimization

Abstract: This paper deals with a network of computing agents aiming to solve an online optimization problem in a distributed fashion, i.e., by means of local computation and communication, without any central coordinator. We propose the gradient tracking with adaptive momentum estimation (GTAdam) distributed algorithm, which combines a gradient tracking mechanism with first- and second-order momentum estimates of the gradient. The algorithm is analyzed in the online setting for strongly convex cost functions with Lipschitz continuous gradients.
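The abstract describes an Adam-like update driven by a gradient-tracking estimate of the network-average gradient. Below is a minimal sketch of that idea, assuming a doubly stochastic mixing matrix W, static quadratic local costs standing in for the online cost sequence, and a particular update order; the variable names and recursion details are illustrative assumptions, not the paper's exact GTAdam algorithm (bias correction, among other details, is omitted).

```python
import numpy as np

# Hedged sketch of a GTAdam-style update: gradient tracking plus
# Adam-style first/second moment estimates. Illustrative only.
rng = np.random.default_rng(0)
N, d = 5, 3                                    # agents, decision dimension
A = [rng.normal(size=(d, d)) + 2 * np.eye(d) for _ in range(N)]
b = [rng.normal(size=d) for _ in range(N)]

def grad(i, x):
    """Gradient of agent i's local quadratic cost 0.5*||A_i x - b_i||^2."""
    return A[i].T @ (A[i] @ x - b[i])

# Ring topology with doubly stochastic mixing weights (an assumption).
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = 0.25
    W[i, (i + 1) % N] = 0.25

alpha, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8

x = rng.normal(size=(N, d))                     # local decision variables
s = np.array([grad(i, x[i]) for i in range(N)]) # trackers, s_i^0 = grad f_i(x_i^0)
m = np.zeros((N, d))                            # first-order momentum estimates
v = np.zeros((N, d))                            # second-order momentum estimates

for t in range(200):
    # Moment estimates driven by the *tracked* gradient s_i, which
    # gradient tracking steers toward the network-average gradient.
    m = beta1 * m + (1 - beta1) * s
    v = beta2 * v + (1 - beta2) * s**2
    step = m / (np.sqrt(v) + eps)

    # Consensus on the decision variables plus the adaptive descent step.
    x_new = W @ x - alpha * step

    # Gradient tracking: mix trackers, then add the local gradient innovation.
    g_new = np.array([grad(i, x_new[i]) for i in range(N)])
    g_old = np.array([grad(i, x[i]) for i in range(N)])
    s = W @ s + g_new - g_old
    x = x_new

print("max disagreement across agents:", np.max(np.abs(x - x.mean(axis=0))))
```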

Cited by 9 publications (6 citation statements)
References 39 publications
“…A distributed adaptive gradient algorithm with bounded stepsizes was further studied in [31] to improve the generalization capacity. By introducing a GT estimator, a novel distributed adaptive algorithm was developed in the notable work [38], which is proved to achieve a linear convergence rate in the strongly convex setting.…”
Section: Introduction
confidence: 99%
“…The adaptive stepsizes are generated according to the historical gradients, enabling the algorithm to automatically coordinate the stepsizes among dimensions when the gradients are sparse. Inspired by [38], we utilize a GT estimator to aggregate the gradients over the network. Moreover, a clipping operator is used to mitigate the negative effects of extreme adaptive stepsizes.…”
Section: Introduction
confidence: 99%
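The clipping idea in this statement admits a small sketch: project each per-coordinate adaptive stepsize onto a bounded interval before it scales the tracked momentum direction. The function name, the interval [lo*alpha, hi*alpha], and the Adam-style denominator are illustrative assumptions, not the cited paper's exact operator.

```python
import numpy as np

def clipped_adaptive_step(m, v, alpha, eps=1e-8, lo=0.1, hi=10.0):
    """Adaptive descent step with per-coordinate stepsizes
    alpha / (sqrt(v) + eps) clipped into [lo*alpha, hi*alpha].
    Hypothetical helper; bounds lo and hi are assumptions."""
    raw = alpha / (np.sqrt(v) + eps)            # Adam-style per-coordinate stepsizes
    clipped = np.clip(raw, lo * alpha, hi * alpha)
    return clipped * m                          # momentum scaled coordinate-wise
```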