Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing (PODC 2018)
DOI: 10.1145/3212734.3212763
The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

Abstract: Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from execution in a distributed environment. However, surprisingly, the convergence properties of this classic algorithm i…
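The abstract refers to the basic SGD update whose asynchronous, shared-memory execution the paper analyzes. For orientation, here is a minimal sketch of one sequential SGD step; the least-squares objective, the learning rate, and every name below are illustrative assumptions, not taken from the paper.

    import numpy as np

    def sgd_step(theta, minibatch, lr=0.05):
        # One plain SGD step: theta <- theta - lr * grad(theta; minibatch).
        # Assumed example objective: mean squared error of a linear model.
        X, y = minibatch
        residual = X @ theta - y
        grad = X.T @ residual / len(y)
        return theta - lr * grad

    # Usage: a few hundred steps on synthetic data.
    rng = np.random.default_rng(0)
    theta = np.zeros(5)
    for _ in range(200):
        X = rng.normal(size=(32, 5))
        y = X @ np.ones(5) + 0.1 * rng.normal(size=32)
        theta = sgd_step(theta, (X, y))
    print(theta)  # should be close to the all-ones vector used to generate y

The asynchronous setting studied in the paper differs from this sequential loop only in how the read of theta and the write of the update interleave across concurrent threads.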

Cited by 52 publications (69 citation statements); references 24 publications. The citing statements span 2019–2023.
“…For example, if both threads sample the same θ and additionally sample minibatches that are 100% biased towards the same label, the gradients derived by both threads will be maximally similar. We show evidence to support this claim in the evaluation. Specifically, we show that as minibatch bias increases, fewer attack threads are required to move the model out of its converged state.…”
Section: Challenge: Crafting Constructive Gradient Updates (mentioning)
confidence: 57%
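A small simulation can make the quoted claim concrete: the sketch below trains a model to (near) convergence, then computes gradients for two simulated "threads" at the same θ on minibatches with a controllable label bias and reports their average cosine similarity. The logistic-regression model, the bias parameter, the batch size, and all names are assumptions for illustration only; this is not the cited paper's attack code.

    import numpy as np

    rng = np.random.default_rng(1)
    d, n = 10, 2000
    X = rng.normal(size=(n, d))
    y = (X @ rng.normal(size=d) > 0).astype(float)   # binary labels in {0, 1}
    flip = rng.random(n) < 0.1                       # 10% label noise so the optimum is finite
    y[flip] = 1 - y[flip]

    def logistic_grad(theta, Xb, yb):
        # Gradient of the average logistic loss on a (mini)batch (assumed model).
        p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
        return Xb.T @ (p - yb) / len(yb)

    def biased_minibatch(bias, size=64):
        # Sample a minibatch in which a `bias` fraction of the examples has label 1.
        ones, zeros = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
        k = int(bias * size)
        idx = np.concatenate([rng.choice(ones, k), rng.choice(zeros, size - k)])
        return X[idx], y[idx]

    # Train to (near) convergence with full-batch gradient descent.
    theta = np.zeros(d)
    for _ in range(500):
        theta -= 0.5 * logistic_grad(theta, X, y)

    # At the converged theta, unbiased minibatch gradients are mostly noise,
    # while label-biased minibatch gradients share a large common component.
    for bias in (0.5, 0.8, 1.0):
        sims = []
        for _ in range(200):
            g1 = logistic_grad(theta, *biased_minibatch(bias))
            g2 = logistic_grad(theta, *biased_minibatch(bias))
            sims.append(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))
        print(f"label bias {bias:.1f}: mean cosine similarity of the two gradients = {np.mean(sims):.2f}")

The similarity should rise with the bias, which is the mechanism the quoted statement relies on: more aligned gradients from concurrent threads reinforce each other instead of averaging out.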
“…In the A-SGD literature, the closest work to that presented in our paper is [6]. That work discusses how an adversary can slow convergence by influencing scheduling.…”
Section: Related Work (mentioning)
confidence: 97%
“…The results in Theorem 6 and Corollary 3 are related to the results presented in [10] and [4]. The main differences are that in our analysis we tighten the bound by a factor (2 − θ)⁻¹, expand the allowed step-size interval, relax the maximum-staleness assumption, and reduce the magnitude of the bound from linear in the maximum staleness, O(τ), to linear in the expected staleness, O(E[τ]).…”
Section: Convex Convergence Analysis (mentioning)
confidence: 58%
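For orientation, the kind of comparison being made can be sketched schematically in LaTeX; the symbols T, τ, f, and the overall shape below are assumptions chosen to illustrate the change from worst-case to expected staleness, not the exact statements of [10], [4], or the citing paper.

    % Assumed generic shape of an asynchronous SGD bound for a convex objective f
    % after T updates with gradient staleness tau; NOT a quoted result.
    \[
      \mathbb{E}\bigl[f(\bar{x}_T)\bigr] - f^\star
        \;\le\; \mathcal{O}\!\Bigl(\tfrac{1 + \tau_{\max}}{\sqrt{T}}\Bigr)
      \qquad\longrightarrow\qquad
      \mathbb{E}\bigl[f(\bar{x}_T)\bigr] - f^\star
        \;\le\; \mathcal{O}\!\Bigl(\tfrac{1 + \mathbb{E}[\tau]}{\sqrt{T}}\Bigr)
    \]
    % i.e., the dependence on staleness is weakened from the worst case tau_max
    % to its expectation, which is the relaxation the statement above describes.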
“…Properties of Async-PSGD with sparse or component-wise updates have since been rigorously studied in recent literature due to the performance benefits of lock-freedom [28], [24], [10]. The gradient-sparsity assumption was relaxed in the recent work [4], which magnified the convergence-time bound by a factor on the order of ~√d, d being the problem dimensionality. Delayed optimization in completely asynchronous first-order optimization algorithms was analyzed initially in [2], where Agarwal et al. introduce step sizes which diminish over the progression of SGD, depending on the maximum staleness allowed in the system, but not adaptive to the actual delays observed.…”
Section: Related Work (mentioning)
confidence: 99%
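To make the lock-free, component-wise update pattern discussed above concrete, here is a minimal Hogwild-style sketch; the data, model, step size, and thread count are illustrative assumptions, and CPython's global interpreter lock means this only demonstrates the unsynchronized access pattern, not genuine parallel speedup.

    import threading
    import numpy as np

    rng = np.random.default_rng(3)
    n, d = 1000, 50
    X = rng.normal(size=(n, d)) * (rng.random((n, d)) < 0.1)   # sparse features
    y = X @ rng.normal(size=d) + 0.01 * rng.normal(size=n)
    theta = np.zeros(d)                                        # shared parameters, no lock

    def worker(seed, steps=2000, lr=0.02):
        # Each thread repeatedly: pick a sample, read only the coordinates of theta
        # that the sample touches (without synchronization), compute a sparse
        # least-squares gradient, and write back only those coordinates.
        local_rng = np.random.default_rng(seed)
        for _ in range(steps):
            i = local_rng.integers(n)
            xi = X[i]
            support = np.flatnonzero(xi)               # coordinates this sample touches
            if support.size == 0:
                continue
            snapshot = theta[support].copy()           # possibly inconsistent read
            err = xi[support] @ snapshot - y[i]
            theta[support] -= lr * err * xi[support]   # unsynchronized sparse write

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("training MSE after lock-free updates:", np.mean((X @ theta - y) ** 2))

Because each sample touches only a few coordinates, concurrent writes rarely collide, which is the intuition behind the sparsity assumptions that the quoted statement says later work relaxed at the cost of a ~√d factor.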