2021
DOI: 10.1109/jsait.2021.3052975
Private Weighted Random Walk Stochastic Gradient Descent

Abstract: We consider a decentralized learning setting in which data is distributed over the nodes of a graph. The goal is to learn a global model on the distributed data without involving any central entity that needs to be trusted. While gossip-based stochastic gradient descent (SGD) can be used to achieve this learning objective, it incurs high communication and computation costs, since it has to wait for all the local models at all the nodes to converge. To speed up the convergence, we propose instead to study random walk…
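As a concrete illustration of the random-walk idea in the abstract, here is a minimal Python sketch in which a single model is carried along a uniform random walk over a ring graph and updated with one local SGD step at each visited node. The toy least-squares data, the `local_gradient` helper, and the 1/√k step size are assumptions for illustration only, not the paper's exact algorithm.

```python
# Minimal sketch of random-walk SGD on a graph: the model travels along a
# random walk and only the currently visited node computes a gradient step.
# All data, names and step sizes below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy ring graph: node i is connected to its two neighbours.
n_nodes, dim = 8, 3
neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

# Each node stores a small local dataset (X_i, y_i) for a linear model.
w_true = rng.normal(size=dim)
local_data = {}
for i in range(n_nodes):
    X = rng.normal(size=(20, dim))
    y = X @ w_true + 0.1 * rng.normal(size=20)
    local_data[i] = (X, y)

def local_gradient(w, node):
    """Stochastic gradient of the squared loss on one sample of the node's data."""
    X, y = local_data[node]
    j = rng.integers(len(y))
    return (X[j] @ w - y[j]) * X[j]

# Unlike gossip, only one node is active per iteration: it updates the model
# locally and forwards it to a randomly chosen neighbour.
w = np.zeros(dim)
node = 0
for k in range(1, 2001):
    w -= (1.0 / np.sqrt(k)) * local_gradient(w, node)   # local SGD step
    node = rng.choice(neighbors[node])                   # move to a random neighbour

print("distance to ground truth:", np.linalg.norm(w - w_true))
```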

Cited by 13 publications (9 citation statements)
References 29 publications
“…In [13], the authors studied the convergence of random-walk learning for the alternating direction method of multipliers (ADMM). In [21], a weighted random walk is designed that accounts for the importance of the local data, improving the convergence guarantees and speeding up convergence. An asymptotic fundamental bound on the convergence rate of these algorithms was proven in [22]; it approaches O(1/√k) under convexity and bounded-gradient assumptions.…”
Section: Prior Work
Citation type: mentioning
confidence: 99%
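The weighted random walk of [21] biases the walk toward nodes whose local data matter more. One standard way to realize a walk with a prescribed stationary distribution is a Metropolis-Hastings correction of the uniform neighbour walk; the sketch below uses assumed per-node importance weights and is only an illustration of the idea, not necessarily the weighting scheme of [21].

```python
# Illustrative sketch: build a random walk whose stationary distribution is
# proportional to per-node importance weights, via a Metropolis-Hastings
# correction of the uniform neighbour proposal. Weights are assumptions.
import numpy as np

neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
importance = np.array([4.0, 1.0, 2.0, 1.0])      # assumed per-node weights
pi = importance / importance.sum()                # desired stationary distribution

def transition_matrix(neighbors, pi):
    n = len(pi)
    P = np.zeros((n, n))
    for i in range(n):
        d_i = len(neighbors[i])
        for j in neighbors[i]:
            d_j = len(neighbors[j])
            # Metropolis-Hastings acceptance keeps the chain reversible w.r.t. pi.
            P[i, j] = (1.0 / d_i) * min(1.0, (pi[j] * d_i) / (pi[i] * d_j))
        P[i, i] = 1.0 - P[i].sum()                # lazy self-loop absorbs the rest
    return P

P = transition_matrix(neighbors, pi)

# Empirical check: the left Perron eigenvector of P should match pi.
evals, evecs = np.linalg.eig(P.T)
stat = np.real(evecs[:, np.argmax(np.real(evals))])
print(stat / stat.sum(), pi)
```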
“…The goal of a gossip algorithm is to ensure that all nodes, and not just the parameter server (PS), learn the global model; convergence is assumed once a consensus is reached. Hence, gossip is less efficient in terms of computation and communication costs [21].…”
Section: Prior Work
Citation type: mentioning
confidence: 99%
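For contrast with the single-model random walk above, here is a minimal sketch of synchronous gossip averaging: every node keeps its own model and repeatedly averages with its neighbours until consensus, which is why each round involves computation and communication at all nodes. The ring topology and uniform mixing weights are illustrative assumptions, not the specific protocol discussed in [21].

```python
# Contrasting sketch: synchronous gossip averaging over a ring graph.
# Every node participates in every round, unlike random-walk SGD.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, dim = 8, 3
neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

# Every node starts from its own local estimate of the model.
models = rng.normal(size=(n_nodes, dim))

for _ in range(200):
    new_models = models.copy()
    for i in range(n_nodes):
        # Uniform averaging over the node itself and its neighbours
        # (doubly stochastic on this regular ring, so it converges to the mean).
        nbrs = neighbors[i] + [i]
        new_models[i] = models[nbrs].mean(axis=0)
    models = new_models

# All rows converge to (approximately) the same consensus vector.
print("max per-coordinate spread:", np.max(np.std(models, axis=0)))
```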
“…Markov chains naturally appear in many important problems, such as decentralized consensus optimization, which finds applications in areas including wireless sensor networks, smart grid implementations, and distributed statistical learning [4,14,23,48,50,58,61,63], as well as pairwise learning [78], which instantiates AUC maximization [1,29,46,81,87] and metric learning [35,75,76,79]. A common example is a distributed system in which each node stores a subset of the whole data, and one aims to train a global model based on these data.…”
Section: Introduction
Citation type: mentioning
confidence: 99%