We consider a class of popular distributed non-convex optimization problems, in which agents connected by a network $\mathcal{G}$ collectively optimize a sum of smooth (possibly non-convex) local objective functions. We address the following question: if the agents can only access the gradients of local functions, what are the fastest rates that any distributed algorithm can achieve, and how can those rates be achieved? First, we show that there exist difficult problem instances such that it takes a class of distributed first-order methods at least $\mathcal{O}\big(1/\sqrt{\xi(\mathcal{G})} \times L/\epsilon\big)$ communication rounds to achieve a certain $\epsilon$-solution, where $\xi(\mathcal{G})$ denotes the spectral gap of the graph Laplacian matrix and $L$ is some Lipschitz constant. Second, we propose (near) optimal methods whose rates match the developed lower rate bound (up to a polylog factor). The key in the algorithm design is to properly embed classical polynomial filtering techniques into modern first-order algorithms. To the best of our knowledge, this is the first time that lower rate bounds and optimal methods have been developed for distributed non-convex optimization problems.

A common way to reformulate problem (1) in the distributed setting is given below. Introduce $M$ local variables $x_1, \cdots, x_M \in \mathbb{R}^S$ and their concatenation $x := [x_1; \cdots; x_M] \in \mathbb{R}^{SM \times 1}$, and suppose the graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ is connected. Then the following formulation is equivalent to the global consensus problem:
$$\min_{x \in \mathbb{R}^{SM}} \; f(x) := \sum_{i=1}^{M} f_i(x_i), \quad \text{s.t.} \;\; x_i = x_j, \;\; \forall\, (i,j) \in \mathcal{E}. \tag{2}$$
The main benefit of the above formulation is that the objective function is now separable, and the linear constraint encodes the network connectivity pattern; a small numerical sketch of this construction is given at the end of this section.

1.2 Distributed non-convex optimization

Distributed non-convex optimization has gained considerable attention recently. For example, it finds applications in training neural networks [1], clustering [2], and dictionary learning [3], just to name a few. Problems (1) and (2) have been studied extensively in the literature when the $f_i$'s are all convex; see for example [4-6]. Primal methods such as the distributed subgradient (DSG) method [4] and the EXTRA method [6], as well as primal-dual methods such as the distributed augmented Lagrangian method [7] and the Alternating Direction Method of Multipliers (ADMM) [8, 9], have been proposed. In contrast, only recently have works begun to address the more challenging setting in which the $f_i$'s are not assumed to be convex; see [1, 3, 10-23]. The convergence behavior of the distributed consensus problem (1) has been studied in [3, 10, 11]. Reference [12] develops a non-convex ADMM-based method for solving the distributed consensus problem (1); however, the network considered therein is a star network, in which the local nodes are all connected to a central controller. References [14, 15] propose a primal-dual-based method for unconstrained problems over a connected network and derive a global convergence rate for this setting. In [13, 17, 18], the authors utilize certain gradient tracking ideas to solve a constrained nonsmooth distributed problem over possibly time-varying networks.
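To make the reformulation (2) and the quantity $\xi(\mathcal{G})$ in the lower bound concrete, below is a minimal NumPy sketch. It builds the linear consensus constraint from the edge-node incidence matrix of $\mathcal{G}$ (one standard choice; the paper's exact constraint matrix may differ) and computes the spectral gap, here assumed to be the ratio of the smallest nonzero to the largest eigenvalue of the graph Laplacian. The helper names and the 4-node path graph are illustrative, not taken from the paper.

```python
# A minimal sketch (hypothetical helper names; NumPy only) of the consensus
# reformulation (2) and a spectral-gap computation. Assumption: xi(G) is
# taken to be the ratio of the smallest nonzero to the largest eigenvalue of
# the graph Laplacian L = B^T B, where B is the edge-node incidence matrix.
import numpy as np

def incidence_matrix(num_nodes, edges):
    """Edge-node incidence matrix B: row e carries +1/-1 at the endpoints of edge e."""
    B = np.zeros((len(edges), num_nodes))
    for e, (i, j) in enumerate(edges):
        B[e, i], B[e, j] = 1.0, -1.0
    return B

def spectral_gap(B):
    """xi(G) = (smallest nonzero eigenvalue of L) / (largest eigenvalue of L)."""
    eigvals = np.linalg.eigvalsh(B.T @ B)
    nonzero = eigvals[eigvals > 1e-10]
    return nonzero[0] / eigvals[-1]

# A path graph on M = 4 nodes; each local variable x_i lives in R^S with S = 2.
M, S = 4, 2
edges = [(0, 1), (1, 2), (2, 3)]
B = incidence_matrix(M, edges)

# Constraint matrix A = B (Kronecker) I_S, so that A x = 0 iff x_1 = ... = x_M:
A = np.kron(B, np.eye(S))
x_consensus = np.tile([1.0, -2.0], M)        # identical local copies
assert np.allclose(A @ x_consensus, 0.0)     # consensus points are feasible

print("xi(G) for the 4-node path graph:", spectral_gap(B))
```

Writing the edgewise constraints $x_i = x_j$ as $Ax = 0$ with $A$ of this Kronecker form is what makes the constraint set linear while keeping the objective separable across agents.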
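The polynomial filtering idea from the abstract can also be illustrated in isolation. The sketch below applies a scaled Chebyshev polynomial $p_K(t) = T_K(t/\rho)/T_K(1/\rho)$ of a symmetric, doubly stochastic mixing matrix $W$: since $p_K(1) = 1$, the consensus component is preserved, while disagreement components (eigenvalues in $[-\rho, \rho]$) are damped at an accelerated rate. This is a generic Chebyshev-filtered gossip sketch under those stated assumptions, not the paper's exact method; the mixing matrix and the 5-node ring are hypothetical choices.

```python
# A minimal sketch of Chebyshev polynomial filtering for consensus -- a generic
# instance of the "polynomial filtering" idea, not the paper's exact algorithm.
# Assumptions: W is symmetric and doubly stochastic, its eigenvalue 1 has the
# consensus vector as eigenvector, and all other eigenvalues lie in [-rho, rho].
import numpy as np

def chebyshev_filter(W, x, rho, K):
    """Apply p_K(W) x, where p_K(t) = T_K(t/rho) / T_K(1/rho), via the
    three-term Chebyshev recursion T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)."""
    y_prev, y = x, (W @ x) / rho     # T_0(W/rho) x and T_1(W/rho) x
    t_prev, t = 1.0, 1.0 / rho       # scalar values T_k(1/rho) for normalization
    for _ in range(K - 1):
        y_prev, y = y, (2.0 / rho) * (W @ y) - y_prev
        t_prev, t = t, (2.0 / rho) * t - t_prev
    return y / t                     # p_K(1) = 1: consensus component preserved

# Hypothetical example: lazy uniform mixing on a 5-node ring graph.
M = 5
W = np.zeros((M, M))
for i in range(M):
    W[i, (i - 1) % M] = W[i, (i + 1) % M] = 0.25
    W[i, i] = 0.5

eigs = np.sort(np.linalg.eigvalsh(W))
rho = max(abs(eigs[0]), abs(eigs[-2]))       # second-largest eigenvalue modulus

rng = np.random.default_rng(0)
x = rng.standard_normal(M)                   # one scalar held by each agent
avg = np.full(M, x.mean())
K = 10
plain = np.linalg.matrix_power(W, K) @ x     # K rounds of plain gossip
filtered = chebyshev_filter(W, x, rho, K)    # K rounds of filtered gossip
print("plain gossip disagreement:      ", np.linalg.norm(plain - avg))
print("Chebyshev-filtered disagreement:", np.linalg.norm(filtered - avg))
```

Both variants use exactly $K$ multiplications by $W$, i.e., $K$ communication rounds, yet the filtered disagreement on the ring is several orders of magnitude smaller. Roughly, Chebyshev filtering improves the dependence on the spectral quantity from $\xi(\mathcal{G})$ to $\sqrt{\xi(\mathcal{G})}$, which is the type of acceleration needed to match the $1/\sqrt{\xi(\mathcal{G})}$ factor in the lower bound.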